sample_frac function changes the cell order #58

Open aodainic7 opened 1 year ago

aodainic7 commented 1 year ago

Hey, when I subset a large Seurat object to reduce computing time, the sample_frac() function changes the cell order, so Seurat functions no longer work on the result. To reproduce the error, run:

library(Seurat)
library(tidyseurat)

pbmc_small <- SeuratObject::pbmc_small

pbmc_small_subset <- pbmc_small |> sample_frac(0.9)

pbmc_small_subset <- RunPCA(pbmc_small_subset, reduction.name = "pca", assay = "RNA")

The error I'm getting is:

Error in validObject(object = x) :
  invalid class “Seurat” object: 1: all cells in assays must be in the same order as the Seurat object
invalid class “Seurat” object: 2: all cells in reductions must be in the same order as the Seurat object
invalid class “Seurat” object: 3: all cells in graphs must be in the same order as the Seurat object (offending: RNA_snn)
invalid class “Seurat” object: 4: 'active.idents' must be named with cell names


sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)


william-hutchison commented 1 year ago


I believe changing the row order is the intended behaviour of sample_frac() in dplyr, so it makes sense for the function to change the cell order in tidyseurat.

You can see how sample_frac() changes the row order in this example:
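The snippet from the original comment is not preserved here. As a stand-in, the following is a minimal sketch on a plain tibble (the data, the column names, and the seed are hypothetical) showing that dplyr's sample_frac() returns rows in sampled order, and one possible way to restore the original relative order afterwards:

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only to make the sample reproducible

# Hypothetical stand-in for per-cell metadata
df <- tibble(cell = paste0("cell_", 1:10), value = 1:10)

# Rows come back in the order they were drawn, not the input order
sampled <- df |> sample_frac(0.9)
print(sampled$cell)

# Re-sort the sampled rows by their position in the original table
restored <- sampled |> arrange(match(cell, df$cell))
print(restored$cell)
```

With 10 rows and a fraction of 0.9, 9 rows are kept. The arrange(match(...)) step is only one option; any approach that sorts the sampled rows by their original position would do.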

stemangiola commented 1 year ago

Yes, it randomizes the cells, but it should not break the object. I think the cells are somehow being reordered in some assays or slots but not in all of them.

For reference, this is the Seurat function that does the same kind of subsetting:

subset <- SubsetData(object, max.cells.per.ident = n.cells, random.seed = NULL)
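In the meantime, a possible workaround (a sketch only, not the tidyseurat fix) is to draw the cells with base R, keep the drawn indices sorted so the original cell order is preserved, and pass the resulting names to subset(). The variable names and the seed below are assumptions made for illustration:

```r
library(Seurat)

set.seed(1)  # arbitrary seed for a reproducible draw

pbmc_small <- SeuratObject::pbmc_small

# Draw 90% of the cells, then sort the indices to keep the original order
n_keep <- round(0.9 * ncol(pbmc_small))
keep_idx <- sort(sample.int(ncol(pbmc_small), n_keep))
keep_cells <- colnames(pbmc_small)[keep_idx]

# Subset by cell name; cells are passed in their original relative order
pbmc_small_subset <- subset(pbmc_small, cells = keep_cells)

# Check that the object still passes Seurat's validity checks
validObject(pbmc_small_subset)
```

Because the cell names are supplied in the object's own order, assays, reductions, and graphs should stay aligned with each other.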


possibly related