satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Shuffling Seurat Object for training / testing set #7425

Closed ananyapavuluri closed 1 year ago

ananyapavuluri commented 1 year ago

I have a large Seurat object containing a single-cell ATAC-seq assay. I need a way to shuffle the cells of the object so that each cell's profile stays intact (i.e., it should not shuffle only the count matrix; the whole object, including the corresponding metadata values, should be shuffled consistently), and then split the shuffled Seurat object into training and testing sets.

This is how my data looks at the moment:

whole_atac <- readRDS(... name of my RDS file ... )
whole_atac <- RunTFIDF(whole_atac)
# Filter out low-resolution cells, keeping only neurons from female mice
data_female <- subset(
  x = whole_atac,
  subset = peak_region_fragments > low_prf &
    peak_region_fragments < hig_prf &
    pct_reads_in_peaks > low_prp &
    blacklist_fraction < high_blf &
    nucleosome_signal < hig_ns &
    TSS.enrichment > low_ts &
    predicted_major == "Neurons" &
    sex == 'F'
)

I need to perform the shuffling and splitting on data_female.

Thank you in advance.

Please note: The goal is to feed this into a machine learning algorithm and is unrelated to the shuffle attribute in DimPlot().

saketkc commented 1 year ago

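You can shuffle the vector of cell names, split it into train and test sets, and then pass each set to the subset function to create the train and test objects. Because subsetting is done by cell name, each cell keeps its counts and metadata together.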

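# Randomly permute the cell names (barcodes) of the object
shuffled_cells <- sample(Cells(data_female))
# Example index ranges only; adjust them to the desired train/test sizes
train_cells <- shuffled_cells[1:10]
test_cells <- shuffled_cells[11:20]

# Subsetting by cell name keeps counts and metadata together for each cell
train_object <- subset(data_female, cells = train_cells)
test_object <- subset(data_female, cells = test_cells)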
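
The fixed index ranges above are only an example. As a minimal sketch of the same idea with a proportional split over all cells, one could wrap the shuffle in a small helper. Note that `split_cells`, `train_frac`, and `seed` are hypothetical names introduced here for illustration, not part of Seurat, and the 80/20 ratio is an assumption:

```r
# Hypothetical helper (not part of Seurat): shuffle a vector of cell names
# and split it into training and test sets by proportion.
split_cells <- function(cell_names, train_frac = 0.8, seed = 42) {
  stopifnot(train_frac > 0, train_frac < 1, length(cell_names) >= 2)
  set.seed(seed)                  # reproducible shuffle (resets the global RNG state)
  shuffled <- sample(cell_names)  # random permutation of the cell names
  n_train <- floor(train_frac * length(shuffled))
  # Guarantee at least one cell in each set
  n_train <- max(1, min(n_train, length(shuffled) - 1))
  list(
    train = shuffled[seq_len(n_train)],
    test  = shuffled[-seq_len(n_train)]
  )
}

# Usage with the object from this thread (requires Seurat to be loaded):
# split <- split_cells(Cells(data_female), train_frac = 0.8)
# train_object <- subset(data_female, cells = split$train)
# test_object  <- subset(data_female, cells = split$test)
```

Setting a seed makes the split reproducible across runs, which is usually desirable when comparing machine learning models on the same train/test partition.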