hemberg-lab / SC3

A tool for the unsupervised clustering of cells from single cell RNA-Seq experiments
http://bioconductor.org/packages/SC3
GNU General Public License v3.0

Clustering for more than 10000 Genes #93

Closed: hkarakurt8742 closed this issue 5 years ago

hkarakurt8742 commented 5 years ago

Hello, as I understand it, for data sets that contain more than 5000 cells SC3 uses a hybrid approach (SVM). If a data set contains more than 10000 cells (say, 12000), are 5000 randomly selected cells enough for this approach? The randomly selected cells may not include some cell types (I am not sure about the probability), so with the SVM those cells would be assigned only to the cell types present in the training data.
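The probability asked about can be estimated if the training subset is drawn uniformly at random without replacement: the number of cells of a given type that end up in the subset then follows a hypergeometric distribution, and the type is missed entirely with probability C(N - k, n) / C(N, n). A minimal sketch in Python (standard library only), assuming plain uniform sampling; the 12000/5000 figures come from the question, while the function name and the rare-type sizes are hypothetical illustrations, not SC3 code:

```python
from math import comb

def prob_type_missed(n_total: int, n_sampled: int, n_type: int) -> float:
    """Hypothetical helper: probability that a uniform random subset of
    n_sampled cells (drawn without replacement from n_total cells) contains
    none of the n_type cells of one cell type, i.e. hypergeometric P(X = 0)."""
    if not (0 <= n_sampled <= n_total and 0 <= n_type <= n_total):
        raise ValueError("invalid sizes")
    if n_type > n_total - n_sampled:
        return 0.0  # too many cells of this type for the subset to avoid them all
    # Exact big-integer ratio; Python's int / int stays accurate for huge values
    return comb(n_total - n_type, n_sampled) / comb(n_total, n_sampled)

# Assumed example: 12000 cells in total, 5000 used for training
for n_type in (1, 2, 5, 10):
    p = prob_type_missed(12000, 5000, n_type)
    print(f"{n_type:>3} cells of the type: P(missed) = {p:.4f}")
```

Under this assumption a type represented by a single cell is missed with probability 7000/12000 (about 0.58), while a type of ten cells is missed less than 1% of the time. Note that being sampled is not the same as being found: a type with only a handful of cells in the subset may still not form a cluster of its own.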

wikiselev commented 5 years ago

Sorry, are you asking about genes or cells? It's not clear from your question. SC3 does not select random genes, but it does select random cells.

hkarakurt8742 commented 5 years ago


Sorry, I wrote it wrong and have edited the question. I meant cells, not genes.

wikiselev commented 5 years ago

Yes, this is correct: with the SC3 approach you may miss some rare cell types. For a large number of cells we recommend using other clustering tools such as Seurat or scanpy.
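The reason a missed type cannot be recovered later is that a supervised classifier trained on the subset can only output labels it saw during training. A toy sketch in Python (standard library only) that uses a nearest-centroid classifier as a simple stand-in for the SVM; this is not SC3's implementation, and the 2-D "expression" points and labels are made up for illustration:

```python
import math
from collections import defaultdict

def fit_centroids(points, labels):
    """Average the training points of each label (nearest-centroid classifier)."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (x, y), lab in zip(points, labels):
        sums[lab][0] += x
        sums[lab][1] += y
        counts[lab] += 1
    return {lab: (s[0] / counts[lab], s[1] / counts[lab]) for lab, s in sums.items()}

def predict(centroids, point):
    """Assign a point to the closest centroid; only training labels can come out."""
    return min(centroids, key=lambda lab: math.dist(point, centroids[lab]))

# Hypothetical training subset: the rare type "C" happened not to be sampled
train_pts = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
train_lab = ["A", "A", "B", "B"]
centroids = fit_centroids(train_pts, train_lab)

# A held-out cell that truly belongs to "C", far from both A and B
rare_cell = (10.0, -3.0)
print(predict(centroids, rare_cell))  # forced into one of the training labels
```

Whatever the classifier, the held-out "C" cell is forced into "A" or "B" (here the nearer centroid, "B"), which is why a cell type absent from the training subset disappears from the final clustering.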