Status: Closed (closed by xiyupeng 2 years ago)
For the clustering algorithm, I would refer you to https://satijalab.org/seurat/reference/findclusters, https://www.biostars.org/p/445075/, and https://github.com/satijalab/seurat/issues/3038. Using multiple cores can help speed things up. The fastest option would probably be one of the GPU-based clustering algorithms that are available.
Hello, Seurat developers!
I am working on flow data, which is single-cell data with only a few dozen markers. I want to cluster a dataset with 10M+ cells, but I am currently testing the pipeline on smaller subsets. I plan to use the default Louvain method for clustering, and my parameter settings are below.
It works well on a subset of 1M cells. FindClusters() is the most time-consuming step (to my surprise) and takes about 2 hours with 1M cells. With 3M cells, however, it has already been running for about 4 days. UMAP takes about 2.5 hours on the same 3M subset. For testing, I use a single core and about 40 GB of memory. Below is the output of the Louvain method.
I am new to Seurat. Which implementation of the Louvain method does the Seurat package use, and is it scalable to millions of cells? I previously thought that building the SNN graph would be the most time- and memory-consuming step, but I was wrong. Can you recommend other scalable clustering algorithms that can be applied to millions of cells?
Thank you!
Best, Xiyu
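
For anyone comparing against their own runs, the two FindClusters() timings quoted in this thread (about 2 hours at 1M cells, and at least 4 days at 3M cells) can be turned into a rough scaling estimate. The Python sketch below is only a back-of-the-envelope calculation. It assumes the runtime follows a simple power law between the two data points, which is a modelling assumption and not something established in this thread, and all variable and function names are illustrative. Because the 3M run had not finished, every number it produces is a lower bound.

```python
import math

# Timings reported in the post, in hours. The 3M-cell run had not finished
# after about 4 days, so 96 hours is a lower bound, not a measured runtime.
cells_small, hours_small = 1_000_000, 2.0
cells_large, hours_large_min = 3_000_000, 4 * 24.0

# Assumption: runtime follows a power law t = c * n**k between the two points.
# Solving for k gives a lower bound on the empirical scaling exponent.
exponent = math.log(hours_large_min / hours_small) / math.log(
    cells_large / cells_small
)


def extrapolate_hours(n_cells: int) -> float:
    """Extrapolate runtime (hours) to n_cells under the assumed power law."""
    return hours_small * (n_cells / cells_small) ** exponent


# Hypothetical target size taken from the "10M+ cells" goal in the post.
target_cells = 10_000_000
estimate_days = extrapolate_hours(target_cells) / 24.0

print(f"empirical scaling exponent >= {exponent:.2f}")
print(f"extrapolated runtime at {target_cells:,} cells >= {estimate_days:.0f} days")
```

Under that assumption the exponent comes out at roughly 3.5, far above linear, so running the same settings on the full 10M+ dataset on a single core is unlikely to be practical. This is consistent with the advice to look at the alternatives mentioned in the reply.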