satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Excess singletons identified from 2.4M cells #9308

Open Enterprise-D opened 2 months ago

Enterprise-D commented 2 months ago

Hi!

I am currently running FindClusters() on a Nanostring dataset of 2.4M cells. Ideally there should be ~10 major clusters, but over 120K communities were initially identified, most of which are singletons (I tried algorithm = 1, 2, and 3; Leiden never finishes). I think this is far more than expected, as singletons account for ~5% of total cells.

Additionally, merging the singletons seems to be very slow: after days of computation, it still hasn't finished.

Is there an alternative workaround, assuming we have enough RAM? I tried the sketch-based workflow, but it results in disagreement between the UMAP from the "projected full PCA" and the projected cell types.

Many thanks,

zskylarli commented 1 month ago

Hi - thank you for posting this issue! Could you clarify what you mean by the disagreement between the UMAP from the "projected full PCA" and the projected cell types? I think the sketch-based workflow is the best method in this scenario, so if you can attach the code you used for the sketching, clustering, and projection, along with examples of the exact issues you are encountering, I can help look into it further! Otherwise, you can also try setting the clustering resolution very low, such as resolution = 0.1, to reduce the number of communities identified, although this may still not work well with such a large number of cells. Thanks!
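
For reference, here is a minimal sketch of the sketch-based workflow with a low resolution, assuming Seurat v5. The helper name `cluster_via_sketch`, the assay name, the sketch size, and the number of dimensions are all illustrative assumptions and not taken from the original post; adjust them to your object.

```r
library(Seurat)  # assumes Seurat v5, which provides SketchData() and ProjectData()

# Hypothetical helper: cluster a subsample ("sketch") of the cells, then
# project the cluster labels and the UMAP onto the full dataset.
cluster_via_sketch <- function(obj, full_assay = "RNA", n_sketch = 50000,
                               dims = 1:50, resolution = 0.1) {
  # Leverage-score sketching needs normalized data and variable features
  # on the full assay.
  DefaultAssay(obj) <- full_assay
  obj <- NormalizeData(obj)
  obj <- FindVariableFeatures(obj)
  obj <- SketchData(obj, ncells = n_sketch, method = "LeverageScore",
                    sketched.assay = "sketch")

  # Standard clustering workflow, run on the sketched cells only.
  DefaultAssay(obj) <- "sketch"
  obj <- FindVariableFeatures(obj)
  obj <- ScaleData(obj)
  obj <- RunPCA(obj)
  obj <- FindNeighbors(obj, dims = dims)
  obj <- FindClusters(obj, resolution = resolution)
  # Keep the UMAP model so the remaining cells can be placed in the same embedding.
  obj <- RunUMAP(obj, dims = dims, return.model = TRUE)

  # Project the PCA, the UMAP, and the sketch cluster labels to all cells.
  obj <- ProjectData(obj, assay = full_assay,
                     full.reduction = "pca.full",
                     sketched.assay = "sketch",
                     sketched.reduction = "pca",
                     umap.model = "umap",
                     dims = dims,
                     refdata = list(cluster_full = "seurat_clusters"))
  obj
}
```

It could be called as `obj <- cluster_via_sketch(obj, full_assay = "Nanostring")`, where the assay name is a guess for Nanostring data. Passing `umap.model` to ProjectData() places the full dataset in the sketch's UMAP coordinates instead of computing a separate UMAP on the projected PCA, so that may be one thing to compare against your own code. The Seurat sketch vignette plots the result with `DimPlot(obj, reduction = "full.umap", group.by = "cluster_full")`.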