satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

WNN Clustering resulting in hundreds of clusters, clusters not ordered by size #7937

Closed majohn40 closed 1 year ago

majohn40 commented 1 year ago

TL;DR: Why is cluster 344 larger than cluster 6? I thought cluster numbers were assigned based on cell count.

I am working with a very large (180k+) scRNA+scATAC multiome dataset. After clustering the data in WNN space using FindClusters(mydata, graph.name = "wsnn", algorithm = 3, resolution=0.01), I am getting hundreds of clusters as described previously (https://github.com/satijalab/seurat/discussions/5427). The majority of these clusters have on average around 5 cells. However, cluster 344 has almost 7k cells. The count of cells per cluster is shown below.

Cluster | Number of Cells
--|--
0 | 87039
1 | 35706
2 | 18318
3 | 8267
4 | 8000
5 | 7058
344 | 6962
6 | 5155
7 | 1540
8 | 344
9 | 202

I was under the impression that cluster numbers were assigned by the number of cells in each cluster, with 0 being the cluster with the greatest number of cells. Is that incorrect? Additionally, I used algorithm 3 based on the WNN scRNA+scATAC vignette (https://satijalab.org/seurat/articles/weighted_nearest_neighbor_analysis#wnn-analysis-of-10x-multiome-rna-atac). Was there a reason that algorithm 3 was selected for this analysis over the other options?

igrabski commented 1 year ago

Hi Megan, you are correct that cluster numbers are assigned in order by cluster size. However, after initial cluster assignments, any singletons that are present are then re-assigned to other larger clusters. I would guess that in this case, a very large number of singletons were assigned to cluster 344, and as a result, it ended up being one of the larger clusters despite starting off much smaller.
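
The effect described above can be reproduced with a small toy example in base R. This is only a sketch of the mechanism as explained in this reply, not Seurat's actual implementation: the cluster sizes are made up, all variable names are hypothetical, and every singleton is simply assumed to be absorbed by cluster "3" (in real data the receiving cluster depends on the graph).

```r
# Toy illustration in base R (no Seurat needed); all names are hypothetical.

# Step 1: initial labels, numbered 0, 1, 2, ... by decreasing cluster size.
# Clusters "0".."3" have 50, 20, 5 and 3 cells; "4".."11" are eight singletons.
initial <- c(rep("0", 50), rep("1", 20), rep("2", 5), rep("3", 3),
             as.character(4:11))

# Step 2: mimic singleton regrouping. ASSUMPTION: every singleton is merged
# into cluster "3"; the real target is chosen from graph connectivity.
sizes <- table(initial)
singletons <- names(sizes)[sizes == 1]
regrouped <- initial
regrouped[regrouped %in% singletons] <- "3"

# Labels are not renumbered after regrouping, so "3" (now 3 + 8 = 11 cells)
# is larger than "2" (5 cells) despite having the higher label.
final_sizes <- sort(table(regrouped), decreasing = TRUE)
print(final_sizes)

# Optional: relabel by final size so that 0 is again the largest cluster.
by_size <- names(final_sizes)
relabelled <- factor(match(regrouped, by_size) - 1)
print(table(relabelled))
```

On a real object, the same size-sorted view can be obtained with `sort(table(Idents(mydata)), decreasing = TRUE)`. `FindClusters` also takes a `group.singletons` argument, which may be worth toggling to check how many singletons the initial clustering produced.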