satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Question: Determine the optimal number of K clusters using K means method #694

Closed by bwang258 6 years ago

bwang258 commented 6 years ago

Hi,

I came across the Seurat package for single-cell RNA-seq analysis. In addition to following the tutorial, I am interested in running K-means clustering on my 10X Genomics single-cell RNA-seq data. I noticed that Seurat has a function for K-means clustering, DoKMeans(), which requires me to specify the number of clusters. The elbow method and the silhouette method are commonly used to choose the optimal number of clusters for K-means, and I am wondering whether I can do the same with Seurat. After digging around, I am not sure where to pull the data from in order to compute the total within-cluster sum of squares for each value of K. Could you please enlighten me on this? Thank you so much!

BC

bwang258 commented 6 years ago

Hi,

I just figured it out. The total within-cluster sum of squares is stored in the object, as data@kmeans@gene.kmeans.obj$tot.withinss or data@kmeans@cell.kmeans.obj$tot.withinss, depending on whether the K-means clustering was run on genes or on cells. Sorry for not digging deep enough before posting.
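For anyone landing here later, the elbow method built on tot.withinss can be sketched roughly as below. This is a minimal illustration using base R's kmeans() on a simulated matrix, not the Seurat internals: the matrix `mat` is a hypothetical stand-in for whatever cells-by-features data (for example PCA embeddings) you extract from your own object, and the range of K values is arbitrary.

```r
set.seed(42)

# Hypothetical stand-in for a cells x features matrix (e.g. PCA embeddings).
# Three simulated groups of 100 "cells" each, 5 features per cell.
mat <- rbind(
  matrix(rnorm(100 * 5, mean = 0),  ncol = 5),
  matrix(rnorm(100 * 5, mean = 6),  ncol = 5),
  matrix(rnorm(100 * 5, mean = -6), ncol = 5)
)

# Run K-means for each candidate K and record the total
# within-cluster sum of squares (the same quantity as tot.withinss above).
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(mat, centers = k, nstart = 10)$tot.withinss
})

# The "elbow" is the K beyond which wss stops dropping sharply.
if (interactive()) {
  plot(k_values, wss, type = "b",
       xlab = "Number of clusters K",
       ylab = "Total within-cluster sum of squares")
}
```

With a Seurat object you would presumably rerun the clustering for each K and collect tot.withinss from the slot mentioned above instead of calling kmeans() directly.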

denvercal1234GitHub commented 3 years ago

@bwang258 - Thank you for your post! Would it be possible to share the code you used for the silhouette method with Seurat objects? Thank you again!!
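The thread does not include the original poster's code, so the following is only a sketch of how the silhouette method is commonly done in R, assuming the cluster package is installed. It uses base kmeans() and cluster::silhouette() on a simulated matrix; `mat` is a hypothetical placeholder for the cells-by-features matrix you would pull from your own Seurat object, and the K range is arbitrary.

```r
library(cluster)  # provides silhouette()

set.seed(42)

# Hypothetical stand-in for a cells x features matrix (e.g. PCA embeddings).
mat <- rbind(
  matrix(rnorm(100 * 5, mean = 0),  ncol = 5),
  matrix(rnorm(100 * 5, mean = 6),  ncol = 5),
  matrix(rnorm(100 * 5, mean = -6), ncol = 5)
)

# Pairwise Euclidean distances between cells, computed once and reused.
d <- dist(mat)

# Silhouette widths are undefined for a single cluster, so start at K = 2.
k_values <- 2:8
avg_sil <- sapply(k_values, function(k) {
  km  <- kmeans(mat, centers = k, nstart = 10)
  sil <- silhouette(km$cluster, d)
  mean(sil[, "sil_width"])  # average silhouette width for this K
})

# The K with the highest average silhouette width is the suggested choice.
best_k <- k_values[which.max(avg_sil)]
```

If the clustering was already run inside Seurat, the cluster assignments stored in the K-means result could be passed to silhouette() in place of km$cluster, as long as the distance matrix is computed on the same cells in the same order.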