satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Hierarchical Clustering for scRNA-seq data: BuildClusterTree vs. hclust #3754

AlasdairCUPEI closed this issue 3 years ago

AlasdairCUPEI commented 3 years ago

Hi everyone, I have been trying to use hclust to perform hierarchical clustering on my scRNA-seq data, and I have a few questions:

  1. Is "BuildClusterTree" generally considered to be a good/reputable command for performing hierarchical clustering? Are there any known issues that would make it an inferior choice to Monocle, hclust, or another program?
  2. How does the procedure used by "BuildClusterTree" to establish a hierarchy differ from that used by hclust or other programs, or are they generally considered to be equivalent?
  3. Not so much a question, but a comment - I noticed that BuildClusterTree does not really seem to be advertised by Seurat, as I don't recall seeing it in any of the vignettes, and stumbled upon it by chance. As hierarchical clustering seems to be a standard procedure when performing scRNA-seq analysis, I would have assumed that it would have been featured somewhere.

As mentioned, I am aware of hclust, and I am currently trying to set up Monocle. Are there any other recommendations for good programs written in R that perform hierarchical clustering?

Finally, I was wondering about using a statistical metric such as a Pearson (or Bayesian) correlation to compare how closely related some clusters are. Would a Pearson or Bayesian correlation be an acceptable statistic to use when examining scRNA-seq data, or are there others that should be used instead? If so, I was also wondering what the proper procedure would be to correctly test this using scRNA-seq data.

Thank you very much in advance! I apologize for the length of this post.

satijalab commented 3 years ago

BuildClusterTree was meant to perform hierarchical clustering on the pseudobulk averages of different clusters, to understand the potential hierarchical relationships between them. We do not run hierarchical clustering on the single cells.

You can use this function as a shortcut to calculating pseudobulk cluster averages, and then running hclust.
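
For readers who want to see the two steps described above spelled out, here is a minimal base-R sketch: average expression per cluster, then run hclust on those averages. It is an illustration under stated assumptions, not Seurat's internal implementation. The toy matrix and the names `expr`, `clusters`, `pseudobulk`, and `tree` are hypothetical, and the default Euclidean distance and complete linkage are simply the base-R defaults, which may differ from what BuildClusterTree uses. With a real Seurat object, the cluster averages would presumably come from a function such as `AverageExpression` rather than being computed by hand.

```r
# Toy data standing in for a normalized expression matrix (genes x cells).
# All object names here are hypothetical and chosen for illustration only.
set.seed(1)
n_genes <- 50
n_cells <- 120
expr <- matrix(
  rexp(n_genes * n_cells),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# Hypothetical cluster assignment: four clusters of 30 cells each.
clusters <- factor(rep(paste0("c", 0:3), each = 30))

# Pseudobulk step: average each gene over the cells in each cluster.
# The result is a genes x clusters matrix.
pseudobulk <- sapply(levels(clusters), function(k) {
  rowMeans(expr[, clusters == k, drop = FALSE])
})

# Hierarchical clustering of the cluster averages, not of single cells.
# dist() compares rows, so transpose to put one cluster per row.
cluster_dist <- dist(t(pseudobulk))  # Euclidean distance (base-R default)
tree <- hclust(cluster_dist)         # complete linkage (base-R default)

print(tree)
```

Calling `plot(tree)` draws the dendrogram, whose leaves are clusters rather than individual cells. That is the key difference from running hclust directly on a cell-level matrix.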