DarioS closed this issue 10 months ago
Hi Dario, do these data have any batch effect or multi-sample structure? If so, did you run the clustering with scSHC or with testClusters?
I don't think so. Supplementary Figure 1e is "UMAP of CAF type assignment from raw, uncorrected scRNA-Seq data." and the clusters seem homogeneous to me, suggesting a lack of batch effect. I used the scSHC function. Can you reproduce the result?
Hmm, I was unable to reproduce your result. If you share your code (and especially your seed, if you set one), I can take a look. I attached a screenshot of what I ran and the output.
Sorry for the confusion. It happens with BREAST_fibro_tumour.rds, not with HNSCC_fibro_tumour.rds.
Thanks for the clarification! I was able to reproduce your result. The number of clusters is reduced when specifying the batch label, but it still seems higher than would be expected. I have not yet fully investigated these data, but the large number of clusters makes me suspect that the assumptions of the method do not hold here. In particular, we assume that within a cluster, each gene follows a unimodal distribution (specifically, Poisson log-normal). If some genes actually follow a multimodal distribution within a true cluster, the method will treat the modes as separate populations and find too many clusters.
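To make the unimodality point concrete, here is a minimal simulation sketch in plain Python. It is not scSHC code and does not reproduce its test. It draws counts for a single gene from a Poisson log-normal model, once with one latent mean and once with two, and then scores how cleanly the log-transformed counts split into two groups. The function names, the latent means (2.0 versus 0.0 and 4.0), the standard deviation of 0.5, and the crude split score are all assumptions chosen for illustration.

```python
import math
import random


def sample_poisson(rng, rate):
    """Draw one Poisson count with Knuth's multiplication method."""
    limit = math.exp(-rate)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_pln(rng, n_cells, log_means, sigma):
    """Simulate counts for one gene in one 'true' cluster.

    Each cell picks a latent log-mean from log_means (one entry gives a
    unimodal Poisson log-normal; several entries give a mixture), draws a
    log-normal rate around it, then a Poisson count.
    """
    counts = []
    for _ in range(n_cells):
        mu = rng.choice(log_means)
        rate = math.exp(rng.gauss(mu, sigma))
        counts.append(sample_poisson(rng, rate))
    return counts


def split_score(counts):
    """Fraction of variance in log1p(counts) explained by the best 2-way split.

    This is a crude separation measure (illustrative only): values near 1
    mean the cells fall into two well-separated groups.
    """
    xs = sorted(math.log1p(c) for c in counts)
    n = len(xs)
    total = sum(xs)
    mean = total / n
    total_ss = sum((x - mean) ** 2 for x in xs)
    best_between, left_sum = 0.0, 0.0
    for i in range(1, n):
        left_sum += xs[i - 1]
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        between = i * (left_mean - mean) ** 2 + (n - i) * (right_mean - mean) ** 2
        best_between = max(best_between, between)
    return best_between / total_ss


if __name__ == "__main__":
    rng = random.Random(1)
    unimodal = simulate_pln(rng, 2000, [2.0], 0.5)
    bimodal = simulate_pln(rng, 2000, [0.0, 4.0], 0.5)
    print("unimodal split score:", round(split_score(unimodal), 3))
    print("bimodal split score: ", round(split_score(bimodal), 3))
```

In this sketch the bimodal gene yields a much higher split score even though all cells were simulated as one cluster, which is the kind of structure that could lead a significance-based method to keep splitting.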
I applied scSHC to the counts in the file HNSCC_fibro_tumour.rds contained in scRNA-seq_dataobjects.zip from a recent journal article, and it estimated 116 clusters with default parameters, which is far too many.