Closed browaeysrobin closed 11 months ago
There are three main reasons why this might happen. The first is that both `scSHC` and `testClusters` give stochastic results, due to the randomness involved in simulating the null distribution, so minor differences in output are possible depending on the seed. However, the two more likely reasons both stem from the fact that `testClusters` can be thought of as a more approximate procedure than `scSHC`. Whereas `scSHC` performs hierarchical clustering on cells and then tests clusters by proceeding down the tree, `testClusters` begins by performing hierarchical clustering on pseudobulked profiles of the clusters. This could result in testing clusters in a different order from `scSHC`, which could in turn yield different results. Finally, when generating the empirical null distribution in `scSHC`, we apply the same hierarchical clustering procedure to each simulated null dataset. However, in `testClusters`, because we don't know what clustering procedure was used to create the original clusters, we use a nearest neighbors approach to define clusters in each null dataset, which could result in somewhat different clusters and therefore a different empirical null distribution of the clustering test statistic. So, while `testClusters` and `scSHC` would be perfectly consistent in an ideal world, the approximations required for `testClusters` could result in discrepancies like the ones you observed.
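To make the first point concrete, here is a toy sketch in base R of how a p-value estimated from simulated null draws can land on either side of `alpha` depending on the seed. This is only an illustration and not the actual scSHC test: the function `empirical_p`, the standard normal stand-in for the null statistics, the observed value of 1.7, and the 50 null draws are all assumptions made up for the example.

```r
# Toy illustration only: NOT scSHC's test statistic or null model.
# It shows how a p-value estimated from simulated null draws moves with the seed.
empirical_p <- function(observed, n_null, seed) {
  set.seed(seed)
  # Stand-in for test statistics computed on simulated null datasets
  null_stats <- rnorm(n_null)
  # Add-one correction keeps the estimate strictly above zero
  (sum(null_stats >= observed) + 1) / (n_null + 1)
}

observed <- 1.7  # hypothetical observed statistic, close to the 0.05 boundary

# Re-estimate the p-value under 20 different seeds
p_values <- vapply(
  1:20,
  function(s) empirical_p(observed, n_null = 50, seed = s),
  numeric(1)
)

range(p_values)       # spread of the estimates across seeds
sum(p_values < 0.05)  # number of seeds for which the split would be called significant
```

A split whose statistic sits near the significance boundary can therefore be kept under one seed and merged under another, even before the other two sources of discrepancy come into play.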
Hi @igrabski
Thank you for the clear explanation!
Hi @igrabski
I have some questions about the workings of `scSHC::testClusters` compared to `scSHC::scSHC`. I would expect that applying the significance test implemented by `scSHC::testClusters` to clusters obtained with `scSHC::scSHC` would return the same clusters and not merge any of them. However, for two datasets I tried it on, some scSHC-defined clusters were merged after running `testClusters`. Could you help me understand how this is possible?
I used the following code to run this test:
clusters_scSHC <- scSHC::scSHC(seuratObj@assays$RNA@counts, alpha = 0.05, num_features = 2500, num_PCs = 30, parallel = TRUE, cores = 3)
cluster_significance_test <- scSHC::testClusters(seuratObj@assays$RNA@counts, cluster_ids = clusters_scSHC[[1]], alpha = 0.05, num_features = 2500, num_PCs = 30, parallel = TRUE, cores = 3)
table(cluster_significance_test[[1]], clusters_scSHC[[1]])
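For larger cross-tabulations, a small helper can list which original clusters ended up under the same label after testing. The sketch below uses base R only; `find_merged` is a hypothetical helper and not part of scSHC, and the toy label vectors merely stand in for `clusters_scSHC[[1]]` and `cluster_significance_test[[1]]`.

```r
# Hypothetical helper (not part of scSHC): for each label in `tested`,
# list the labels in `original` whose cells were assigned to it,
# keeping only tested labels that absorbed more than one original cluster.
find_merged <- function(original, tested) {
  stopifnot(length(original) == length(tested))
  groups <- lapply(
    split(as.character(original), as.character(tested)),
    function(x) sort(unique(x))
  )
  groups[lengths(groups) > 1]
}

# Toy labels standing in for clusters_scSHC[[1]] and cluster_significance_test[[1]]
original <- c("c1", "c1", "c2", "c2", "c3", "c3")
tested   <- c("new1", "new1", "new2", "new2", "new2", "new2")

merged <- find_merged(original, tested)
merged
```

With the real objects, the call would presumably be `find_merged(clusters_scSHC[[1]], cluster_significance_test[[1]])`, assuming both are label vectors over the same cells in the same order.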