igrabski / sc-SHC

Significance analysis for clustering single-cell RNA-sequencing data

scSHC() and testClusters with PCs as input instead of gene expression #9

Closed Dario-Rocha closed 12 months ago

Dario-Rocha commented 1 year ago

I think this is a very interesting proposition. I would like to apply your clustering method and/or your evaluation of cluster certainty to integrated datasets for which I have the batch-corrected PCA embeddings but not a batch-corrected expression matrix (because of method limitations and the sheer size such a matrix would have). Even if it were possible to use a batch-correction method that provides batch-corrected gene expression, I would like to keep the integration and batch correction I've already performed unchanged. I therefore wonder whether your package could be extended to work with the batch-corrected PCA embeddings as the starting point, instead of the gene expression matrix.

igrabski commented 12 months ago

Thanks for the suggestion! There are statistical advantages to operating on raw counts, but it is true that, as a result, our methods are limited in how well they can interface with integration/batch-correction pipelines. Some recent work outside the single-cell context has proposed a method that operates on lower-dimensional embeddings, which might be useful to check out!