Closed: Dario-Rocha closed this 12 months ago
Thanks for the suggestion! There are statistical advantages to operating on raw counts, but it is true that as a result, our methods are limited in how well they can interface with integration/batch-correction pipelines. There is actually some recent work outside of the single-cell context that has proposed a method that operates on lower-dimensional embeddings, which might be useful to check out!
I think this is a very interesting proposition. I would like to apply your clustering method and/or your evaluation of cluster certainty to integrated datasets for which I have the batch-corrected PCA embeddings but not a batch-corrected expression matrix (because of method limitations and the sheer size such a matrix would have). Even if it were possible to use a batch-correction method that provides batch-corrected gene expression, I would like to keep the integration and batch correction I've already performed unchanged. I therefore wonder whether your package could be extended to work with the batch-corrected PCA embeddings as the starting point, instead of the gene expression matrix.