satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Normalized count data #9096

Closed aaliya1997 closed 1 month ago

aaliya1997 commented 1 month ago

Hi, I am working on a Crohn's disease single-cell dataset. For my scRNA-seq analysis I am using the normalized count matrix. My question is: is the normalized data fully preprocessed, or is it just log-normalized raw counts? I need to follow the pipeline in the paper, which does doublet removal using Scrublet (a Python-based tool) and batch correction using the Harmony algorithm from Cumulus. For reference, I am working with the Seurat package for my analysis and simply following the basic steps of loading the data, running PCA, clustering, annotating, and then my downstream analysis. Will my pipeline be wrong with the normalized data, or do I have to perform the doublet removal, batch correction, etc. myself?
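For context, "log-normalized" in Seurat means each cell's counts are scaled to a fixed total (10,000 by default) and then log1p-transformed. Here is a small Python sketch of that transform, plus a quick heuristic for checking whether a matrix holds raw counts or already-normalized values (the function names and the integer-check heuristic are illustrative, not part of any package):

```python
import math

# Sketch only: the helper names and the integer-check heuristic are
# illustrative, but the scale factor of 10,000 and the log1p transform
# match Seurat's LogNormalize default.

def log_normalize(counts, scale_factor=1e4):
    """Seurat-style LogNormalize for one cell: counts are scaled to a
    fixed total per cell, then natural-log transformed with log1p."""
    total = sum(counts)
    return [math.log1p(c / total * scale_factor) for c in counts]

def looks_like_raw_counts(values, tol=1e-8):
    """Raw UMI counts are non-negative whole numbers; log-normalized
    values are fractional, so this is a quick sanity check."""
    return all(v >= 0 and abs(v - round(v)) < tol for v in values)

raw = [0, 3, 17, 1]
print(looks_like_raw_counts(raw))                 # → True
print(looks_like_raw_counts(log_normalize(raw)))  # → False
```

If `looks_like_raw_counts` returns True on a sample of your matrix, you are most likely holding raw counts and still need to run normalization yourself.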

ondina-draia commented 1 month ago

I guess if it was not already done before, you should remove the doublets and do the basic filtering steps, like removing cells with too many mitochondrial reads, before proceeding with the rest of the pipeline. You can use DoubletFinder in R for doublet identification.
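As a rough illustration of that mitochondrial QC step (in Seurat you would use `PercentageFeatureSet` and `subset`; the gene names and the 20% cutoff below are illustrative assumptions, not a recommendation for this dataset):

```python
# Minimal sketch of the mitochondrial-content QC filter: compute the
# percentage of each cell's counts coming from MT- genes and drop cells
# above a threshold. Genes, counts, and the cutoff are made-up examples.

def percent_mito(counts, gene_names):
    """Percent of a cell's total counts from mitochondrial (MT-) genes."""
    total = sum(counts)
    mito = sum(c for c, g in zip(counts, gene_names) if g.startswith("MT-"))
    return 100.0 * mito / total if total else 0.0

genes = ["MT-CO1", "ACTB", "MT-ND1", "CD3E"]
cell_a = [5, 90, 5, 100]   # 5% mitochondrial  -> keep
cell_b = [60, 20, 40, 10]  # ~77% mitochondrial -> likely dying cell, drop

keep = [percent_mito(c, genes) < 20.0 for c in (cell_a, cell_b)]
print(keep)  # → [True, False]
```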

mhkowalski commented 1 month ago

Hi,

This is very dependent on your goals and the data you are using. If you hope to reproduce the analysis in the paper, I'd recommend following the paper's methods as closely as possible.

If there are a substantial number of doublets in the data, I would definitely recommend removing them with a doublet-removal tool (or you could manually inspect for co-expression of genes that are canonical markers for different populations). The same goes for batch correction: if there are substantial batch effects in the data, you likely need to do some sort of batch correction. There isn't a one-size-fits-all solution for single-cell analysis, so it's difficult for me to provide a more definitive answer.
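The manual co-expression check could look something like this sketch (the marker sets here are common T-cell and B-cell markers used purely for illustration; which populations and markers apply to your dataset is an assumption you would need to revisit):

```python
# Sketch of the manual doublet check: flag cells that co-express canonical
# markers of two distinct populations, since a single real cell should not.
# Marker sets below are illustrative, not specific to any dataset.

T_CELL_MARKERS = {"CD3D", "CD3E"}
B_CELL_MARKERS = {"CD79A", "MS4A1"}

def suspected_doublet(expressed_genes,
                      set_a=T_CELL_MARKERS, set_b=B_CELL_MARKERS):
    """A cell expressing markers from both populations is a doublet candidate."""
    genes = set(expressed_genes)
    return bool(genes & set_a) and bool(genes & set_b)

print(suspected_doublet({"CD3E", "GAPDH"}))          # → False (looks like a T cell)
print(suspected_doublet({"CD3D", "MS4A1", "ACTB"}))  # → True (T/B doublet candidate)
```

In practice this is only a sanity check; dedicated tools like Scrublet or DoubletFinder score doublets more systematically.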