satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0

combining incrementally available data #149

Closed: e-manduchi closed this issue 1 year ago

e-manduchi commented 1 year ago

Hello SCTransform and Seurat Team,

I'd appreciate your advice on a special situation we are in. We have an established reference (from multiome data) that has allowed us to annotate cell types from a particular tissue.

We are now incrementally getting new scRNA-seq data from the same tissue type from different individuals (sometimes months apart between groups of individuals). We are considering the following incremental approach: process the scRNA-seq data from each individual separately as it becomes available, obtaining its SCTransformed counts. At the same time, we can annotate the cells for this individual by mapping to our established reference.

Given the above, do you see any issue in doing DE analyses (e.g. for a given cell type between different conditions for these individuals) using the separately computed SCTransformed counts as input? For example, we could derive pseudobulk counts for each individual from its SCTransformed counts and then input the pseudobulk counts into DESeq2 or similar programs, properly accounting for relevant covariates (sex, ethnicity, library reagent kit used, etc.) in our analyses. This is just an example; non-pseudobulk analyses could be an alternative. The main point, though, is that we wouldn't be doing any integration beyond that done once for each individual when annotating its cell types using the reference.

Thanks for your attention

saketkc commented 1 year ago

Your workflow sounds reasonable, with one caveat. SCTransform corrected counts are calculated by reversing the regression model and substituting the median sequencing depth. If the sequencing depth is very different across samples, this can inflate false positives. In the SCT v2 vignette we show how to correct for such differences by using PrepSCTFindMarkers(), which readjusts the corrected counts to be at the same sequencing depth across samples. You can then use standard DE methods (wilcox/LR/MAST) on the corrected counts.
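To make the caveat concrete, here is a minimal Python sketch of why corrected counts computed at different median depths are not directly comparable. It is not sctransform's implementation. It assumes a simplified model in which a gene's expected count is proportional to cell depth, with a fixed dispersion `theta`. The function `corrected_count`, the constants, and the choice of the smaller median as the shared depth are all hypothetical and made up for illustration.

```python
import math

def corrected_count(observed, cell_depth, target_depth, rel_expr, theta=10.0):
    """Simplified 'reverse the model at a target depth' step (illustrative only)."""
    mu = rel_expr * cell_depth            # expected count at the cell's own depth
    mu_target = rel_expr * target_depth   # expected count at the substituted depth
    sd = math.sqrt(mu + mu ** 2 / theta)  # negative binomial standard deviation
    sd_target = math.sqrt(mu_target + mu_target ** 2 / theta)
    residual = (observed - mu) / sd       # Pearson residual of the observation
    return max(0, round(mu_target + residual * sd_target))

# Same gene, same relative expression, two samples sequenced at different depths.
REL_EXPR = 1 / 1000
MEDIAN_DEPTH_A = 2000
MEDIAN_DEPTH_B = 8000

# Each sample corrected at its own median depth: the depth difference
# shows up as an apparent expression difference.
a_own = corrected_count(2, MEDIAN_DEPTH_A, MEDIAN_DEPTH_A, REL_EXPR)
b_own = corrected_count(8, MEDIAN_DEPTH_B, MEDIAN_DEPTH_B, REL_EXPR)

# Both samples corrected at one shared depth: the artifact disappears.
common_depth = min(MEDIAN_DEPTH_A, MEDIAN_DEPTH_B)
a_common = corrected_count(2, MEDIAN_DEPTH_A, common_depth, REL_EXPR)
b_common = corrected_count(8, MEDIAN_DEPTH_B, common_depth, REL_EXPR)

print("own median depth:", a_own, b_own)
print("shared depth:    ", a_common, b_common)
```

In this toy setup the two cells sit exactly at their expected counts, so any difference between `a_own` and `b_own` comes purely from the depth substituted into the model, not from biology.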

e-manduchi commented 1 year ago

Thank you so much for the advice! Elisabetta