satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0

Data visualization and differential gene expression after SCT normalization and integration #151

Closed yuyuliu closed 1 year ago

yuyuliu commented 1 year ago

Hi, I know that similar questions have been asked many times, and I swear I have read most of the issues and answers, but I am still confused on some points.

First, I understand that SCTransform performs normalization and returns an SCT assay whose scale.data slot contains centered Pearson residuals. These are then converted to sequencing-depth-corrected UMI counts stored in the counts slot, and those counts are log-normalized and stored in the data slot.

After integrating 3 SCT-normalized datasets, I obtained a new assay named "integrated". This assay also contains a scale.data slot and a data slot.

So if I want to visualize the normalized data and compare expression levels between samples, should I use the scale.data slot or the data slot, and from the SCT assay or the integrated assay? Most of the issues and answers I found said that this kind of analysis would become possible in the upcoming Seurat v3. But the current version is v4, and you provide a vignette that applies FindMarkers to the data slot of the SCT assay to identify differentially expressed genes. Is this the right way to perform such an analysis?

I compared the violin plots for the same gene obtained from 1) scale.data of the SCT assay, 2) data of the SCT assay, 3) scale.data of the integrated assay, and 4) data of the integrated assay, but the 4 plots look completely different. I am so confused.

saketkc commented 1 year ago

You should use the data slot of the SCT assay for visualization - it contains log1p(corrected counts), which have been adjusted for differences in sequencing depth.
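As a rough sketch of what this looks like in practice (not taken from the thread): the snippet below assumes a Seurat v4-style object, here called `obj`, that already holds an "SCT" assay. The helper names `plot_sct_expression` and `check_sct_data_slot`, the gene names, and the `orig.ident` grouping column are placeholders chosen for illustration. It uses the `slot` argument as in Seurat v4; newer releases may prefer `layer` instead.

```r
library(Seurat)

# Hypothetical helper: violin plot of one gene, read explicitly from the
# data slot of the SCT assay (log1p of the depth-corrected counts),
# regardless of which assay is currently the default.
plot_sct_expression <- function(obj, gene, group_col = "orig.ident") {
  VlnPlot(obj, features = gene, assay = "SCT", slot = "data",
          group.by = group_col)
}

# Hypothetical sanity check: for a handful of genes, the data slot of the
# SCT assay should equal log1p() of its counts slot.
check_sct_data_slot <- function(obj, genes) {
  counts <- GetAssayData(obj, assay = "SCT", slot = "counts")[genes, , drop = FALSE]
  lognorm <- GetAssayData(obj, assay = "SCT", slot = "data")[genes, , drop = FALSE]
  isTRUE(all.equal(as.matrix(log1p(counts)), as.matrix(lognorm)))
}
```

Calling `plot_sct_expression(obj, "GENE_OF_INTEREST")` would then show per-sample distributions on the log1p(corrected counts) scale, and `check_sct_data_slot(obj, c("GENE_A", "GENE_B"))` is a quick way to confirm which values a given object actually stores in that slot.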