Conflicting guide about using RNAssay vs SCT Assay for DEG in Integration workflow tutorials?

satijalab / seurat

R toolkit for single cell genomics

http://www.satijalab.org/seurat

Other

2.25k stars 904 forks source link

Conflicting guide about using RNAssay vs SCT Assay for DEG in Integration workflow tutorials? #5738

Closed denvercal1234GitHub closed 2 years ago

denvercal1234GitHub commented 2 years ago

Hi there,

In the Introduction to SCTransform, v2 regularization tutorial (https://satijalab.org/seurat/articles/sctransform_v2_vignette.html), FindMarkers(setting assay = "SCT").

But, in the Introduction to scRNA-seq integration tutorial (https://satijalab.org/seurat/articles/integration_introduction.html#perform-an-integrated-analysis-1) under the Identify conserved cell type markers section, it says "# For performing differential expression after integration, we switch back to the original data" by setting DefaultAssay(immune.combined) <- "RNA"

Would you mind clarifying if we should set active assay to RNA before performing FindMarkers()/FindAllMarkers() for SCTransform-ed objects in integration workflow and in "non-integration" workflow (https://satijalab.org/seurat/articles/sctransform_vignette.html)?

Thank you very much for your help!

saketkc commented 2 years ago

In the V2 model, we do the DE on corrected counts (after additionally adjusting for library size differences across models) so you can use the data slot to perform DE if using the v2 model - we also show this in the vignette. It is tricky to perform DE on pearson residuals themselves (and hence the earlier vignettes recommended moving to RNA assay).

denvercal1234GitHub commented 2 years ago

Thank you @saketkc . But then in the V2 model, should we visualize gene expression such as in DotPlot, DoHeatmap, VlnPlot, etc. using the data slot of the SCT as well, and no need any longer to switch to the RNAssay for visualization? Thanks again for your response!

denvercal1234GitHub commented 2 years ago

Answer from the vigette:

"We can also use the corrected counts for visualization:"

denvercal1234GitHub commented 2 years ago

Hi @saketkc - Sorry to raise another related question. Just so I understand correctly, with SCTransform-ed (v2) data, could we also perform AverageExpress() using the data slot of the SCT Assay, and no longer needing to do so on RNA data? Thank you again for your patience and advice.

saketkc commented 2 years ago

Yes, that is correct. If you have multiple datasets, you should run PrepSCTFindMarkers() so that the correct counts are calculated using the same median depth.

denvercal1234GitHub commented 2 years ago

Hi @saketkc, I am a bit unclear on the steps. My goal is to merge the 2 datasets for clustering. Would you mind letting me know if the steps below make sense? For now, I only use the "list" of the two objects to get the variable genes, but I used the merged object for everything else. Thank you so much!

I have 2 objects that I separately SCTransform-ed(v2). Then, I merged them (setting merge.data = T) Made a list of these 2 objects Perform SelectIntegrationFeatures of the list -> variable_genes Set VariableFeatures(merged object) <- variable_genes Finally, I performed RunPCA, etc for clustering analysis using the merged object.

To perform AverageExpression, just use the merged object (SCT Assay data slot). To perform DEG, do PrepSCTFindMarkers(of the merged object) and then FindMarker(of merged object and not the list of the 2 objects)

saketkc commented 2 years ago

looks almost correct, but better to run PrepSCTFindMarkers() before running average expression - this will ensure the average is calculated using the same library size (same motivation as in DE).

denvercal1234GitHub commented 2 years ago

Thank you again for your help!