satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

How do I run FindMarkers and DEG analysis after integration with SCTransform? #3839

Closed 0717cyj closed 3 years ago

0717cyj commented 3 years ago

Dear Satija Lab, hello. I have a question about DEG analysis (FindMarkers etc.) after integration with SCT.

I made a Seurat object from 3 different datasets using the SCTransform integration method (following the Satija Lab vignette, with 3,000 anchor features):

pancreas.features <- SelectIntegrationFeatures(object.list = pancreas.list, nfeatures = 3000)
pancreas.list <- PrepSCTIntegration(object.list = pancreas.list, anchor.features = pancreas.features, verbose = FALSE)

In this situation, the integrated object contains [["SCT"]]@counts, [["SCT"]]@data, and [["SCT"]]@scale.data.

For clustering and other DEG analyses, you recommended that, in principle, it would be most optimal to perform these calculations directly on the residuals (stored in the scale.data slot) (https://satijalab.org/seurat/v3.0/sctransform_vignette.html).

However, after integration with SCTransform using 3,000 anchor features, [["SCT"]]@scale.data has only 3,000 features. In this situation, which is the optimal data for DEG analysis and finding markers: [["SCT"]]@counts, [["SCT"]]@data, or [["SCT"]]@scale.data?

And to perform DEG analysis with [["SCT"]]@scale.data, what additional work do I need to do? Should I go back to the integration step and change the number of integration anchor features to the number of all RNA features in my object?

This is a small but important matter that I have encountered, so I am asking the Satija Lab.

Sincerely, Yong Jun Choi

jaisonj708 commented 3 years ago

Most DE methods use either raw counts or normalized data, not scaled data. (You also should not run DE on integrated data.) I would recommend running DE by specifying assay = "RNA" in FindMarkers, using whichever test.use you prefer. The correct slot will automatically be selected for you.
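The recommendation above could be sketched roughly as follows. This is a minimal illustration, not the maintainers' exact workflow: `run_de_on_rna` is a hypothetical helper name, `obj` stands for your integrated Seurat object, and `ident.1` / `ident.2` are placeholders for your own cluster or group labels. It also assumes the RNA assay has not yet been log-normalized, which is often the case after an SCT-based integration, so `NormalizeData()` is called first.

```r
# Hypothetical helper: run differential expression on the RNA assay
# of an integrated object. Requires the Seurat package when called.
run_de_on_rna <- function(obj, ident.1, ident.2 = NULL, test.use = "wilcox") {
  # Work on the original (un-integrated) RNA assay.
  Seurat::DefaultAssay(obj) <- "RNA"

  # Assumption: RNA has only raw counts so far, so log-normalize it.
  obj <- Seurat::NormalizeData(obj, verbose = FALSE)

  # Compare ident.1 against ident.2 (or against all other cells if NULL).
  Seurat::FindMarkers(
    obj,
    ident.1  = ident.1,
    ident.2  = ident.2,
    assay    = "RNA",
    test.use = test.use
  )
}
```

A call might look like `markers <- run_de_on_rna(obj, ident.1 = "0", ident.2 = "1")`, where `"0"` and `"1"` are example cluster identities, not values taken from this thread.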

jgamache014 commented 3 years ago

Hello @0717cyj - I ran into the same question. Based on a previous issue here, I'd recommend using the return.only.var.genes = FALSE argument when running the SCTransform() function. This should increase the number of features in object[["SCT"]]@scale.data beyond 3,000.
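As a rough sketch of this suggestion, the argument would be set when SCTransform is run on each dataset, before the integration steps. The helper name `sct_all_genes` is hypothetical, and `pancreas.list` is the list of objects from the original question. Whether this gives the residuals you need in the final object may depend on your Seurat version, so treat it as a starting point rather than a confirmed recipe.

```r
# Hypothetical helper: run SCTransform so that scale.data keeps residuals
# for all genes, not only the variable features. Requires Seurat when called.
sct_all_genes <- function(obj) {
  Seurat::SCTransform(
    obj,
    return.only.var.genes = FALSE,  # keep residuals for non-variable genes too
    verbose = FALSE
  )
}

# Usage, assuming `pancreas.list` is a list of Seurat objects:
# pancreas.list <- lapply(pancreas.list, sct_all_genes)
```

Keeping residuals for every gene makes the scale.data matrix considerably larger, so memory use is worth checking on big datasets.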