satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

SCT on Multiple Reference Samples #8139

AAA-3 closed this issue 10 months ago

AAA-3 commented 11 months ago

I have snATAC+GEX data from 10X Genomics, all from one batch: 2 cell lines with 4 genotypes each (WT, negative control, and 2 disease), giving me 8 samples. Some were run more than once, making 10 samples in total.

I already merged the raw datasets so I could do some quick QC checks, and I plan to split the object so that each sample gets its own independent SCT run (#7407, #2814) prior to integration. What I want to know is:

Based on older closed issues, I will then use these SCT assays (normalized separately on each sample before integration) for PrepSCTIntegration, PCA, clustering, etc., then switch back to the RNA assay (running NormalizeData() and ScaleData()) for visualizations and marker detection (#2023, #1836, #6205). I will also try the SCT data for DEGs (#2180).
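A rough sketch of the two marker-detection routes described above, assuming an integrated object that still carries both the RNA and SCT assays. The function name `find_markers_two_ways` and the arguments `obj`, `ident_1`, and `ident_2` are hypothetical placeholders, not something from the linked issues. `PrepSCTFindMarkers()` is available in recent Seurat v4 releases and later.

```r
# Sketch only: needs the Seurat package installed; nothing is run on load.
# `obj`, `ident_1`, `ident_2` are hypothetical placeholders.
find_markers_two_ways <- function(obj, ident_1, ident_2) {
  # Route A: log-normalize the RNA assay and test on it
  obj <- Seurat::NormalizeData(obj, assay = "RNA", verbose = FALSE)
  rna_markers <- Seurat::FindMarkers(obj, assay = "RNA",
                                     ident.1 = ident_1, ident.2 = ident_2)

  # Route B: test on the SCT assay; PrepSCTFindMarkers() re-corrects counts
  # across the per-sample SCT models before testing
  obj <- Seurat::PrepSCTFindMarkers(obj, assay = "SCT", verbose = FALSE)
  sct_markers <- Seurat::FindMarkers(obj, assay = "SCT",
                                     ident.1 = ident_1, ident.2 = ident_2)

  list(rna = rna_markers, sct = sct_markers)
}
```

Comparing the two result tables for the same pair of identities is one way to see how much the choice of assay changes the DEG list.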

saketkc commented 10 months ago

I would recommend running SCTransform on each sample individually, in which case you will have 10 models, and then running integration (as shown in the SCT v2 vignette). To keep your 2 WTs as controls, you can use reference-based integration as shown in this vignette (Seurat v4): https://satijalab.org/seurat/archive/v4.3/integration_large_datasets
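A minimal sketch of what that could look like with the Seurat v4 API, following the pattern of the SCT section of the integration vignettes. The function name `integrate_sct_with_reference`, the merged object `merged`, and the metadata column `"sample"` are assumptions for illustration; the reference names are the ones used later in this thread.

```r
# Sketch only: needs the Seurat package installed; nothing is run on load.
# `merged`, `split_by`, and the default reference names are placeholders.
integrate_sct_with_reference <- function(merged,
                                         split_by = "sample",
                                         reference_names = c("h6_QC", "h9_QC"),
                                         nfeatures = 3000,
                                         dims = 1:50) {
  # One SCT model per sample
  obj_list <- Seurat::SplitObject(merged, split.by = split_by)
  obj_list <- lapply(obj_list, Seurat::SCTransform,
                     vst.flavor = "v2", verbose = FALSE)

  # Shared features, SCT residuals for those features, per-object PCA for rpca
  features <- Seurat::SelectIntegrationFeatures(object.list = obj_list,
                                                nfeatures = nfeatures)
  obj_list <- Seurat::PrepSCTIntegration(object.list = obj_list,
                                         anchor.features = features)
  obj_list <- lapply(obj_list, Seurat::RunPCA,
                     features = features, verbose = FALSE)

  # `reference` expects positions in the list, so map names to indices
  reference_idx <- which(names(obj_list) %in% reference_names)
  stopifnot(length(reference_idx) == length(reference_names))

  anchors <- Seurat::FindIntegrationAnchors(
    object.list = obj_list, normalization.method = "SCT",
    anchor.features = features, reference = reference_idx,
    reduction = "rpca", dims = dims)
  integrated <- Seurat::IntegrateData(anchorset = anchors,
                                      normalization.method = "SCT", dims = dims)

  # PCA runs directly on the integrated assay; no ScaleData() call here
  integrated <- Seurat::RunPCA(integrated, verbose = FALSE)
  Seurat::RunUMAP(integrated, reduction = "pca", dims = dims)
}
```

In the vignette's SCT variant, `ScaleData()` is not called after `IntegrateData()`; PCA is run directly on the integrated assay.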

AAA-3 commented 10 months ago

Thanks for this. Just to double-check the code:

My SCT datasets have already gone through the normalisation and scaling steps via SCTransform, and I ran PCA on all of them.

Following the instructions you linked and adapting them to what I have from SCT, I would first need to create a list of my objects:

samples.list <- list(h6_QC = h6_QC,
                     ...
                     object10 = Object10)

I would then run the following on this list:

# `reference` takes positions in the list, not names, so look the names up
anchors <- FindIntegrationAnchors(object.list = samples.list,
    reference = which(names(samples.list) %in% c("h6_QC", "h9_QC")),
    reduction = "rpca", dims = 1:50)
RNA_multiome <- IntegrateData(anchorset = anchors, dims = 1:50)
RNA_multiome <- ScaleData(RNA_multiome, verbose = FALSE)
RNA_multiome <- RunPCA(RNA_multiome, verbose = FALSE)
RNA_multiome <- RunUMAP(RNA_multiome, dims = 1:50)

My question is:

Since I am using SCT data, must I still scale the combined dataset and rerun PCA?