SCT on Multiple Reference Samples

AAA-3 commented 11 months ago

I have snATAC+GEX data from 10X Genomics, all from one batch: 2 cell lines with 4 genotypes each (WT, negative control, and 2 disease), giving me 8 samples. Some had multiple runs, making 10 samples in total.

I already merged the raw datasets so I could do some quick QC checks, and I plan to split the object and run SCT on each sample independently (#7407, #2814) prior to integration. What I want to know is:

1. Does this mean that I will have 10 different SCT assays in my merged file? (#5183)
2. The tutorial for SCT v2 uses the control for the UMAP and then integrates the second sample to the control. In my case, would it be best to keep my 2 WT samples merged together as the "control"? Biologically, this makes sense to me, but I do not know whether it would be correct mathematically.

Based on older closed issues, I will then use these SCT assays (normalized separately on each sample before integration) for PrepSCTIntegration, PCA, clustering, etc., then switch back to the RNA assay (running NormalizeData() and ScaleData()) for visualizations and marker detection (#2023, #1836, #6205). I will try the SCT data for DEGs (#2180).

saketkc commented 10 months ago

I would recommend running SCTransform on each sample individually, in which case you will have 10 models, and then running integration (as shown in the SCT v2 vignette). To keep your 2 WTs as controls, you can use reference-based integration as shown in this vignette (Seurat v4): https://satijalab.org/seurat/archive/v4.3/integration_large_datasets
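The per-sample step could look roughly like the sketch below. It is only an illustration under stated assumptions: the merged object uses the Seurat v4-style API, the gene expression counts live in an assay named "RNA", and the metadata column that identifies each of the 10 samples is `orig.ident` (substitute your own column). The function name `sct_per_sample` is made up for this example.

```r
# Sketch only: split a merged object and fit one SCT model per sample.
# Assumes Seurat >= 4.1 is installed (needed for vst.flavor = "v2").
sct_per_sample <- function(merged, split_by = "orig.ident", assay = "RNA") {
  # One Seurat object per sample; the list is named by the values of split_by
  objects <- Seurat::SplitObject(merged, split.by = split_by)
  # Run SCTransform on each object separately, so each gets its own SCT model
  lapply(objects, function(obj) {
    Seurat::SCTransform(obj, assay = assay, vst.flavor = "v2", verbose = FALSE)
  })
}
```

Calling `samples.list <- sct_per_sample(merged_obj)` (with `merged_obj` standing in for the merged object) would return a named list that can be passed on to the integration functions.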

AAA-3 commented 10 months ago

Thanks for this. Just to double-check the code:

My SCT datasets have already been normalised and scaled by SCTransform, and I ran PCA on all of them.

According to the instructions you linked, adapted to what I have from SCT, I would first need to create a list of my objects:

samples.list <- list(h6_QC = h6_QC,
                     ...
                     object10 = Object10)

To this list I would then run:

anchors <- FindIntegrationAnchors(object.list = samples.list, reference = c("h6_QC", "h9_QC"),
                                  reduction = "rpca", dims = 1:50)
RNA_multiome <- IntegrateData(anchorset = anchors, dims = 1:50)
RNA_multiome <- ScaleData(RNA_multiome, verbose = FALSE)
RNA_multiome <- RunPCA(RNA_multiome, verbose = FALSE)
RNA_multiome <- RunUMAP(RNA_multiome, dims = 1:50)

My question is:

Since I am using SCT data, must I still scale the combined dataset and rerun PCA?
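For what it's worth, the Seurat v4 documentation on sctransform-based integration says to run PrepSCTIntegration() before finding anchors, to set normalization.method = "SCT" in both FindIntegrationAnchors() and IntegrateData(), and not to run ScaleData() in sctransform-based workflows; PCA is still run on the integrated assay. A hedged sketch of how the code above might be adapted follows. The helper names are hypothetical, and it assumes every object in the list already has an SCT assay. The linked vignette passes references as positions in the list (reference = c(1, 2)), so the sketch converts sample names to positions first.

```r
# Sketch only; helper names are hypothetical. Assumes the Seurat v4-style API
# and that every object in samples.list was already run through SCTransform().

# Convert sample names to their positions in the object list.
reference_indices <- function(sample_names, reference_names) {
  idx <- which(sample_names %in% reference_names)
  if (length(idx) != length(unique(reference_names))) {
    stop("Some reference names were not found in the list")
  }
  idx
}

integrate_sct_with_reference <- function(samples.list, reference_names, dims = 1:50) {
  features <- Seurat::SelectIntegrationFeatures(object.list = samples.list, nfeatures = 3000)
  samples.list <- Seurat::PrepSCTIntegration(object.list = samples.list, anchor.features = features)
  # rpca needs a PCA in every object, computed on the shared features
  samples.list <- lapply(samples.list, Seurat::RunPCA, features = features, verbose = FALSE)
  anchors <- Seurat::FindIntegrationAnchors(
    object.list = samples.list,
    reference = reference_indices(names(samples.list), reference_names),
    normalization.method = "SCT",
    anchor.features = features,
    reduction = "rpca",
    dims = dims
  )
  integrated <- Seurat::IntegrateData(anchorset = anchors, normalization.method = "SCT", dims = dims)
  # No ScaleData() here; PCA and UMAP are run on the integrated assay
  integrated <- Seurat::RunPCA(integrated, verbose = FALSE)
  Seurat::RunUMAP(integrated, reduction = "pca", dims = dims)
}
```

With the list from above, this would be called as `RNA_multiome <- integrate_sct_with_reference(samples.list, c("h6_QC", "h9_QC"))`.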

satijalab/seurat issue #8139