satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.2k stars 894 forks source link

scRNA-seq mapping to multiome reference built from 5 multiome samples #6537

Closed myylee closed 1 year ago

myylee commented 1 year ago

Hi Seurat team,

I generated a multiome reference dataset composed of 5 multiome samples. I followed the suggestions from (satijalab/seurat#5346) in which 5 RNA samples are first integrated, then the 5 ATAC samples, and lastly RNA and ATAC modalities are combined via WNN.

Now, I want to map a separate scRNA-seq dataset onto the multiome reference to obtain cell type annotation. My thought is to use the supervised PCA approach, but since my multiomeRNA was scTransformed individually, therefore, should I first run PrepSCTFindMarkers to standardize the separately fitted SCT models, and then apply RunSPCA to learn a transformation from MultiomeRNA SCT corrected counts to WNN loadings? The codes shown below are what I plan to run to map scRNA-seq to the multiome reference. Could you kindly let me know if you see any problem with this approach?

integreated_multiome <- PrepSCTFindMarkers(integreated_multiome, assay = "SCT")
integreated_multiome <- RunSPCA(integreated_multiome, 
                                assay = 'SCT', 
                                graph = 'wsnn',
                                features = integreated_multiome@assays$integrated@var.features # features used to integrate the RNA datasets 
                               )

anchors <- FindTransferAnchors(reference = integreated_multiome, query = seurat, normalization.method = "SCT", reference.reduction = "spca")

seurat <- MapQuery(anchorset = anchors, 
                   query = seurat, reference = integreated_multiome,
                   refdata = list(celltype = "celltype"), 
                   reference.reduction = "spca", 
                   reduction.model = "wnn.umap"
                   )

Greatly appreciate your feedback, thanks!

Best regards, Michelle

cedmo001 commented 1 year ago

Hi Michelle,

I'm having trouble following the advice from #5346. Would you mind posting your script you used to integrate the 5 multiome samples? No worries if not :)

Best Wishes, Carina

myylee commented 1 year ago

Hi Michelle,

I'm having trouble following the advice from #5346. Would you mind posting your script you used to integrate the 5 multiome samples? No worries if not :)

Best Wishes, Carina

# RNA
obj.list <- SplitObject(combined, split.by = "donor")

rna.list <- lapply(X = obj.list, function(x){
    CreateSeuratObject(x@assays$RNA@counts,meta.data = x@meta.data) %>% 
        SCTransform(vst.flavor = "v2", verbose = FALSE)%>%
        RunPCA(npcs = 30, verbose = FALSE)
})

features <- SelectIntegrationFeatures(object.list = rna.list, nfeatures = 3000)
rna.list <- PrepSCTIntegration(object.list = rna.list, anchor.features = features)

anchors.sct.norm <- FindIntegrationAnchors(object.list = rna.list, 
                                           normalization.method = "SCT",
                                           anchor.features = features)
combined.sct <- IntegrateData(anchorset = anchors.sct.norm, normalization.method = "SCT")

combined.sct <- RunPCA(combined.sct, verbose = FALSE)
combined.sct <- RunUMAP(combined.sct, reduction = "pca", dims = 1:30,seed.use = 1234)
combined.sct <- FindNeighbors(combined.sct, reduction = "pca", dims = 1:30)
set.seed(1234)
combined.sct <- FindClusters(combined.sct, resolution = 0.3)

combined.sct.consistent <- PrepSCTFindMarkers(combined.sct,assay = 'SCT')

# ATAC
obj.list <- SplitObject(combined, split.by = "donor")
atac.list <- lapply(X = obj.list, function(x){
    peak_keep = which(rowSums(x@assays$ATAC@counts)>0)
    CreateSeuratObject(x@assays$ATAC@counts[peak_keep,],meta.data = x@meta.data,assay = 'ATAC') %>% 
        FindTopFeatures(min.cutoff = 10) %>%
        RunTFIDF() %>% 
        RunSVD()
})
# merge
atac.combined<- atac.list[[1]]
for(i in 2:length(atac.list)){
    atac.combined <- merge(atac.combined,atac.list[[i]])
}
atac.combined <- FindTopFeatures(atac.combined, min.cutoff = 5)
atac.combined <- RunTFIDF(atac.combined)
atac.combined <- RunSVD(atac.combined)

# compute LSI
# find integration anchors
integration.anchors <- FindIntegrationAnchors(
  object.list = atac.list,
  reduction = "rlsi",
  dims = 2:30
)

# integrate LSI embeddings
atac.integrated <- IntegrateEmbeddings(
  anchorset = integration.anchors,
  reductions = atac.combined[["lsi"]],
  new.reduction.name = "integrated_lsi",
  dims.to.integrate = 1:30
)
atac.integrated <- RunUMAP(atac.integrated, 
                           reduction = 'integrated_lsi', 
                           dims = 2:30, 
                           reduction.name = "umap.atac", 
                           reduction.key = "atacUMAP_",
                           seed.use = 1234)

# Combine RNA and ATAC
integreated_multiome <- combined.sct.consistent
integreated_multiome[["ATAC"]]<- atac.combined[['ATAC']]
# Two reductions: integrated_lsi and umap.atac are added
integreated_multiome@reductions<- append(combined.sct.consistent@reductions,atac.integrated@reductions)
# WNN
integreated_multiome <- FindMultiModalNeighbors(integreated_multiome, reduction.list = list("pca", "integrated_lsi"), dims.list = list(1:30, 2:30))
integreated_multiome <- RunUMAP(integreated_multiome, 
                                nn.name = "weighted.nn", 
                                reduction.name = "wnn.umap", 
                                reduction.key = "wnnUMAP_",
                                return.model=TRUE,
                                seed.use=1234)

# Supervised projection
integreated_multiome <- RunSPCA(integreated_multiome, 
                                assay = 'SCT', 
                                graph = 'wsnn',
                                features = integreated_multiome@assays$integrated@var.features
                               )

Here is how I'm currently running the integration. I had the multiome samples all stored in one Seurat object originally. So the first step for each modality was to split them into individual Seurat object by donor.

Hope it helps. Let me know if you see any problem too.

Best regards, Michelle

yuhanH commented 1 year ago

hi @myylee Actually, you don't need to run PrepSCTFindMarkers because RunSPCA will take residuals data as the input. But I suggest you use integrated residuals to perform RunSPCA instead of SCT assay. Also, thank you for posting scripts for multiome integration.

yojetsharma commented 4 months ago

Hi Michelle, I'm having trouble following the advice from #5346. Would you mind posting your script you used to integrate the 5 multiome samples? No worries if not :) Best Wishes, Carina

# RNA
obj.list <- SplitObject(combined, split.by = "donor")

rna.list <- lapply(X = obj.list, function(x){
    CreateSeuratObject(x@assays$RNA@counts,meta.data = x@meta.data) %>% 
        SCTransform(vst.flavor = "v2", verbose = FALSE)%>%
        RunPCA(npcs = 30, verbose = FALSE)
})

features <- SelectIntegrationFeatures(object.list = rna.list, nfeatures = 3000)
rna.list <- PrepSCTIntegration(object.list = rna.list, anchor.features = features)

anchors.sct.norm <- FindIntegrationAnchors(object.list = rna.list, 
                                           normalization.method = "SCT",
                                           anchor.features = features)
combined.sct <- IntegrateData(anchorset = anchors.sct.norm, normalization.method = "SCT")

combined.sct <- RunPCA(combined.sct, verbose = FALSE)
combined.sct <- RunUMAP(combined.sct, reduction = "pca", dims = 1:30,seed.use = 1234)
combined.sct <- FindNeighbors(combined.sct, reduction = "pca", dims = 1:30)
set.seed(1234)
combined.sct <- FindClusters(combined.sct, resolution = 0.3)

combined.sct.consistent <- PrepSCTFindMarkers(combined.sct,assay = 'SCT')

# ATAC
obj.list <- SplitObject(combined, split.by = "donor")
atac.list <- lapply(X = obj.list, function(x){
    peak_keep = which(rowSums(x@assays$ATAC@counts)>0)
    CreateSeuratObject(x@assays$ATAC@counts[peak_keep,],meta.data = x@meta.data,assay = 'ATAC') %>% 
        FindTopFeatures(min.cutoff = 10) %>%
        RunTFIDF() %>% 
        RunSVD()
})
# merge
atac.combined<- atac.list[[1]]
for(i in 2:length(atac.list)){
    atac.combined <- merge(atac.combined,atac.list[[i]])
}
atac.combined <- FindTopFeatures(atac.combined, min.cutoff = 5)
atac.combined <- RunTFIDF(atac.combined)
atac.combined <- RunSVD(atac.combined)

# compute LSI
# find integration anchors
integration.anchors <- FindIntegrationAnchors(
  object.list = atac.list,
  reduction = "rlsi",
  dims = 2:30
)

# integrate LSI embeddings
atac.integrated <- IntegrateEmbeddings(
  anchorset = integration.anchors,
  reductions = atac.combined[["lsi"]],
  new.reduction.name = "integrated_lsi",
  dims.to.integrate = 1:30
)
atac.integrated <- RunUMAP(atac.integrated, 
                           reduction = 'integrated_lsi', 
                           dims = 2:30, 
                           reduction.name = "umap.atac", 
                           reduction.key = "atacUMAP_",
                           seed.use = 1234)

# Combine RNA and ATAC
integreated_multiome <- combined.sct.consistent
integreated_multiome[["ATAC"]]<- atac.combined[['ATAC']]
# Two reductions: integrated_lsi and umap.atac are added
integreated_multiome@reductions<- append(combined.sct.consistent@reductions,atac.integrated@reductions)
# WNN
integreated_multiome <- FindMultiModalNeighbors(integreated_multiome, reduction.list = list("pca", "integrated_lsi"), dims.list = list(1:30, 2:30))
integreated_multiome <- RunUMAP(integreated_multiome, 
                                nn.name = "weighted.nn", 
                                reduction.name = "wnn.umap", 
                                reduction.key = "wnnUMAP_",
                                return.model=TRUE,
                                seed.use=1234)

# Supervised projection
integreated_multiome <- RunSPCA(integreated_multiome, 
                                assay = 'SCT', 
                                graph = 'wsnn',
                                features = integreated_multiome@assays$integrated@var.features
                               )

Here is how I'm currently running the integration. I had the multiome samples all stored in one Seurat object originally. So the first step for each modality was to split them into individual Seurat object by donor.

Hope it helps. Let me know if you see any problem too.

Best regards, Michelle

Hello, Thank you for sharing this. I have a doubt: I have 4 samples (different donors) and for each 10x mutliome was carried out. I have been using the merge function to keep them together for my downstream analyses. What I really want to get at though is to first annotate them using another reference. How do you think I can about it?