satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.24k stars 902 forks source link

Integration of multiple groups of multiome data #5346

Closed KoichiHashikawa closed 2 years ago

KoichiHashikawa commented 2 years ago

Hello Satija group,

I really appreciate all the members in Satija group for significant contributions to the community for the last many years.

We recently acquired 3 experimental groups of multiome (10x, snRNA/ATAC from the same cells) and wanted to hear thoughts on how to integrate 3x2 samples (3 groups x 2 modalities).

In our previous publications, we have utilized Seurat V3 to integrate multiple groups of a single modality (RNA) by computing corrected expression data.

I am thinking of two different ways of integrations of 3x2 samples and I really appreciate if we could hear your expert opinions.

1) Integration of 3x RNA samples and 3x ATAC samples independently, using FindingIntegrationAnchors and IntegrateData to compute corrected data for RNA and ATAC data. Then, combine those data to generate a single merged object (having corrected RNA and ATAC data matrix). For corrected data in the merged object, reduce the dimensions of each modality via RunPCA or RunSVD. Calculate WNN by FindMultiModalNeighbors. Finally identify clusters by FindClusters.

2) Integration of 3xRNAsamples and 3x predicted RNA expression from ATAC samples together, using FindingIntegrationAnchors and IntegrateData to compute corrected RNA expression data. Using corrected RNA expression data ("integrated"), conducting ScaleData/RunPCA/FindNeighbors/FindClusters to identify integrated clusters.

In addition, do you have detailed codes used in the analysis of Figure 3-4 in Hao et al., Cell, 2021 paper? I can have a general sense in reading "PBMC CITE-seq datasets of HIV Vaccine Trials Network samples" section in the method, but it helps if the code is publicly available.

Thank you so much.

best, Koichi

dkasd commented 2 years ago

Hi Koichi, did you figure out the best way to do this merging? I have a similar problem. Also, I assume we also need to merge the peaks for the ATAC assay, if as we did, peaks were called independently for each group Thanks! dkasd

KoichiHashikawa commented 2 years ago

Hello Dkasd,

Not yet, and I agree that we need to merge the ATAC peaks from multiple groups. I am thinking of trying the post here https://satijalab.org/signac/articles/integrate_atac.html. Once it works, I will post the codes and figures here.

yuhanH commented 2 years ago

Hi @KoichiHashikawa You are in the right direction. The first option you proposed is exactly what we did in the manuscript: correct batch effects from single-modality data individually, and then perform the WNN analysis.

This ATAC integration vignette show you how to get rid of the batch-effects in the ATAC. In the end, you will get a dimensional reduction integrated_lsi. You can this one and pca from integrated RNA assay to FindMultiModalNeighbors.

JoreVW commented 2 years ago

Hi @KoichiHashikawa, I'm doing the same kind of analysis, 2-snRNA/snATAC multiome datasets to be integrated into one Seurat object. I too had the strategy to first merge RNA datasets, and ATAC datasets. And than merge those via WNN into one dataset. I'm having trouble with the final step and trying to figure it out now. But I just wanted to ask whether in the end this was indeed the way to analyse your data? Did it work this way for you?

KoichiHashikawa commented 2 years ago

@JoreVW Yes, I did the integration that way, and it worked in our hands.

francescasci91 commented 2 years ago

Hi, I'm trying your suggestions but I'm having problems. Basically I have 6 datasets of RNA and 4 of ATAC and I want to do a joint analysis. This is what I did.

1.Integrating RNA datasets using SCT transofrmation v2 method to normalize them 2.Integrating ATAC datasets using standard pipeline

The problems start when I try to merge them

I receive this warning “Attempting to merge an SCTAssay with another Assay type Converting all to standard Assay objects.” And then when I try to run PCA Error: Cannot find 'pca' in this Seurat object

Do you have any suggestion how to solve this??

Thanks Francesca

yuhanH commented 2 years ago

hi @francescasci91 The initial issue is about multiome data (RNA-ATAC paired measurement). It seems that your question is about integration of unpaired RNA and ATAC data. For the RNA-ATAC integration, if you have a multiome data from the same tissue, you can use our latest bridge integration method. If not, you can follow our CCA-based RNA-ATAC integration method.

tinggithub33 commented 2 years ago

Hello @yuhanH , as your note above "You are in the right direction. The first option you proposed is exactly what we did in the manuscript: correct batch effects from single-modality data individually, and then perform the WNN analysis.

This ATAC integration vignette show you how to get rid of the batch-effects in the ATAC. In the end, you will get a dimensional reduction integrated_lsi. You can this one and pca from integrated RNA assay to FindMultiModalNeighbors"

I did 1) integrated RNAseq 2) integrated ATACseq how to put 1) and 2) together to run FindMultiModalNeighbors? by merge? do you have scripts for these to share? it will be greatly appreciated if you could give any suggestions.

francescasci91 commented 2 years ago

Hi @tinggithub33 , I did following in part what written here: https://github.com/timoast/signac/discussions/438. You have to be sure to have multiome and that they have been analyzed with cell-ranger-arc. I did in this way: 1)I created a common peakset for every experiment, counting fragments and create a chromatin assay for each sample like reported here : (https://satijalab.org/signac/articles/merging.html) and here (https://satijalab.org/signac/articles/integrate_atac.html). Since the cells in the final multiome object should be the same, before creating each Chromatin assay object, I did this operation

rna_names<-colnames(data$"Gene Expression")
atac_names<-colnames(counts)
intersect<-intersect(atac_names, rna_names)

And I retained just these common elements in the chromatin assay 2) I created a seurat object from the chromatin one and I added the RNA using CreateAssayObject (selecting always the common features)

multiome<- CreateSeuratObject(
  counts = chrom_assay,
  meta.data = md., project ="749",  assay = "ATAC")
multiome[["RNA"]] <- CreateAssayObject(counts = data$"Gene Expression"[ , intersect])

3) At this point we have our multiome assays with the shared peakset. I did quality check, filtering, SCT (on RNA slots), RunTFIDF/ FindTopFeatures/ RunSVD (on each ATAC slot) and at this point I was able to run the code in (https://github.com/timoast/signac/discussions/438.) 4) after integration I followed the paragraphs "Annotating cell types" and "Joint UMAP visualization" here (https://satijalab.org/signac/articles/pbmc_multiomic.html).

In this way I obtained the joint UMAP.

yuhanH commented 2 years ago

hi @tinggithub33 Have a double check. Your data is multi-modal data, right? The cells in the RNA assay and ATAC assay are the same. If so, when you have integrated RNA assay reduction and integrated ATACseq, you can add ATAC assay and integrated ATAC dimensional reduction, integrated_lsi, into your RNA object.

obj.rna[['ATAC']] <- obj.atac[['ATAC']]
obj.rna[['integrated_lsi']] <- obj.atac[['integrated_lsi']]
tinggithub33 commented 2 years ago

@francescasci91 thank you so much for sharing your workflow!

tinggithub33 commented 2 years ago

@yuhanH yes, that is what I need. Thank you very much!

A-legac45 commented 1 year ago

Hello,

I am facing exactly the same problem. I have 3 datasets (ATAC-RNA seq) 10x multiome from same tissu made in distinct experiment which I analyse separately. Now I would like to integrate them. I thought to merge and integrate those 3 datasets by using harmony and as an input my seurat object containing 2/3 assays (ATAC, RNA, "peaks macs2")each. By I cannot merge the rna them. Does some body have the solution to face this problem?

species.combined <- merge(x = first_RNA_atac_MACS2, y = c(CHIMERA1_ssB, CHIMERA2_ssB), merge.data = TRUE, project = c("First_chimera", "chimera1", "chimera2"), add.cell.ids = c("First_chimera", "chimera1", "chimera2"))

Warning: Annotations do not match, keeping annotation from the first object onlyWarning: Annotations do not match, keeping annotation from the first object onlyWarning: Some fragment files are not valid or not indexed. Removing invalid files from merged ChromatinAssay

best,

AAA-3 commented 4 months ago

A follow up question to anyone who used this pipeline:

How consistent were your parametres across ATAC and RNA integration? I treated the integration for my ATAC samples independently to my RNA (different parametres for RNA and different parametres for ATAC (k.anchors etc.)).

However, when I see the WNN vignette, my ATAC's clustering does not seem to be clean...

image

My integration of RNA and ATAC were independently run using and I brought the two together using:

H9[['ATAC']] <- ATAC[['ATAC']]
H9[['integrated_lsi']] <- ATAC[['integrated_lsi']]
H9[['atac_UMAP']] <- ATAC[['umap']]
H9 <- FindMultiModalNeighbors(H9, reduction.list = list("pca", "integrated_lsi"), dims.list = list(1:100, 2:100))
H9 <- RunUMAP(H9, nn.name = "weighted.nn", reduction.name = "wnn.umap", reduction.key = "wnnUMAP_")
H9 <- FindClusters(H9, graph.name = "wsnn", algorithm = 3, verbose = FALSE)