satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Issues in coembedding the scRNA-seq and scATAC-seq #3475

Closed chent5 closed 4 years ago

chent5 commented 4 years ago

Hi,

Thanks a lot for developing this very useful package. I ran into a problem when trying to co-embed scRNA-seq and scATAC-seq data. My code is listed below:

"pbmc.rna" is a Seurat object of scRNA-seq data processed with the Seurat package. "pbmc.atac" is a Seurat object of scATAC-seq data processed with the Signac package. The gene activity matrix has been computed and stored in the "RNA" assay of pbmc.atac. The scATAC and scRNA data are from the same person and the same tissue.

After loading both objects

pbmc.rna$tech <- "RNA"
DefaultAssay(pbmc.atac) <- 'RNA'
pbmc.atac$tech <- "ATAC"

Label transfer

transfer.anchors <- FindTransferAnchors(
  reference = pbmc.rna,
  query = pbmc.atac,
  reduction = "cca"
)

celltype.predictions <- TransferData(
  anchorset = transfer.anchors,
  refdata = pbmc.rna$CellType,
  weight.reduction = pbmc.atac[['lsi']],
  dims = 2:30
)

pbmc.atac <- AddMetaData(pbmc.atac, metadata = celltype.predictions)
hist(pbmc.atac$prediction.score.max)
abline(v = 0.5, col = "red")
[Figure 1: histogram of prediction.score.max, with a red vertical line at 0.5]
p1 <- DimPlot(pbmc.atac, group.by = "predicted.id", label = TRUE, repel = TRUE) + ggtitle("Label transferred") + 
  NoLegend() + scale_colour_hue(drop = FALSE)
p2 <- DimPlot(pbmc.atac, group.by = "CellType", label = TRUE, repel = TRUE) + ggtitle("Annotated") + 
  NoLegend() + scale_colour_hue(drop = FALSE)
p1 + p2
[Figure 2: DimPlot of pbmc.atac, colored by transferred label (left) and by annotated cell type (right)]

In pbmc.atac, "CellType" was annotated based on biological knowledge and peak accessibility, while "predicted.id" was transferred from the scRNA data set. The two labelings largely agree, so TransferData() works very well for categorical variables.
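The agreement between the two labelings can also be put into numbers rather than judged by eye. Below is a minimal sketch in base R, assuming the metadata has the columns used above (CellType, predicted.id, prediction.score.max). The `meta` data frame is a hypothetical toy stand-in for pbmc.atac@meta.data, and its values are made up for illustration, not real results.

```r
# Toy stand-in for pbmc.atac@meta.data (hypothetical values, not real results)
meta <- data.frame(
  CellType             = c("B", "B", "CD4 T", "CD4 T", "NK", "NK"),
  predicted.id         = c("B", "B", "CD4 T", "NK",    "NK", "NK"),
  prediction.score.max = c(0.95, 0.80, 0.70, 0.40, 0.90, 0.60),
  stringsAsFactors = FALSE
)

# Cross-tabulate the manual annotation against the transferred labels
confusion <- table(annotated = meta$CellType, predicted = meta$predicted.id)
print(confusion)

# Fraction of cells whose transferred label matches the manual annotation
agreement <- mean(meta$CellType == meta$predicted.id)

# Fraction of cells above the 0.5 score cutoff drawn on the histogram
frac.confident <- mean(meta$prediction.score.max > 0.5)

print(c(agreement = agreement, frac.confident = frac.confident))
```

On the real object, the same two lines would be run on pbmc.atac@meta.data instead of the toy data frame.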

Compute co-embedding

genes.use <- VariableFeatures(pbmc.rna)
refdata <- GetAssayData(pbmc.rna, assay = "RNA", slot = "data")[genes.use, ]

imputation <- TransferData(anchorset = transfer.anchors, refdata = refdata, 
                           weight.reduction = pbmc.atac[["lsi"]],  dims = 2:30)

pbmc.atac[["RNA"]] <- imputation
coembed <- merge(x = pbmc.rna, y = pbmc.atac)

coembed <- ScaleData(coembed, features = genes.use, do.scale = FALSE)
coembed <- RunPCA(coembed, features = genes.use, verbose = FALSE)
coembed <- RunUMAP(coembed, dims = 1:30)
coembed$NewLabel <- ifelse(!is.na(coembed$predicted.id), coembed$predicted.id, coembed$CellType)

Visualization

p1 <- DimPlot(coembed, group.by = "tech")
p2 <- DimPlot(coembed, group.by = "NewLabel")
p1 + p2

[Figure 3: DimPlot of coembed, colored by tech (left) and by NewLabel (right)]

The scRNA and scATAC cells are largely separated in the reduced-dimension space, and the resulting clusters don't make sense either (3rd figure). In contrast, the cell types predicted by label transfer are largely correct (2nd figure). Is there anything wrong with my code for computing the co-embedding? Could you help me solve this problem? Thank you so much!
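To describe the separation more precisely than "largely separated", one option is a simple neighbor-mixing score: for each cell, the fraction of its k nearest neighbors in the embedding that come from the other technology. The sketch below uses base R on simulated data only. `emb` and `tech` are hypothetical stand-ins for the embedding coordinates (for example from Embeddings(coembed, "umap")) and coembed$tech, and `mixing_score` is an illustrative helper, not a Seurat or Signac function.

```r
set.seed(1)

# Toy 2-D embedding: two technologies placed far apart (simulated, not real data)
n <- 50
emb <- rbind(
  matrix(rnorm(n * 2, mean = 0),  ncol = 2),
  matrix(rnorm(n * 2, mean = 10), ncol = 2)
)
tech <- rep(c("RNA", "ATAC"), each = n)

# For each cell, the fraction of its k nearest neighbors that come from
# the other technology (0 = fully separated, about 0.5 = well mixed
# when both technologies have equal cell numbers)
mixing_score <- function(emb, tech, k = 10) {
  d <- as.matrix(dist(emb))
  diag(d) <- Inf  # exclude each cell from its own neighbor list
  vapply(seq_len(nrow(emb)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    mean(tech[nn] != tech[i])
  }, numeric(1))
}

# Separated toy data: score close to 0
separated <- mean(mixing_score(emb, tech))

# Shuffling the labels imitates a well-mixed embedding: score around 0.5
mixed <- mean(mixing_score(emb, sample(tech)))

print(c(separated = separated, mixed = mixed))
```

The full distance matrix grows with the square of the number of cells, so for a large data set this would need to be run on a subsample.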

timoast commented 4 years ago

Addressed here: https://github.com/timoast/signac/issues/219