Issues co-embedding scRNA-seq and scATAC-seq data

Hi,

Thanks a lot for developing this very useful package. I ran into some problems when trying to co-embed scRNA-seq and scATAC-seq data. My code is listed below.

"pbmc.rna" is a Seurat object of scRNA-seq data processed with the Seurat package. "pbmc.atac" is a Seurat object of scATAC-seq data processed with the Signac package. The gene activity matrix has been computed and stored in the "RNA" assay of pbmc.atac. The scATAC-seq and scRNA-seq data are from the same person and the same tissue.

After loading both objects

pbmc.rna$tech <- "RNA"
DefaultAssay(pbmc.atac) <- 'RNA'
pbmc.atac$tech <- "ATAC"

Label transfer

transfer.anchors <- FindTransferAnchors(
  reference = pbmc.rna,
  query = pbmc.atac,
  reduction = "cca"
)

celltype.predictions <- TransferData(
  anchorset = transfer.anchors,
  refdata = pbmc.rna$CellType,
  weight.reduction = pbmc.atac[['lsi']],
  dims = 2:30
)

pbmc.atac <- AddMetaData(pbmc.atac, metadata = celltype.predictions)
hist(pbmc.atac$prediction.score.max)
abline(v = 0.5, col = "red")

p1 <- DimPlot(pbmc.atac, group.by = "predicted.id", label = TRUE, repel = TRUE) + ggtitle("Label transferred") + 
  NoLegend() + scale_colour_hue(drop = FALSE)
p2 <- DimPlot(pbmc.atac, group.by = "CellType", label = TRUE, repel = TRUE) + ggtitle("Annotated") + 
  NoLegend() + scale_colour_hue(drop = FALSE)
p1 + p2

In pbmc.atac, "CellType" was annotated based on biological knowledge and peak accessibility, while "predicted.id" was transferred from the scRNA-seq data set. The two sets of labels largely agree, so TransferData() appears to work well for categorical variables.
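To put a number on that agreement rather than judging it only from the plots, a confusion table could be computed. The following is a minimal base-R sketch on toy label vectors (the values are made up for illustration and are not from my data); in practice the two vectors would be pbmc.atac$CellType and pbmc.atac$predicted.id.

```r
# Toy labels standing in for pbmc.atac$CellType and pbmc.atac$predicted.id
# (hypothetical values, for illustration only)
annotated <- c("B", "B", "CD4 T", "CD4 T", "CD8 T", "NK", "NK", "Mono")
predicted <- c("B", "B", "CD4 T", "CD8 T", "CD8 T", "NK", "NK", "Mono")

# Confusion table: rows are the annotated labels, columns the transferred labels
confusion <- table(annotated = annotated, predicted = predicted)
print(confusion)

# Fraction of cells where the two labels agree
agreement <- mean(annotated == predicted)
print(agreement)
```

Off-diagonal counts in the table show which cell types are being confused with each other.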

Compute co-embedding

genes.use <- VariableFeatures(pbmc.rna)
refdata <- GetAssayData(pbmc.rna, assay = "RNA", slot = "data")[genes.use, ]

imputation <- TransferData(anchorset = transfer.anchors, refdata = refdata, 
                           weight.reduction = pbmc.atac[["lsi"]],  dims = 2:30)

pbmc.atac[["RNA"]] <- imputation
coembed <- merge(x = pbmc.rna, y = pbmc.atac)

coembed <- ScaleData(coembed, features = genes.use, do.scale = FALSE)
coembed <- RunPCA(coembed, features = genes.use, verbose = FALSE)
coembed <- RunUMAP(coembed, dims = 1:30)
coembed$NewLabel <- ifelse(!is.na(coembed$predicted.id), coembed$predicted.id, coembed$CellType)

Visualization

p1 <- DimPlot(coembed, group.by = "tech")
p2 <- DimPlot(coembed, group.by = "NewLabel")
p1 + p2

The scRNA-seq and scATAC-seq cells are largely separated in the reduced-dimension space, and the resulting clusters don't make sense either (3rd figure). In contrast, the cell types predicted by label transfer are largely correct (2nd figure). Is there anything wrong with my code for computing the co-embedding? Could you help me solve this problem? Thank you so much!
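In case it helps to describe the separation more precisely than "largely separated", here is a rough base-R sketch of one possible mixing measure: for each cell, the fraction of its k nearest neighbours that come from the other technology. The embedding below is simulated toy data, and the function name other_tech_fraction is my own; it is not part of Seurat. Values near 0 mean the two technologies do not mix, and values near 0.5 mean they are well mixed when both have similar cell numbers.

```r
set.seed(1)

# Toy 2-D embedding: two technologies offset from each other (simulated data)
n <- 50
emb <- rbind(
  matrix(rnorm(2 * n, mean = 0), ncol = 2),
  matrix(rnorm(2 * n, mean = 6), ncol = 2)
)
tech <- rep(c("RNA", "ATAC"), each = n)

# For each cell, the fraction of its k nearest neighbours from the other technology
other_tech_fraction <- function(emb, tech, k = 10) {
  d <- as.matrix(dist(emb))
  diag(d) <- Inf  # exclude the cell itself from its own neighbours
  sapply(seq_len(nrow(d)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    mean(tech[nn] != tech[i])
  })
}

mixing <- other_tech_fraction(emb, tech, k = 10)
print(mean(mixing))
```

On the real object, I assume the inputs would be Embeddings(coembed, "umap") and coembed$tech, although the full distance matrix may be too large for many cells.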
