stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/

Failure to integrate embeddings for more than 2 datasets #923

Closed l-cli closed 2 years ago

l-cli commented 2 years ago

Hi! When trying to integrate 6 datasets following the steps in this vignette (https://satijalab.org/signac/articles/integrate_atac.html), we ran into the following error:

Error: The cell names in the reduction provided don't match the cell names present in the objects used to build the AnchorSet

We compared the cell names in the merged object (built by calling merge() on the 6 datasets stored in obj_lst, see below) with the cell names in the integration anchors object and found that they were in different formats, so we edited them to match. However, the error persisted.

Our code is below:

data.combined <- merge(obj_lst[[1]], obj_lst[[2]])
for (i in 3:length(obj_lst)){
  data.combined <- merge(data.combined, obj_lst[[i]])
}

# process the combined dataset
data.combined <- FindTopFeatures(data.combined, min.cutoff = 10)
data.combined <- RunTFIDF(data.combined)
data.combined <- RunSVD(data.combined)
data.combined <- RunUMAP(data.combined, reduction = "lsi", dims = 2:30)

# find integration anchors
integration.anchors <- FindIntegrationAnchors(
  object.list = obj_lst,
  anchor.features = 20000,
  reduction = "rlsi",
  normalization.method = "SCT",
  dims = 2:30
)

# data.combined has cell names in the form of GTTACGTAGGAACACA-1_1_1_1
data.combined@assays$peaks@counts@Dimnames[[2]] <- 
  str_sub(data.combined@assays$peaks@counts@Dimnames[[2]], end = -7L)
data.combined@assays$peaks@data@Dimnames[[2]] <- 
  str_sub(data.combined@assays$peaks@data@Dimnames[[2]], end = -7L)
rownames(data.combined@reductions$lsi@cell.embeddings) <- 
  str_sub(rownames(data.combined@reductions$lsi@cell.embeddings), end = -7L)

# objects in integration.anchors have cell names in the form of GTTACGTAGGAACACA-1_1
for (i in 1:length(integration.anchors@object.list)){
  rownames(integration.anchors@object.list[[i]]@reductions$lsi@cell.embeddings) <- 
    str_sub(rownames(integration.anchors@object.list[[i]]@reductions$lsi@cell.embeddings), end = -3L)
  integration.anchors@object.list[[i]]@assays$peaks@counts@Dimnames[[2]] <- 
    str_sub(integration.anchors@object.list[[i]]@assays$peaks@counts@Dimnames[[2]], end = -3L)
  integration.anchors@object.list[[i]]@assays$peaks@data@Dimnames[[2]] <- 
    str_sub(integration.anchors@object.list[[i]]@assays$peaks@data@Dimnames[[2]], end = -3L)
}

# integrate LSI embeddings
integrated <- IntegrateEmbeddings(
  anchorset = integration.anchors,
  reductions = data.combined[["lsi"]],
  new.reduction.name = "integrated_lsi",
  dims.to.integrate = 1:30
)

In the "find integration anchors" step, we also tried a different value for the anchor.features argument, namely the row names of one of the objects in the list to be integrated:

integration.anchors <- FindIntegrationAnchors(
  object.list = obj_lst,
  anchor.features = rownames(obj_lst[[1]]),
  reduction = "rlsi",
  normalization.method = "SCT",
  dims = 2:30
)

We then applied the same cell name changes described above, but still got the error.

Any suggestions on why this is? Thank you so much in advance!

timoast commented 2 years ago

You can't rename cells in the Seurat and AnchorSet objects by directly modifying the cell names stored in their slots; this can cause all kinds of issues. Instead, use the RenameCells() function to rename cells in each Seurat object before generating the AnchorSet object.
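
One way position-based trimming can go wrong is sketched below in base R. It is only an illustration, not a diagnosis of the error above: the barcodes are hypothetical, and the example assumes two datasets happen to share a barcode. Note also that the slot edits in the original post change only the counts, data, and LSI embedding names, leaving every other part of the object untouched.

```r
# Hypothetical barcodes: two datasets that happen to share one barcode.
cells_a <- c("GTTACGTAGGAACACA-1", "AAACGAAAGTTTGCGT-1")
cells_b <- c("GTTACGTAGGAACACA-1", "TTTGTGTTCCCAGTAA-1")

# Per-dataset suffixes of the kind described in the post keep shared barcodes apart.
merged_names <- c(paste0(cells_a, "_1"), paste0(cells_b, "_2"))
stopifnot(anyDuplicated(merged_names) == 0)

# Dropping the last two characters, as str_sub(x, end = -3L) does,
# removes that distinction and leaves a duplicated cell name.
trimmed <- substr(merged_names, 1, nchar(merged_names) - 2)
print(trimmed[duplicated(trimmed)])

# A unique per-dataset prefix (hypothetical labels) keeps every name distinct.
prefixed <- c(
  paste("sampleA", cells_a, sep = "_"),
  paste("sampleB", cells_b, sep = "_")
)
stopifnot(anyDuplicated(prefixed) == 0)
```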

I think the following steps should solve this issue:

  1. Rename cells in each object to add a unique prefix so that there are no name collisions across objects to be merged
  2. Merge the objects and create the uncorrected LSI dimension reduction
  3. Find integration anchors
  4. Use the integration anchors to correct the LSI embeddings
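
The four steps above might look roughly like the following sketch. It reuses the function calls and arguments already shown in this thread (without normalization.method) and assumes that obj_lst is a list of Seurat objects sharing one peak set, each with its own "lsi" reduction already computed, as reduction = "rlsi" relies on it. The wrapper name integrate_lsi and the "sample1", "sample2", ... prefixes are hypothetical, and the sketch has not been verified against the data discussed here.

```r
library(Seurat)
library(Signac)

# Sketch only: `obj_lst` is assumed to be a list of Seurat objects with a
# shared peak set and a per-object "lsi" reduction already computed.
integrate_lsi <- function(obj_lst) {
  # 1. Add a unique prefix to the cell names of every object (hypothetical labels)
  for (i in seq_along(obj_lst)) {
    obj_lst[[i]] <- RenameCells(obj_lst[[i]], add.cell.id = paste0("sample", i))
  }

  # 2. Merge all renamed objects in a single call and compute the uncorrected LSI
  combined <- merge(x = obj_lst[[1]], y = obj_lst[-1])
  combined <- FindTopFeatures(combined, min.cutoff = 10)
  combined <- RunTFIDF(combined)
  combined <- RunSVD(combined)

  # 3. Find integration anchors on the same renamed objects
  anchors <- FindIntegrationAnchors(
    object.list = obj_lst,
    anchor.features = rownames(obj_lst[[1]]),
    reduction = "rlsi",
    dims = 2:30
  )

  # 4. Use the anchors to correct the LSI embeddings of the merged object
  IntegrateEmbeddings(
    anchorset = anchors,
    reductions = combined[["lsi"]],
    new.reduction.name = "integrated_lsi",
    dims.to.integrate = 1:30
  )
}
```

Calling integrate_lsi(obj_lst) would then return the object holding the "integrated_lsi" reduction. Merging all objects in one merge() call, rather than pairwise in a loop, is a design choice made here so that cell names are only touched once, by RenameCells().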

andrewbcaldwell commented 6 months ago

Was there ever a solution for running IntegrateEmbeddings() after FindIntegrationAnchors() with more than two merged Seurat ATAC-seq objects? When following the 4 steps that @timoast suggested, I am able to run IntegrateEmbeddings() when merging two datasets, but when I increase to 4 datasets, I get the error "Error: The cell names in the reduction provided don't match the cell names present in the objects used to build the AnchorSet".
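
Since the error message is about cell names in the reduction versus cell names in the objects behind the AnchorSet, one way to narrow it down is to compare the two sets directly. The helper below is a hypothetical sketch, not part of Seurat or Signac: it assumes the AnchorSet exposes its objects through the object.list slot, as used earlier in this thread, and it makes no claim about the exact comparison IntegrateEmbeddings() performs internally.

```r
library(Seurat)

# Hypothetical helper: list cells found in the AnchorSet objects but absent
# from the chosen reduction of the merged object.
missing_from_reduction <- function(anchors, combined, reduction = "lsi") {
  # Cell names from every object stored in the AnchorSet
  anchor_cells <- unlist(lapply(anchors@object.list, Cells))
  # Cell names carried by the reduction passed to IntegrateEmbeddings()
  reduction_cells <- Cells(combined[[reduction]])
  setdiff(anchor_cells, reduction_cells)
}
```

For example, head(missing_from_reduction(integration.anchors, data.combined)) would show a few of the names that differ, which makes any extra prefix or suffix easy to spot.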