ScATAC integration error: Error in match.arg(arg = reduction) : 'arg' should be one of "cca", "rpca"

hchintalapudi commented 2 years ago

Hi, I have multiomics (RNA + ATAC) data for two samples and I want to use WNN analysis described here. My method was to Integrate RNA assays of 2 samples, Integrate ATAC assays of the two samples and then use WNN FindMultiModalNeighbors() function. However, when I try to integrate my ATAC data, I get this error:

Error in match.arg(arg = reduction) : 
  'arg' should be one of "cca", "rpca"

Here is my full code:

DefaultAssay(zmel_c_old) <- "ATAC"

zmel_c_old <- NucleosomeSignal(zmel_c_old)
zmel_c_old <- TSSEnrichment(zmel_c_old)

zmel_c<- zmel_c_old

# filter out low quality cells
zmel_c <- subset(
  x = zmel_c,
  subset = nCount_ATAC < 100000 &
    nFeature_RNA< 7500 &
    percent.mt< 10 &
    nCount_RNA < 100000 &
    nCount_ATAC > 1000 &
    nucleosome_signal < 2 &
    TSS.enrichment > 2
)

# call peaks using MACS2
peaks_zmel_c <- CallPeaks(zmel_c, macs2.path = "/Users/hchintalapudi/miniconda3/envs/himanshu/bin/macs2", effective.genome.size = 1373471384)

# remove peaks on nonstandard chromosomes and in genomic blacklist regions
peaks_zmel_c <- keepStandardChromosomes(peaks_zmel_c, pruning.mode = "coarse")
#peaks_zmel <- subsetByOverlaps(x = peaks, ranges = blacklist_hg38_unified, invert = TRUE)

# quantify counts in each peak
macs2_counts_zmel_c <- FeatureMatrix(
  fragments = Fragments(zmel_c),
  features = peaks_zmel_c,
  cells = colnames(zmel_c)
)

# create a new assay using the MACS2 peak set and add it to the Seurat object
zmel_c[["peaks"]] <- CreateChromatinAssay(
  counts = macs2_counts_zmel_c,
  fragments = fragpath.1,
  annotation = annotations
)
# compute LSI
zmel_c <- FindTopFeatures(zmel_c, min.cutoff = 10)
zmel_c <- RunTFIDF(zmel_c)
zmel_c <- RunSVD(zmel_c)

##-- 
# ZMEL transplant:
DefaultAssay(zmel_t_old) <- "ATAC"

zmel_t_old <- NucleosomeSignal(zmel_t_old)
zmel_t_old <- TSSEnrichment(zmel_t_old)

zmel_t<- zmel_t_old

# filter out low quality cells

zmel_t <- subset(
  x = zmel_t,
  subset = nCount_ATAC < 100000 &
    percent.mt< 10 &
    nCount_RNA < 100000 &
    nCount_ATAC > 1000 &
    nucleosome_signal < 2 &
    TSS.enrichment > 2 &
    nFeature_RNA< 7500 & 
    percent.mt< 10
)

# call peaks using MACS2
peaks_zmel_t <- CallPeaks(zmel_t, macs2.path = "/Users/hchintalapudi/miniconda3/envs/himanshu/bin/macs2", effective.genome.size = 1373471384)

# remove peaks on nonstandard chromosomes and in genomic blacklist regions
peaks_zmel_t <- keepStandardChromosomes(peaks_zmel_t, pruning.mode = "coarse")
#peaks_zmel <- subsetByOverlaps(x = peaks, ranges = blacklist_hg38_unified, invert = TRUE)

# quantify counts in each peak
macs2_counts_zmel_t <- FeatureMatrix(
  fragments = Fragments(zmel_t),
  features = peaks_zmel_t,
  cells = colnames(zmel_t)
)

# create a new assay using the MACS2 peak set and add it to the Seurat object
zmel_t[["peaks"]] <- CreateChromatinAssay(
  counts = macs2_counts_zmel_t,
  fragments = fragpath.2,
  annotation = annotations
)

# compute LSI
zmel_t <- FindTopFeatures(zmel_t, min.cutoff = 10)
zmel_t <- RunTFIDF(zmel_t)
zmel_t <- RunSVD(zmel_t)

## Integrate ATAC datasets:
# find integration anchors
atac_integration.anchors <- FindIntegrationAnchors(
  object.list = list(zmel_c, zmel_t),
  anchor.features = 2000,
  reduction = "rlsi",
  dims = 2:30
)

Any tips appreciated, thanks!

timoast commented 2 years ago

You likely have an older version of Seurat installed. The lsiproject and rlsi options for integration were added in Seurat 4.0.2: https://satijalab.org/seurat/news/index.html#seurat-4-0-2-2020-05-20-2021-05-20

hchintalapudi commented 2 years ago

Hi, I just realized that and updated Seurat to 4.0.5, tried the same command again:

> library(Seurat)
> ## Integrate ATAC datasets:
> # find integration anchors
> atac_integration.anchors <- FindIntegrationAnchors(
+   object.list = list(zmel_c, zmel_t),
+   anchor.features = 2000,
+   reduction = "rlsi",
+   dims = 2:30, assay = "peaks"
+ )
Loading required package: Signac
Error in FindIntegrationAnchors(object.list = list(zmel_c, zmel_t), anchor.features = 2000,  : 
  If specifying the assay, please specify one assay per object in the object.list

> ## Integrate ATAC datasets:
> # find integration anchors
> atac_integration.anchors <- FindIntegrationAnchors(
+   object.list = list(zmel_c[["peaks"]], zmel_t[["peaks"]]),
+   anchor.features = 2000,
+   reduction = "rlsi",
+   dims = 2:30
+ )
Error in slot(object = `*tmp*`, name = "tools") : 
  no slot of name "tools" for this object of class "ChromatinAssay"

Don't quite understand what I did wrong here.

timoast commented 2 years ago

The assay argument should be a list of assay names to use, see the docs here: https://satijalab.org/seurat/reference/findintegrationanchors

hchintalapudi commented 2 years ago

Hello @timoast, Sorry to bother again! I wanted to ask a question regarding the integration process:

I have two multiome datasets/samples coming from different melanoma cell lines (cultured and transplanted) and I was trying the integration process as I described above when I opened the issue. Now, the RNA integration worked fine, I see both the clusters from cultured and transplanted converging on the UMAP. However, this isn't the case with ATAC assay. Here's my code after creating the ChromatinAssay objects:

> zmel_c
An object of class Seurat 
401783 features across 9691 samples within 3 assays 
Active assay: ATAC (170944 features, 170944 variable features)
 2 other assays present: RNA, peaks
 1 dimensional reduction calculated: lsi
> zmel_t
An object of class Seurat 
275296 features across 5762 samples within 3 assays 
Active assay: ATAC (120326 features, 120112 variable features)
 2 other assays present: RNA, peaks
 1 dimensional reduction calculated: lsi

# merge
zmel.combined <- merge(zmel_c, zmel_t)

# process the combined dataset
zmel.combined <- FindTopFeatures(zmel.combined, min.cutoff = 10)
zmel.combined <- RunTFIDF(zmel.combined)
zmel.combined <- RunSVD(zmel.combined)
zmel.combined <- RunUMAP(zmel.combined, reduction = "lsi", dims = 2:30)
p1 <- DimPlot(zmel.combined, group.by = "type")

## Integrate ATAC datasets:
# find integration anchors
atac_integration.anchors <- FindIntegrationAnchors(
  object.list = list(zmel_c, zmel_t),
  anchor.features = 2000,
  reduction = "rlsi",
  dims = 2:30
)

Computing within dataset neighborhoods
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=04s  
Finding all pairwise anchors
  |                                                  | 0 % ~calculating  Warning: No filtering performed if passing to data rather than counts
Projecting new data onto SVD
Projecting new data onto SVD
Finding neighborhoods
Finding anchors
    Found 23 anchors
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=15s  
Warning message:
In CheckDuplicateCellNames(object.list = object.list) :
  Some cell names are duplicated across objects provided. Renaming to enforce unique cell names.

# integrate LSI embeddings
integrated <- IntegrateEmbeddings(
  anchorset = atac_integration.anchors,
[tmp_ZMEL_merged.vs.integrated.pdf](https://github.com/timoast/signac/files/7418473/tmp_ZMEL_merged.vs.integrated.pdf)
  reductions = zmel.combined[["lsi"]],
  new.reduction.name = "integrated_lsi",
  dims.to.integrate = 1:30, k.weight = 20
)

# create a new UMAP using the integrated embeddings
integrated <- RunUMAP(integrated, reduction = "integrated_lsi", dims = 2:30)
p2 <- DimPlot(integrated, group.by = "type")

p1 | p2

Attached is the UMAP. I found very few anchors and the clusters are the same as merged data. tmp_ZMEL_merged.vs.integrated.pdf

Is my integration logic wrong here? For the IntegrateEmbeddings() function, it failed when the 'k.weight' param was at a default of 100, I tried and reduced it in a stepwise manner and it finally worked for a very low value of 20.
Any insights into this would be so much appreciated, thanks!

timoast commented 2 years ago

Did you quantify the same peaks in both datasets?

Another option is to use the anchors you found using the RNA assay to integrate the LSI embeddings

hchintalapudi commented 2 years ago

I called peaks using:

peaks_zmel_c <- CallPeaks(zmel_c, macs2.path = "/Users/hchintalapudi/miniconda3/envs/himanshu/bin/macs2", effective.genome.size = 1373471384)
# remove peaks on nonstandard chromosomes and in genomic blacklist regions
peaks_zmel_c <- keepStandardChromosomes(peaks_zmel_c, pruning.mode = "coarse")

# quantify counts in each peak
macs2_counts_zmel_c <- FeatureMatrix(
  fragments = Fragments(zmel_c),
  features = peaks_zmel_c,
  cells = colnames(zmel_c)
)

# create a new assay using the MACS2 peak set and add it to the Seurat object
zmel_c[["peaks"]] <- CreateChromatinAssay(
  counts = macs2_counts_zmel_c,
  fragments = fragpath.1,
  annotation = annotations
)
Similar for the other sample.

I see that the "peaks" assay is not used in the integration vignette, so I tried to use it by changing DefaultAssay() from "ATAC" to "peaks" and see if it changes things but I found no anchors. Should I use UnifyPeaks() ? Last time I used Signac was when the latest version was 0.2.5 and it looks like things changed and I'm a little confused.

I will try the last option you suggested now. Thanks.

timoast commented 2 years ago

See the merge vignette for an example of how to get the same peaks quantified in both datasets: https://satijalab.org/signac/articles/merging.html

hchintalapudi commented 2 years ago

Hi, Thanks for the suggestions. I quantified the common peaks using the merge vignette, did some QC and proceeded with the integration and my integration anchors increased from 23 from previous run to 951. However, I see significant separation in the integrated UMAP clusters. I removed the QC steps to see if the change in anchors would do something, reduced the dims, changed the k.weight value but nothing changed anything significantly. There is a clear batch effect which I can see from the merged UMAP but it looks like the integration did not work so well.

Here is the code:

plan("multiprocess", workers = 4)
options(future.globals.maxSize = 50000 * 1024^2) # for 50 Gb RAM

# read in peak sets
peaks_zmel_t.1 <- read.table(
  file = "/Users/hchintalapudi/Desktop/work/RO1/ZMEL_transplant/atac_peaks.bed",
  col.names = c("chr", "start", "end")
)
peaks_zmel_c.1 <- read.table(
  file = "/Users/hchintalapudi/Desktop/work/RO1/ZMEL_cultured/atac_peaks.bed",
  col.names = c("chr", "start", "end")
)
# convert to genomic ranges
gr.zmel_t <- makeGRangesFromDataFrame(peaks_zmel_t.1)
gr.zmel_c <- makeGRangesFromDataFrame(peaks_zmel_c.1)

# Create a unified set of peaks to quantify in each dataset
combined.peaks <- reduce(x = c(gr.zmel_c, gr.zmel_t))

# Filter out bad peaks based on length
peakwidths <- width(combined.peaks)
combined.peaks <- combined.peaks[peakwidths  < 10000 & peakwidths > 20]
combined.peaks

# load metadata
md.zmel_t <- read.table(
  file = "/Users/hchintalapudi/Desktop/work/RO1/ZMEL_transplant/per_barcode_metrics.csv",
  stringsAsFactors = FALSE,
  sep = ",",
  header = TRUE,
  row.names = 1
)[-1, ] # remove the first row

md.zmel_c <- read.table(
  file = "/Users/hchintalapudi/Desktop/work/RO1/ZMEL_cultured/per_barcode_metrics.csv",
  stringsAsFactors = FALSE,
  sep = ",",
  header = TRUE,
  row.names = 1
)[-1, ] # remove the first row

# perform an initial filtering of low count cells
# ** use "atac_fragments" for multiome as 'passed_filters' is not present
md.zmel_t <- md.zmel_t[md.zmel_t$atac_fragments>200, ]
md.zmel_c <- md.zmel_c[md.zmel_c$atac_fragments>200, ]

# create fragment objects
frags.zmel_t <- CreateFragmentObject(
  path = "/Users/hchintalapudi/Desktop/work/RO1/ZMEL_transplant/atac_fragments.tsv.gz",
  cells = rownames(md.zmel_t)
)
frags.zmel_c <- CreateFragmentObject(
  path = "/Users/hchintalapudi/Desktop/work/RO1/ZMEL_cultured/atac_fragments.tsv.gz",
  cells = rownames(md.zmel_c),
  verbose = T
)

counts_zmel_t <- FeatureMatrix(
  fragments = frags.zmel_t,
  features = combined.peaks,
  cells = rownames(md.zmel_t)
)

counts_zmel_c <- FeatureMatrix(
  fragments = frags.zmel_c,
  features = combined.peaks,
  cells = rownames(md.zmel_c)
)

zmel_t_assay <- CreateChromatinAssay(counts_zmel_t, fragments = frags.zmel_t, annotation = annotations)
zmel_t. <- CreateSeuratObject(zmel_t_assay, assay = "ATAC", meta.data=md.zmel_t)

zmel_c_assay <- CreateChromatinAssay(counts_zmel_c, fragments = frags.zmel_c, annotation = annotations)
zmel_c. <- CreateSeuratObject(zmel_c_assay, assay = "ATAC", meta.data=md.zmel_c)

# add information to identify dataset of origin
zmel_c.$type <- 'cultured'
zmel_t.$type <- 'transplant'

zmel_c. <- NucleosomeSignal(zmel_c.)
zmel_c. <- TSSEnrichment(zmel_c.)

zmel_c. <- subset(
  x = zmel_c.,
  subset = nCount_ATAC < 100000 &
    nCount_ATAC > 1000 &
    nucleosome_signal < 2 &
    TSS.enrichment > 2
)
zmel_c.

zmel_c.<- RunTFIDF(zmel_c.)
zmel_c. <- FindTopFeatures(zmel_c., min.cutoff = 20)
zmel_c. <- RunSVD(zmel_c.)
zmel_c. <- RunUMAP(zmel_c., dims = 2:50, reduction = 'lsi')

zmel_t. <- NucleosomeSignal(zmel_t.)
zmel_t. <- TSSEnrichment(zmel_t.)

zmel_t. <- subset(
  x = zmel_t.,
  subset = nCount_ATAC < 100000 &
    nCount_ATAC > 1000 &
    nucleosome_signal < 2 &
    TSS.enrichment > 2)
zmel_t.

zmel_t.<- RunTFIDF(zmel_t.)
zmel_t. <- FindTopFeatures(zmel_t., min.cutoff = 20)
zmel_t. <- RunSVD(zmel_t.)
zmel_t. <- RunUMAP(zmel_t., dims = 2:50, reduction = 'lsi')

# merge all datasets, adding a cell ID to make sure cell names are unique
combined_merged <- merge(
  x = zmel_c.,
  y = zmel_t.,
)
combined_merged[["ATAC"]]

combined_merged <- RunTFIDF(combined_merged)
combined_merged <- FindTopFeatures(combined_merged, min.cutoff = 20)
combined_merged <- RunSVD(combined_merged)
combined_merged <- RunUMAP(combined_merged, dims = 2:50, reduction = 'lsi')

p1<-DimPlot(combined_merged, group.by = 'type', pt.size = 0.1)

atac_integration.anchors <- FindIntegrationAnchors(
  object.list = list(zmel_c., zmel_t.),
  anchor.features = rownames(combined_merged),
  assay = c('ATAC','ATAC'),
  reduction = "rlsi",
  dims = 2:30
)

# integrate LSI embeddings
integrated <- IntegrateEmbeddings(
  anchorset = atac_integration.anchors,
  reductions = combined_merged[["lsi"]],
  new.reduction.name = "integrated_lsi",
  dims.to.integrate = 1:30
)

# create a new UMAP using the integrated embeddings
integrated <- RunUMAP(integrated, reduction = "integrated_lsi", dims = 2:30)
p2 <- DimPlot(integrated, group.by = "type")
p2
p1 | p2

Attaching the final plot.

tmp_ZMEL-merged.vs.integrated.pdf Do you have any advice on how to improve things with the integration?

Thanks again!

stuart-lab / signac

ScATAC integration error: Error in match.arg(arg = reduction) : 'arg' should be one of "cca", "rpca" #847