Status: Closed (hchintalapudi closed 2 years ago)
You likely have an older version of Seurat installed. The lsiproject and rlsi options for integration were added in Seurat 4.0.2: https://satijalab.org/seurat/news/index.html#seurat-4-0-2-2020-05-20-2021-05-20
Hi, I just realized that and updated Seurat to 4.0.5, then tried the same command again:
> library(Seurat)
> ## Integrate ATAC datasets:
> # find integration anchors
> atac_integration.anchors <- FindIntegrationAnchors(
+ object.list = list(zmel_c, zmel_t),
+ anchor.features = 2000,
+ reduction = "rlsi",
+ dims = 2:30, assay = "peaks"
+ )
Loading required package: Signac
Error in FindIntegrationAnchors(object.list = list(zmel_c, zmel_t), anchor.features = 2000, :
If specifying the assay, please specify one assay per object in the object.list
> ## Integrate ATAC datasets:
> # find integration anchors
> atac_integration.anchors <- FindIntegrationAnchors(
+ object.list = list(zmel_c[["peaks"]], zmel_t[["peaks"]]),
+ anchor.features = 2000,
+ reduction = "rlsi",
+ dims = 2:30
+ )
Error in slot(object = `*tmp*`, name = "tools") :
no slot of name "tools" for this object of class "ChromatinAssay"
I don't quite understand what I did wrong here.
The assay argument should be a list of assay names to use; see the docs here: https://satijalab.org/seurat/reference/findintegrationanchors
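As a minimal sketch of the point above, assuming both objects carry an assay named "peaks" as in the earlier post: the error message asks for one assay name per object, so the assay argument gets one entry for each element of object.list. The names assay_per_object and find_atac_anchors are hypothetical helpers introduced only for illustration; the other arguments are copied from the failing call.

```r
# Hypothetical helper: repeat one assay name for every object in the list,
# because FindIntegrationAnchors() expects one assay name per object.
assay_per_object <- function(object.list, assay.name = "peaks") {
  rep(assay.name, length(object.list))
}

# Hypothetical wrapper around the call from the post above.
# Calling it requires Seurat >= 4.0.2 (for reduction = "rlsi").
find_atac_anchors <- function(object.list, assay.name = "peaks") {
  Seurat::FindIntegrationAnchors(
    object.list = object.list,
    anchor.features = 2000,
    reduction = "rlsi",
    dims = 2:30,
    assay = assay_per_object(object.list, assay.name)
  )
}

# Usage (not run here; zmel_c and zmel_t are the objects from this thread):
# atac_integration.anchors <- find_atac_anchors(list(zmel_c, zmel_t))
```

Passing whole Seurat objects and naming the assay, rather than passing zmel_c[["peaks"]] directly, keeps the inputs in the form the function expects.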
Hello @timoast, sorry to bother you again! I wanted to ask a question about the integration process:
I have two multiome datasets/samples coming from different melanoma cell lines (cultured and transplanted), and I was trying the integration process I described above when I opened the issue. The RNA integration worked fine: the clusters from the cultured and transplanted samples converge on the UMAP. However, this isn't the case with the ATAC assay. Here's my code after creating the ChromatinAssay objects:
> zmel_c
An object of class Seurat
401783 features across 9691 samples within 3 assays
Active assay: ATAC (170944 features, 170944 variable features)
2 other assays present: RNA, peaks
1 dimensional reduction calculated: lsi
> zmel_t
An object of class Seurat
275296 features across 5762 samples within 3 assays
Active assay: ATAC (120326 features, 120112 variable features)
2 other assays present: RNA, peaks
1 dimensional reduction calculated: lsi
# merge
zmel.combined <- merge(zmel_c, zmel_t)
# process the combined dataset
zmel.combined <- FindTopFeatures(zmel.combined, min.cutoff = 10)
zmel.combined <- RunTFIDF(zmel.combined)
zmel.combined <- RunSVD(zmel.combined)
zmel.combined <- RunUMAP(zmel.combined, reduction = "lsi", dims = 2:30)
p1 <- DimPlot(zmel.combined, group.by = "type")
## Integrate ATAC datasets:
# find integration anchors
atac_integration.anchors <- FindIntegrationAnchors(
object.list = list(zmel_c, zmel_t),
anchor.features = 2000,
reduction = "rlsi",
dims = 2:30
)
Computing within dataset neighborhoods
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=04s
Finding all pairwise anchors
| | 0 % ~calculating Warning: No filtering performed if passing to data rather than counts
Projecting new data onto SVD
Projecting new data onto SVD
Finding neighborhoods
Finding anchors
Found 23 anchors
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=15s
Warning message:
In CheckDuplicateCellNames(object.list = object.list) :
Some cell names are duplicated across objects provided. Renaming to enforce unique cell names.
# integrate LSI embeddings
integrated <- IntegrateEmbeddings(
anchorset = atac_integration.anchors,
reductions = zmel.combined[["lsi"]],
new.reduction.name = "integrated_lsi",
dims.to.integrate = 1:30, k.weight = 20
)
# create a new UMAP using the integrated embeddings
integrated <- RunUMAP(integrated, reduction = "integrated_lsi", dims = 2:30)
p2 <- DimPlot(integrated, group.by = "type")
p1 | p2
Attached is the UMAP: [tmp_ZMEL_merged.vs.integrated.pdf](https://github.com/timoast/signac/files/7418473/tmp_ZMEL_merged.vs.integrated.pdf). I found very few anchors, and the clusters look the same as in the merged data.
Is my integration logic wrong here?
IntegrateEmbeddings() failed with the 'k.weight' param at its default of 100. I reduced it stepwise, and it finally worked at a very low value of 20.
Any insights into this would be much appreciated, thanks!
Did you quantify the same peaks in both datasets?
Another option is to use the anchors you found using the RNA assay to integrate the LSI embeddings
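A minimal sketch of that second option, under stated assumptions: rna.anchors is an anchor set returned by FindIntegrationAnchors() on the RNA assays of the same cells, and merged.atac is the merged object with an "lsi" reduction whose cell names match those in the anchor set. The function name integrate_lsi_with_rna_anchors and both argument names are hypothetical; the IntegrateEmbeddings() arguments mirror the call used earlier in the thread.

```r
# Hypothetical wrapper: correct the ATAC LSI embeddings using anchors
# that were found on the RNA assay instead of the ATAC assay.
integrate_lsi_with_rna_anchors <- function(rna.anchors,
                                           merged.atac,
                                           dims.to.integrate = 1:30,
                                           k.weight = 100) {
  Seurat::IntegrateEmbeddings(
    anchorset = rna.anchors,              # anchors computed from RNA
    reductions = merged.atac[["lsi"]],    # LSI from the merged ATAC data
    new.reduction.name = "integrated_lsi",
    dims.to.integrate = dims.to.integrate,
    k.weight = k.weight
  )
}

# Usage (not run here; object names are placeholders):
# integrated <- integrate_lsi_with_rna_anchors(rna.anchors, zmel.combined)
```

This only makes sense for multiome data, where the RNA and ATAC measurements come from the same cells, so anchors found in one modality refer to the same barcodes in the other.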
I called peaks using:
peaks_zmel_c <- CallPeaks(zmel_c, macs2.path = "/Users/hchintalapudi/miniconda3/envs/himanshu/bin/macs2", effective.genome.size = 1373471384)
# remove peaks on nonstandard chromosomes (no blacklist filtering is applied here)
peaks_zmel_c <- keepStandardChromosomes(peaks_zmel_c, pruning.mode = "coarse")
# quantify counts in each peak
macs2_counts_zmel_c <- FeatureMatrix(
fragments = Fragments(zmel_c),
features = peaks_zmel_c,
cells = colnames(zmel_c)
)
# create a new assay using the MACS2 peak set and add it to the Seurat object
zmel_c[["peaks"]] <- CreateChromatinAssay(
counts = macs2_counts_zmel_c,
fragments = fragpath.1,
annotation = annotations
)
Similarly for the other sample.
I see that the "peaks" assay is not used in the integration vignette, so I tried switching DefaultAssay() from "ATAC" to "peaks" to see if it changed things, but I found no anchors. Should I use UnifyPeaks()? The last time I used Signac, the latest version was 0.2.5; things seem to have changed since then, and I'm a little confused.
I will try the last option you suggested now. Thanks.
See the merge vignette for an example of how to get the same peaks quantified in both datasets: https://satijalab.org/signac/articles/merging.html
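To make the idea of a shared peak set concrete, here is a toy sketch using made-up coordinates (not real peaks from either sample). It assumes the Bioconductor package GenomicRanges is installed. Overlapping peaks from the two sets are collapsed into single intervals, and that one combined set would then be quantified in each dataset so that both objects have identical features.

```r
library(GenomicRanges)  # Bioconductor; also attaches IRanges

# Toy peak sets with made-up coordinates, for illustration only
peaks.a <- GRanges("chr1", IRanges(start = c(100, 500), end = c(200, 600)))
peaks.b <- GRanges("chr1", IRanges(start = c(150, 900), end = c(300, 950)))

# reduce() merges overlapping ranges: 100-200 and 150-300 become 100-300,
# while 500-600 and 900-950 are kept as they are
combined.peaks <- reduce(c(peaks.a, peaks.b))

# Each dataset would then be quantified against the same combined.peaks,
# e.g. via the features argument of Signac's FeatureMatrix() (not run here).
```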
Hi, thanks for the suggestions. I quantified the common peaks following the merge vignette, did some QC, and proceeded with the integration; the number of integration anchors increased from 23 in the previous run to 951. However, I still see significant separation between the clusters in the integrated UMAP. I removed the QC steps to see whether the resulting change in anchors would help, reduced the dims, and changed the k.weight value, but nothing changed the result significantly. There is a clear batch effect, which I can see in the merged UMAP, but it looks like the integration did not work well.
Here is the code:
# "multiprocess" is deprecated in the future package; "multisession" replaces it
plan("multisession", workers = 4)
options(future.globals.maxSize = 50000 * 1024^2) # for 50 Gb RAM
# read in peak sets
peaks_zmel_t.1 <- read.table(
file = "/Users/hchintalapudi/Desktop/work/RO1/ZMEL_transplant/atac_peaks.bed",
col.names = c("chr", "start", "end")
)
peaks_zmel_c.1 <- read.table(
file = "/Users/hchintalapudi/Desktop/work/RO1/ZMEL_cultured/atac_peaks.bed",
col.names = c("chr", "start", "end")
)
# convert to genomic ranges
gr.zmel_t <- makeGRangesFromDataFrame(peaks_zmel_t.1)
gr.zmel_c <- makeGRangesFromDataFrame(peaks_zmel_c.1)
# Create a unified set of peaks to quantify in each dataset
combined.peaks <- reduce(x = c(gr.zmel_c, gr.zmel_t))
# Filter out bad peaks based on length
peakwidths <- width(combined.peaks)
combined.peaks <- combined.peaks[peakwidths < 10000 & peakwidths > 20]
combined.peaks
# load metadata
md.zmel_t <- read.table(
file = "/Users/hchintalapudi/Desktop/work/RO1/ZMEL_transplant/per_barcode_metrics.csv",
stringsAsFactors = FALSE,
sep = ",",
header = TRUE,
row.names = 1
)[-1, ] # remove the first row
md.zmel_c <- read.table(
file = "/Users/hchintalapudi/Desktop/work/RO1/ZMEL_cultured/per_barcode_metrics.csv",
stringsAsFactors = FALSE,
sep = ",",
header = TRUE,
row.names = 1
)[-1, ] # remove the first row
# perform an initial filtering of low count cells
# ** use "atac_fragments" for multiome as 'passed_filters' is not present
md.zmel_t <- md.zmel_t[md.zmel_t$atac_fragments>200, ]
md.zmel_c <- md.zmel_c[md.zmel_c$atac_fragments>200, ]
# create fragment objects
frags.zmel_t <- CreateFragmentObject(
path = "/Users/hchintalapudi/Desktop/work/RO1/ZMEL_transplant/atac_fragments.tsv.gz",
cells = rownames(md.zmel_t)
)
frags.zmel_c <- CreateFragmentObject(
path = "/Users/hchintalapudi/Desktop/work/RO1/ZMEL_cultured/atac_fragments.tsv.gz",
cells = rownames(md.zmel_c),
verbose = TRUE
)
counts_zmel_t <- FeatureMatrix(
fragments = frags.zmel_t,
features = combined.peaks,
cells = rownames(md.zmel_t)
)
counts_zmel_c <- FeatureMatrix(
fragments = frags.zmel_c,
features = combined.peaks,
cells = rownames(md.zmel_c)
)
zmel_t_assay <- CreateChromatinAssay(counts_zmel_t, fragments = frags.zmel_t, annotation = annotations)
zmel_t. <- CreateSeuratObject(zmel_t_assay, assay = "ATAC", meta.data=md.zmel_t)
zmel_c_assay <- CreateChromatinAssay(counts_zmel_c, fragments = frags.zmel_c, annotation = annotations)
zmel_c. <- CreateSeuratObject(zmel_c_assay, assay = "ATAC", meta.data=md.zmel_c)
# add information to identify dataset of origin
zmel_c.$type <- 'cultured'
zmel_t.$type <- 'transplant'
zmel_c. <- NucleosomeSignal(zmel_c.)
zmel_c. <- TSSEnrichment(zmel_c.)
zmel_c. <- subset(
x = zmel_c.,
subset = nCount_ATAC < 100000 &
nCount_ATAC > 1000 &
nucleosome_signal < 2 &
TSS.enrichment > 2
)
zmel_c.
zmel_c. <- RunTFIDF(zmel_c.)
zmel_c. <- FindTopFeatures(zmel_c., min.cutoff = 20)
zmel_c. <- RunSVD(zmel_c.)
zmel_c. <- RunUMAP(zmel_c., dims = 2:50, reduction = 'lsi')
zmel_t. <- NucleosomeSignal(zmel_t.)
zmel_t. <- TSSEnrichment(zmel_t.)
zmel_t. <- subset(
x = zmel_t.,
subset = nCount_ATAC < 100000 &
nCount_ATAC > 1000 &
nucleosome_signal < 2 &
TSS.enrichment > 2)
zmel_t.
zmel_t. <- RunTFIDF(zmel_t.)
zmel_t. <- FindTopFeatures(zmel_t., min.cutoff = 20)
zmel_t. <- RunSVD(zmel_t.)
zmel_t. <- RunUMAP(zmel_t., dims = 2:50, reduction = 'lsi')
# merge the two datasets; no add.cell.ids is passed, so cell names are not prefixed by dataset
combined_merged <- merge(
x = zmel_c.,
y = zmel_t.
)
combined_merged[["ATAC"]]
combined_merged <- RunTFIDF(combined_merged)
combined_merged <- FindTopFeatures(combined_merged, min.cutoff = 20)
combined_merged <- RunSVD(combined_merged)
combined_merged <- RunUMAP(combined_merged, dims = 2:50, reduction = 'lsi')
p1<-DimPlot(combined_merged, group.by = 'type', pt.size = 0.1)
atac_integration.anchors <- FindIntegrationAnchors(
object.list = list(zmel_c., zmel_t.),
anchor.features = rownames(combined_merged),
assay = c('ATAC','ATAC'),
reduction = "rlsi",
dims = 2:30
)
# integrate LSI embeddings
integrated <- IntegrateEmbeddings(
anchorset = atac_integration.anchors,
reductions = combined_merged[["lsi"]],
new.reduction.name = "integrated_lsi",
dims.to.integrate = 1:30
)
# create a new UMAP using the integrated embeddings
integrated <- RunUMAP(integrated, reduction = "integrated_lsi", dims = 2:30)
p2 <- DimPlot(integrated, group.by = "type")
p2
p1 | p2
Attaching the final plot: tmp_ZMEL-merged.vs.integrated.pdf
Do you have any advice on how to improve the integration?
Thanks again!
Hi, I have multiomics (RNA + ATAC) data for two samples and I want to use the WNN analysis described here. My plan was to integrate the RNA assays of the two samples, integrate the ATAC assays of the two samples, and then use the WNN FindMultiModalNeighbors() function. However, when I try to integrate my ATAC data, I get this error:
Here is my full code:
Any tips appreciated, thanks!
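For reference, the final WNN step of the plan described in this post could look roughly like the sketch below. It assumes a single object that already holds both an RNA reduction (named "pca" here, computed on the integrated RNA data with at least 50 components) and the integrated ATAC reduction ("integrated_lsi", the name used elsewhere in this thread). The function name run_wnn, the reduction names, and the dimension ranges are assumptions for illustration, not settings confirmed in the thread.

```r
# Hypothetical wrapper for the WNN step; reduction names and dims are
# assumptions and should be adapted to the actual object.
run_wnn <- function(object,
                    rna.reduction = "pca",
                    atac.reduction = "integrated_lsi",
                    rna.dims = 1:50,
                    atac.dims = 2:30) {
  # Build the weighted nearest-neighbor graph from both modalities
  object <- Seurat::FindMultiModalNeighbors(
    object,
    reduction.list = list(rna.reduction, atac.reduction),
    dims.list = list(rna.dims, atac.dims)
  )
  # Embed the joint graph; "weighted.nn" is the neighbor object that
  # FindMultiModalNeighbors() stores by default
  Seurat::RunUMAP(
    object,
    nn.name = "weighted.nn",
    reduction.name = "wnn.umap",
    reduction.key = "wnnUMAP_"
  )
}

# Usage (not run here; `integrated` is a placeholder object name):
# integrated <- run_wnn(integrated)
```

The first LSI component is skipped (atac.dims starts at 2), matching the dims = 2:30 convention used in the other calls in this thread.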