stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/
Other
330 stars 88 forks source link

How sequencing depth impacts the number of DARs? #1789

Open sshanyiiii opened 1 month ago

sshanyiiii commented 1 month ago

Hi,

I am working with Signac to find DARs for each cell type under two experimental conditions. I have 6 samples(2 experimental conditions and 3 time points) constructed by a 10X multiome GEX+ATAC library. I merged 6 samples and defined 13 cell types.

peaks <- CallPeaks(Seurat_obj, group.by = "Celltype")
peaks <- keepStandardChromosomes(peaks, pruning.mode = "coarse")
peaks <- subsetByOverlaps(x = peaks, ranges = blacklist_mm10, invert = TRUE)
macs2_counts <- FeatureMatrix(
  fragments = Fragments(Seurat_obj),
  features = peaks,
  cells = colnames(Seurat_obj))
Seurat_obj[["peaks"]] <- CreateChromatinAssay(
  counts = macs2_counts,
  fragments = Fragments(Seurat_obj),
  annotation = annotation)

The sequencing depths of the 6 samples differ greatly.

seq_depth <- tapply(Seurat_obj$nCount_peaks, Seurat_obj$orig.ident, mean)
barplot(seq_depth, main="Average ATAC Reads per Cell by Sample", las = 2)

图片2

When I used FindMarkers to find DARs between two conditions at 3-time points, I noticed that the number of DARs is highly related to sequencing depth. Also, the TSS enrichment score of samples is consistent with sequencing depth. The same trend holds for DARs between two conditions across cell types. 图片1 图片3

As you mentioned the differential accessibility test uses the TF-IDF values which incorporate a per-cell depth normalization step https://github.com/stuart-lab/signac/issues/373 and I use 'nCount_peaks' as a covariate with FindMarkers function. However the number of DARs still highly related to sequencing depth, so I want to ask why this happens and what can I do with sequencing depth?

DefaultAssay(Seurat_obj) <- "peaks"
Seurat_obj <- RunTFIDF(Seurat_obj, assay = "peaks")      # TF-IDF normalization
Idents(Seurat_obj) <- Seurat_obj$orig.ident
time <- c("125", "145", "185")
NT_pst_time_DiffPeak <- list()
WT_pst_time_DiffPeak <- list()

for(i in 1:length(time)){
  ident.1 <- paste0("NT", time[i])
  ident.2 <- paste0("WT", time[i])
  da_peaks <- FindMarkers(
    object = Seurat_obj,
    ident.1 = ident.1,
    ident.2 = ident.2,
    only.pos = TRUE,
    test.use = 'LR',
    min.pct = 0.05,
    latent.vars = 'nCount_peaks')
  NT_pst_time_DiffPeak[[time[i]]] <- da_peaks
}
for(i in 1:length(time)){
  ident.1 <- paste0("WT", time[i])
  ident.2 <- paste0("NT", time[i])
  da_peaks <- FindMarkers(
    object = Seurat_obj,
    ident.1 = ident.1,
    ident.2 = ident.2,
    only.pos = TRUE,
    test.use = 'LR',
    min.pct = 0.05,
    latent.vars = 'nCount_peaks')
  WT_pst_time_DiffPeak[[time[i]]] <- da_peaks
}