How sequencing depth impacts the number of DARs?

Hi,

I am working with Signac to find DARs for each cell type under two experimental conditions. I have 6 samples(2 experimental conditions and 3 time points) constructed by a 10X multiome GEX+ATAC library. I merged 6 samples and defined 13 cell types.

peaks <- CallPeaks(Seurat_obj, group.by = "Celltype")
peaks <- keepStandardChromosomes(peaks, pruning.mode = "coarse")
peaks <- subsetByOverlaps(x = peaks, ranges = blacklist_mm10, invert = TRUE)
macs2_counts <- FeatureMatrix(
  fragments = Fragments(Seurat_obj),
  features = peaks,
  cells = colnames(Seurat_obj))
Seurat_obj[["peaks"]] <- CreateChromatinAssay(
  counts = macs2_counts,
  fragments = Fragments(Seurat_obj),
  annotation = annotation)

The sequencing depths of the 6 samples differ greatly.

seq_depth <- tapply(Seurat_obj$nCount_peaks, Seurat_obj$orig.ident, mean)
barplot(seq_depth, main="Average ATAC Reads per Cell by Sample", las = 2)

When I used FindMarkers to find DARs between two conditions at 3-time points, I noticed that the number of DARs is highly related to sequencing depth. Also, the TSS enrichment score of samples is consistent with sequencing depth. The same trend holds for DARs between two conditions across cell types.

As you mentioned the differential accessibility test uses the TF-IDF values which incorporate a per-cell depth normalization step https://github.com/stuart-lab/signac/issues/373 and I use 'nCount_peaks' as a covariate with FindMarkers function. However the number of DARs still highly related to sequencing depth, so I want to ask why this happens and what can I do with sequencing depth?

DefaultAssay(Seurat_obj) <- "peaks"
Seurat_obj <- RunTFIDF(Seurat_obj, assay = "peaks")      # TF-IDF normalization
Idents(Seurat_obj) <- Seurat_obj$orig.ident
time <- c("125", "145", "185")
NT_pst_time_DiffPeak <- list()
WT_pst_time_DiffPeak <- list()

for(i in 1:length(time)){
  ident.1 <- paste0("NT", time[i])
  ident.2 <- paste0("WT", time[i])
  da_peaks <- FindMarkers(
    object = Seurat_obj,
    ident.1 = ident.1,
    ident.2 = ident.2,
    only.pos = TRUE,
    test.use = 'LR',
    min.pct = 0.05,
    latent.vars = 'nCount_peaks')
  NT_pst_time_DiffPeak[[time[i]]] <- da_peaks
}
for(i in 1:length(time)){
  ident.1 <- paste0("WT", time[i])
  ident.2 <- paste0("NT", time[i])
  da_peaks <- FindMarkers(
    object = Seurat_obj,
    ident.1 = ident.1,
    ident.2 = ident.2,
    only.pos = TRUE,
    test.use = 'LR',
    min.pct = 0.05,
    latent.vars = 'nCount_peaks')
  WT_pst_time_DiffPeak[[time[i]]] <- da_peaks
}

stuart-lab / signac

How sequencing depth impacts the number of DARs? #1789