Periodicity of ATAC peak widths

pgugger commented 5 years ago

Hi John and others, In comparing ATAC-Seq peaks produced by Genrich and MACS2 in both paired-end and single-end modes, I plotted histograms of the peak widths and observed some striking difference. The Genrich results are multimodal and show a clear periodicity that roughly mirrors the periodicity in fragment sizes in the input BAM files. However, the MACS2 results do not show such a pattern, though there is perhaps a trace that is slightly offset in the MACS2 PE results. Is the periodicity of peak widths expected, or what do you make of the differences in these results?

Here are some plots (same scale) demonstrating what I mean: Genrich MAC2_PE MACS2_SE

I used commonly chosen arguments for each analysis and removed mm10 blacklisted regions. There are similar numbers of peak regions (85-87k), regardless of peak caller. The peak call sets have about 87% overlap when running bedtools intersect or bedtools subtract.

Thanks for your help.

Paul

jsh58 commented 5 years ago

Paul,

Interesting result, and thanks for including the graphs.

This is probably due to the differences in alignment interpretation. I will refer you to the paragraph after Figure 4 here. Did you adjust the parameters of macs2 in single-end mode, to be more ATAC-seq-ish?

John Gaspar

pgugger commented 5 years ago

Thanks. Yes, I used something like this for MACS2 SE:

MACS2 callpeak -t all.bam --nomodel --shift -100 --extsize 200 --format BAM -g mm10

jsh58 commented 5 years ago

Here are some direct differences between Genrich and macs2 in single-end mode:

With the command you gave above, macs2 interprets intervals of 200bp (--extsize 200). Genrich uses a default of 100bp (-d) in ATAC-seq mode.
With macs2 in single-end mode, the maximum gap between significant regions to join them into a single peak is the average length of the first 10 reads in the BAM. (Yes, that is nonsensical, but my edited version of macs2 has a --max-gap parameter that can be adjusted.) Genrich uses a default of 100bp (-g).
With macs2 in single-end mode, the minimum length of a peak is the fragment length (i.e. 200bp, with --extsize 200). Genrich uses AUC peak-calling, and the minimum length is 0bp by default (-l)

Maarten-vd-Sande commented 5 years ago

I had a similar 'issue' recently. When checking the pileup of genrich I noticed clear spacing which I did not see for MACS2. However I realized this was explained by the extsize/-d option. The longer you extend your fragments, the smaller the wavy effects get.

MACS2 with --extsize 200 MACS2 with --extsize 100

Genrich with -d 200

Genrich with -d 100

I expect that the periodicity you see is caused by the extension length, since shorter fragments clearly show the nucleosome pattern, and thus the peaks called from these patterns are generally n nucleosomes long.

jsh58 commented 5 years ago

@Maarten-vd-Sande, thanks for the images. The pileups show the effect of varying --extsize/-d most directly. They also illustrate another key difference between Genrich and macs2 in single-end mode (which I've stated before, but it bears repeating):

in the default (BAM) mode [of macs2] ... with paired-end reads, most of the alignments are automatically discarded (half of the properly paired alignments and all of the unpaired alignments; secondary alignments are never considered).

Going from pileups to peaks adds another layer of complexity, and we must consider those differences (explained above) as well. For example, the fact that macs2 has a hard cutoff for minimum peak length, and Genrich does not, is clearly seen in the original graphs from @pgugger.

jsh58 / Genrich

Periodicity of ATAC peak widths #27