jsh58 / Genrich

Detecting sites of genomic enrichment
MIT License
185 stars 27 forks source link

Periodicity of ATAC peak widths #27

Closed pgugger closed 4 years ago

pgugger commented 5 years ago

Hi John and others, In comparing ATAC-Seq peaks produced by Genrich and MACS2 in both paired-end and single-end modes, I plotted histograms of the peak widths and observed some striking difference. The Genrich results are multimodal and show a clear periodicity that roughly mirrors the periodicity in fragment sizes in the input BAM files. However, the MACS2 results do not show such a pattern, though there is perhaps a trace that is slightly offset in the MACS2 PE results. Is the periodicity of peak widths expected, or what do you make of the differences in these results?

Here are some plots (same scale) demonstrating what I mean: Genrich MAC2_PE MACS2_SE

I used commonly chosen arguments for each analysis and removed mm10 blacklisted regions. There are similar numbers of peak regions (85-87k), regardless of peak caller. The peak call sets have about 87% overlap when running bedtools intersect or bedtools subtract.

Thanks for your help.

Paul

jsh58 commented 5 years ago

Paul,

Interesting result, and thanks for including the graphs.

This is probably due to the differences in alignment interpretation. I will refer you to the paragraph after Figure 4 here. Did you adjust the parameters of macs2 in single-end mode, to be more ATAC-seq-ish?

John Gaspar

pgugger commented 5 years ago

Thanks. Yes, I used something like this for MACS2 SE:

MACS2 callpeak -t all.bam --nomodel --shift -100 --extsize 200 --format BAM -g mm10
jsh58 commented 5 years ago

Here are some direct differences between Genrich and macs2 in single-end mode:

Maarten-vd-Sande commented 5 years ago

I had a similar 'issue' recently. When checking the pileup of genrich I noticed clear spacing which I did not see for MACS2. However I realized this was explained by the extsize/-d option. The longer you extend your fragments, the smaller the wavy effects get.

MACS2 with --extsize 200 image MACS2 with --extsize 100 image

Genrich with -d 200 image

Genrich with -d 100 image

I expect that the periodicity you see is caused by the extension length, since shorter fragments clearly show the nucleosome pattern, and thus the peaks called from these patterns are generally n nucleosomes long.

jsh58 commented 5 years ago

@Maarten-vd-Sande, thanks for the images. The pileups show the effect of varying --extsize/-d most directly. They also illustrate another key difference between Genrich and macs2 in single-end mode (which I've stated before, but it bears repeating):

in the default (BAM) mode [of macs2] ... with paired-end reads, most of the alignments are automatically discarded (half of the properly paired alignments and all of the unpaired alignments; secondary alignments are never considered).

Going from pileups to peaks adds another layer of complexity, and we must consider those differences (explained above) as well. For example, the fact that macs2 has a hard cutoff for minimum peak length, and Genrich does not, is clearly seen in the original graphs from @pgugger.