Closed pgugger closed 4 years ago
Paul,
Interesting result, and thanks for including the graphs.
This is probably due to the differences in alignment interpretation. I will refer you to the paragraph after Figure 4 here. Did you adjust the parameters of macs2 in single-end mode, to be more ATAC-seq-ish?
John Gaspar
Thanks. Yes, I used something like this for MACS2 SE:
MACS2 callpeak -t all.bam --nomodel --shift -100 --extsize 200 --format BAM -g mm10
Here are some direct differences between Genrich and macs2 in single-end mode:
--extsize 200
). Genrich uses a default of 100bp (-d
) in ATAC-seq mode.--max-gap
parameter that can be adjusted.) Genrich uses a default of 100bp (-g
).--extsize 200
). Genrich uses AUC peak-calling, and the minimum length is 0bp by default (-l
)I had a similar 'issue' recently. When checking the pileup of genrich I noticed clear spacing which I did not see for MACS2. However I realized this was explained by the extsize/-d option. The longer you extend your fragments, the smaller the wavy effects get.
MACS2 with --extsize 200 MACS2 with --extsize 100
Genrich with -d 200
Genrich with -d 100
I expect that the periodicity you see is caused by the extension length, since shorter fragments clearly show the nucleosome pattern, and thus the peaks called from these patterns are generally n nucleosomes long.
@Maarten-vd-Sande, thanks for the images. The pileups show the effect of varying --extsize
/-d
most directly. They also illustrate another key difference between Genrich and macs2 in single-end mode (which I've stated before, but it bears repeating):
in the default (
BAM
) mode [of macs2] ... with paired-end reads, most of the alignments are automatically discarded (half of the properly paired alignments and all of the unpaired alignments; secondary alignments are never considered).
Going from pileups to peaks adds another layer of complexity, and we must consider those differences (explained above) as well. For example, the fact that macs2 has a hard cutoff for minimum peak length, and Genrich does not, is clearly seen in the original graphs from @pgugger.
Hi John and others, In comparing ATAC-Seq peaks produced by Genrich and MACS2 in both paired-end and single-end modes, I plotted histograms of the peak widths and observed some striking difference. The Genrich results are multimodal and show a clear periodicity that roughly mirrors the periodicity in fragment sizes in the input BAM files. However, the MACS2 results do not show such a pattern, though there is perhaps a trace that is slightly offset in the MACS2 PE results. Is the periodicity of peak widths expected, or what do you make of the differences in these results?
Here are some plots (same scale) demonstrating what I mean:
I used commonly chosen arguments for each analysis and removed mm10 blacklisted regions. There are similar numbers of peak regions (85-87k), regardless of peak caller. The peak call sets have about 87% overlap when running
bedtools intersect
orbedtools subtract
.Thanks for your help.
Paul