macs3-project / MACS

MACS -- Model-based Analysis of ChIP-Seq
https://macs3-project.github.io/MACS/
BSD 3-Clause "New" or "Revised" License

Q: Are 2.1.2.1 or 2.1.1.20160309 results better? Some differences noted #537

Open RichardCorbett opened 1 year ago

RichardCorbett commented 1 year ago

Hi folks,

We had trouble processing some data with 2.1.1.20160309, so we used 2.1.2.1 instead. In a separate test, we are seeing what look like significant differences in results between the two versions, and we would like guidance on which set of results should be reported.

We have paired-end samples that we processed with both versions. Here are examples of the commands used:

#2.1.1.20160309
macs2 callpeak -t A91514_2_lanes_dupsFlagged.bam -c A91520_2_lanes_dupsFlagged.bam --gsize hs -f BAMPE --name A91514_H3K4me1 --outdir out1 --broad --bdg

#2.1.2.1
macs2 callpeak -t A91514_2_lanes_dupsFlagged.bam -c A91520_2_lanes_dupsFlagged.bam --gsize hs -f BAMPE --name A91514_H3K4me1 --outdir out2 --broad --bdg

Although the peaks in A91514_H3K4me1_peaks.xls from each version cover similar fractions of the genome, the peaks themselves are quite different:

| metric | 2.1.1.20160309 | 2.1.2.1 |
|---|---|---|
| total bases in peaks | 404096581 | 381773429 |
| total peaks | 238101 | 196949 |
| peak bases unique to dataset | 35769857 | 13487857 |
| peaks completely unique to dataset | 30261 | 468 |
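For anyone wanting to reproduce this kind of comparison, here is a minimal Python sketch of how the four metrics in the table could be computed for one peak set relative to the other. It is not necessarily how the numbers above were produced. It assumes BED-style 0-based, half-open coordinates, so it reads the first three tab-separated columns of the `A91514_H3K4me1_peaks.broadPeak` file that `--broad` writes rather than the `.xls` table. The helper names (`read_bed`, `merge`, `shared_bases`, `compare_peak_sets`, etc.) are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

# (chrom, start, end) with BED-style 0-based, half-open coordinates
Interval = Tuple[str, int, int]


def read_bed(path: str) -> List[Interval]:
    """Read chrom/start/end from the first three tab-separated columns of a BED-like file."""
    intervals: List[Interval] = []
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.rstrip("\n").split("\t")[:3]
            intervals.append((chrom, int(start), int(end)))
    return intervals


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_bases(intervals: List[Interval]) -> int:
    """Number of distinct bases covered by the interval set."""
    return sum(end - start for _, start, end in merge(intervals))


def shared_bases(a: List[Interval], b: List[Interval]) -> int:
    """Bases covered by both sets, via a two-pointer sweep over merged intervals."""
    a, b = merge(a), merge(b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        ca, sa, ea = a[i]
        cb, sb, eb = b[j]
        if ca != cb:
            # Both lists are sorted by chromosome name, so advance the smaller one
            if ca < cb:
                i += 1
            else:
                j += 1
            continue
        shared += max(0, min(ea, eb) - max(sa, sb))
        # Advance whichever interval finishes first
        if ea < eb:
            i += 1
        else:
            j += 1
    return shared


def overlaps_any(index: Dict[str, List[Tuple[int, int]]], chrom: str, start: int, end: int) -> bool:
    """True if [start, end) overlaps any merged interval indexed for chrom."""
    ivs = index.get(chrom, [])
    # Position of the first merged interval that starts at or after `end`
    k = bisect.bisect_left(ivs, (end,))
    # Merged intervals are disjoint and sorted, so only the one before it can overlap
    return k > 0 and ivs[k - 1][1] > start


def compare_peak_sets(a: List[Interval], b: List[Interval]) -> Dict[str, int]:
    """Compute the table's four metrics for peak set `a` relative to peak set `b`."""
    index_b: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in merge(b):
        index_b[chrom].append((start, end))
    covered = total_bases(a)
    return {
        "total bases in peaks": covered,
        "total peaks": len(a),
        "peak bases unique to dataset": covered - shared_bases(a, b),
        "peaks completely unique to dataset": sum(
            1 for chrom, start, end in a if not overlaps_any(index_b, chrom, start, end)
        ),
    }
```

Usage would be along the lines of `compare_peak_sets(read_bed("out1/A91514_H3K4me1_peaks.broadPeak"), read_bed("out2/A91514_H3K4me1_peaks.broadPeak"))`, then the same call with the arguments swapped for the other column. Merging before counting bases keeps any overlapping peaks within one set from being double-counted.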

First off, does it make sense to compare results this way? If so, is there a reason to trust one set over the other? Which would you use?

RichardCorbett commented 1 year ago

Here's an IGV screenshot of some of the differences in peaks between the versions.

[Image: igv_snapshot_A84624_comparison]

The top track is the control BAM and the second contains the ChIP reads. The four BED tracks contain either the peaks called by each version or the result of subtracting one set of peaks from the other.
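To regenerate "difference" tracks like the subtraction BED files in the screenshot, a minimal Python sketch of interval subtraction (the operation `bedtools subtract` performs by default: report the parts of A not covered by B) could look like this. It is not necessarily how the posted tracks were made. It assumes BED-style 0-based, half-open `(chrom, start, end)` tuples, and the function names (`merge_by_chrom`, `subtract`, `to_bed_lines`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (chrom, start, end) with BED-style 0-based, half-open coordinates
Interval = Tuple[str, int, int]


def merge_by_chrom(intervals: List[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Group intervals by chromosome, sorted, with overlapping or touching ones merged."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in sorted(intervals):
        ivs = by_chrom[chrom]
        if ivs and start <= ivs[-1][1]:
            ivs[-1] = (ivs[-1][0], max(ivs[-1][1], end))
        else:
            ivs.append((start, end))
    return by_chrom


def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the parts of `a` not covered by any interval in `b` (A minus B)."""
    a_merged, b_merged = merge_by_chrom(a), merge_by_chrom(b)
    result: List[Interval] = []
    for chrom in sorted(a_merged):
        cuts = b_merged.get(chrom, [])
        j = 0
        for start, end in a_merged[chrom]:
            # Skip cuts ending before this interval begins; later intervals start even later
            while j < len(cuts) and cuts[j][1] <= start:
                j += 1
            pos = start
            k = j
            # Walk every cut that starts before this interval ends, emitting uncovered gaps
            while k < len(cuts) and cuts[k][0] < end:
                cut_start, cut_end = cuts[k]
                if cut_start > pos:
                    result.append((chrom, pos, cut_start))
                pos = max(pos, cut_end)
                k += 1
            # Whatever remains after the last cut is also uncovered
            if pos < end:
                result.append((chrom, pos, end))
    return result


def to_bed_lines(intervals: List[Interval]) -> List[str]:
    """Format intervals as minimal three-column BED lines suitable for loading in IGV."""
    return [f"{chrom}\t{start}\t{end}" for chrom, start, end in intervals]
```

Calling `subtract(peaks_v1, peaks_v2)` and `subtract(peaks_v2, peaks_v1)` gives the two one-directional difference tracks; writing `to_bed_lines(...)` to a `.bed` file makes each one loadable alongside the BAMs.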