macs3-project / MACS

MACS -- Model-based Analysis of ChIP-Seq
https://macs3-project.github.io/MACS/
BSD 3-Clause "New" or "Revised" License
698 stars 268 forks source link

Q: The number of reported differential peaks is much lower than the number of originally called peaks: Why? #402

Open eregenyi opened 4 years ago

eregenyi commented 4 years ago

Hi Taoliu,

I am comparing two transcription factor's peaks. I analyzed unique peaks, meaning peaks that had no overlap (0 overlapping bases) between the two TFs. I was also interested in differentially enriched peaks, so I followed the differential binding event wiki article (https://github.com/macs3-project/MACS/wiki/Call-differential-binding-events).

However, inspecting the results, I found that much less of the peaks are reported in the differential analysis compared to the originally called peaks. Looking at the diff_.....bed peaks in IGV, some very obvious peaks at known target promoters are missed (they appear neither among the common, nor the unique peaks).

Number of lines in TF1- and TF2_peaks.xls: ~90 000 and ~130 000 Number of lines in diff_common: 31 360 diff_TF1: 1 172 diff_TF2: 329

I kept the 'd' parameter constant.

Why is this? Could I tweak the parameters to include more peaks? Can I somehow use these peaks? I have strong strong feelings against using them after seeing that some obvious peaks are not reported in _common.bed.

Thank you for your help in advance

taoliu commented 4 years ago

Sorry for my late reply! This is a common issue in ChIP-seq differential analysis. If you only look at 'unique peaks' that are not overlapping between conditions. Many of them may be called due to the cutoff of the original peak calling, especially when the peak significance (measured in p-value/q-value) is around the set cutoff. I think you understand this and this is your 'strong strong feeling against using them'. Regarding MACS way to call differential peaks, it won't just divide all peaks into common, diff_TF1, and diff_TF2. It won't report those in between -- which means the region is neither significantly biased to either condition or significantly 'equal'. You can try to lower the 'likelihood ratio cutoff' of bdgdiff -- macs2 bdgdiff -C 2 to set a likelihood ratio=100 instead of the default -C 3 (1000).