macs3-project / MACS

MACS -- Model-based Analysis of ChIP-Seq
https://macs3-project.github.io/MACS/
BSD 3-Clause "New" or "Revised" License
712 stars 268 forks source link

Spike-in + input normalisation with MACS #546

Open acv21 opened 1 year ago

acv21 commented 1 year ago

Hello,

I have a few questions on ChIP-seq experimental design, and normalisation by input vs spike-in control. I am working with different tissues, and would like to do the following:

  1. Identify peaks that are only present in some tissues/samples but not in the others
  2. Compare the levels of histone acetylation (what I've IPed) at specific peaks among different samples
  3. Visualise normalised tracks

From my understanding, 1. can be done by comparing the signal in ChIP vs input samples by MACS callpeak, and this step does not make use of a spike-in control. My question is about the second point: MACS can compare peaks between samples with the bdgdiff command, however if I understand this correctly, this is about enrichment of signal within a peak in sample 1 vs sample 2. For instance, if peak A in sample 1 comprises 1% of all mapped reads for sample 1, and the same peak comprises 20% of mapped reads for sample 2, bdgdiff would flag peak A as significantly enriched in sample 2. However, in a scenario where the global level of e.g. acetylation changes between sample 1 and 2, bdgdiff would not necessarily peak up this difference. E.g. if sample 1 had an "absolute"/global acetylation level of 1000, whilst sample 2 had an "absolute" acetylation level of 50, the "absolute" acetylation signal at peak A would be 1%1000 = 10 for sample 1, and 20%50 = 10 for sample 2. In this case, bdgdiff would still flag peak A as enriched in sample 2, but it could not work out that the "absolute" acetylation level is the same in each sample (right?). Incorporating a spike-in control would allow to pick up "absolute"/global differences, as in Egan et al. (2016).

My questions are: a) How can I incorporate spike-ins into my MACS analysis? Should I perform spike-in normalisation prior to running bdgdiff and or callpeak? If so, would it be the same calculation as in Egan et al. (2016), i.e. scaling the number of mapped dm6 (spike-in) reads by the lowest and multiplying these "correction factors" by the number of mapped hg38 reads (for both ChIP and input samples)? b) How can I incorporate spike-in normalisation for visualisation purposes? Would I need to do the spike-in normalisation on the bam files beforehand and then run callpeak, to get a bedgraph file which is both spike-in-normalised and also CPM-normalised? Or could I perform the spike-in normalisation after running callpeak on raw files, e.g. using deepTools to convert the bedgraph (from MACS) to bigwig including a scaling factor (bamCoverage --scaleFactor)?

Thanks for your time and apologies for the long post!

ScienceAdvances commented 8 months ago

Have you solved the problem? I also want to know if it needs spikein in callpeak.