I have a few questions on ChIP-seq experimental design, and normalisation by input vs spike-in control. I am working with different tissues, and would like to do the following:
Identify peaks that are only present in some tissues/samples but not in the others
Compare the levels of histone acetylation (what I've IPed) at specific peaks among different samples
Visualise normalised tracks
From my understanding, 1. can be done by comparing the signal in ChIP vs input samples by MACS callpeak, and this step does not make use of a spike-in control. My question is about the second point: MACS can compare peaks between samples with the bdgdiff command, however if I understand this correctly, this is about enrichment of signal within a peak in sample 1 vs sample 2. For instance, if peak A in sample 1 comprises 1% of all mapped reads for sample 1, and the same peak comprises 20% of mapped reads for sample 2, bdgdiff would flag peak A as significantly enriched in sample 2. However, in a scenario where the global level of e.g. acetylation changes between sample 1 and 2, bdgdiff would not necessarily peak up this difference. E.g. if sample 1 had an "absolute"/global acetylation level of 1000, whilst sample 2 had an "absolute" acetylation level of 50, the "absolute" acetylation signal at peak A would be 1%1000 = 10 for sample 1, and 20%50 = 10 for sample 2. In this case, bdgdiff would still flag peak A as enriched in sample 2, but it could not work out that the "absolute" acetylation level is the same in each sample (right?). Incorporating a spike-in control would allow to pick up "absolute"/global differences, as in Egan et al. (2016).
My questions are:
a) How can I incorporate spike-ins into my MACS analysis? Should I perform spike-in normalisation prior to running bdgdiff and or callpeak? If so, would it be the same calculation as in Egan et al. (2016), i.e. scaling the number of mapped dm6 (spike-in) reads by the lowest and multiplying these "correction factors" by the number of mapped hg38 reads (for both ChIP and input samples)?
b) How can I incorporate spike-in normalisation for visualisation purposes? Would I need to do the spike-in normalisation on the bam files beforehand and then run callpeak, to get a bedgraph file which is both spike-in-normalised and also CPM-normalised? Or could I perform the spike-in normalisation after running callpeak on raw files, e.g. using deepTools to convert the bedgraph (from MACS) to bigwig including a scaling factor (bamCoverage --scaleFactor)?
Thanks for your time and apologies for the long post!
Hello,
I have a few questions on ChIP-seq experimental design, and normalisation by input vs spike-in control. I am working with different tissues, and would like to do the following:
From my understanding, 1. can be done by comparing the signal in ChIP vs input samples by MACS
callpeak
, and this step does not make use of a spike-in control. My question is about the second point: MACS can compare peaks between samples with thebdgdiff
command, however if I understand this correctly, this is about enrichment of signal within a peak in sample 1 vs sample 2. For instance, if peak A in sample 1 comprises 1% of all mapped reads for sample 1, and the same peak comprises 20% of mapped reads for sample 2, bdgdiff would flag peak A as significantly enriched in sample 2. However, in a scenario where the global level of e.g. acetylation changes between sample 1 and 2, bdgdiff would not necessarily peak up this difference. E.g. if sample 1 had an "absolute"/global acetylation level of 1000, whilst sample 2 had an "absolute" acetylation level of 50, the "absolute" acetylation signal at peak A would be 1%1000 = 10 for sample 1, and 20%50 = 10 for sample 2. In this case, bdgdiff would still flag peak A as enriched in sample 2, but it could not work out that the "absolute" acetylation level is the same in each sample (right?). Incorporating a spike-in control would allow to pick up "absolute"/global differences, as in Egan et al. (2016).My questions are: a) How can I incorporate spike-ins into my MACS analysis? Should I perform spike-in normalisation prior to running bdgdiff and or callpeak? If so, would it be the same calculation as in Egan et al. (2016), i.e. scaling the number of mapped dm6 (spike-in) reads by the lowest and multiplying these "correction factors" by the number of mapped hg38 reads (for both ChIP and input samples)? b) How can I incorporate spike-in normalisation for visualisation purposes? Would I need to do the spike-in normalisation on the bam files beforehand and then run callpeak, to get a bedgraph file which is both spike-in-normalised and also CPM-normalised? Or could I perform the spike-in normalisation after running callpeak on raw files, e.g. using deepTools to convert the bedgraph (from MACS) to bigwig including a scaling factor (bamCoverage --scaleFactor)?
Thanks for your time and apologies for the long post!