macs3-project / MACS

MACS -- Model-based Analysis of ChIP-Seq
https://macs3-project.github.io/MACS/
BSD 3-Clause "New" or "Revised" License
702 stars 268 forks source link

Bug: abnormal output of .bed result from "macs2 callpeak" function #326

Closed huangwb8 closed 5 years ago

huangwb8 commented 5 years ago

Describe the bug I use macs2 callpeak to do peak finding for Chip-Seq data. I notice there are something wrong happen when *.bed results were output.

To Reproduce

I use code as following:

ws=/home/huangwb8/Test/test_chip_3
cd $ws/output/align
ls *merge.rmdup.bam | grep -v Control | while read id
do (nohup macs2 callpeak --tempdir /data/ \
-c Control*bam -t $id -f BAM -B -g mm \
-n $ws/output/peaks/$(basename $id ".bam") > \
$ws/log/peaks/$(basename $id ".bam").log 2>&1 &)
done

Here is a peak record for example. They are produced by the same code

  • *.narrowPeak
    chr1  4481756 4481888 /home/huangwb8/Test/test_chip_3/output/peaks/H2Aub1.ChIPSeq-GSM850471.merge.rmdup_peak_1    47  .      4.83060  9.25272 4.70255 96

System (please complete the following information):

huangwb8 commented 5 years ago

here is one of the log of macs2 callpeak. Hope it helps!


INFO  @ Wed, 09 Oct 2019 21:46:40: 
# Command line: callpeak --tempdir /data/ -c Control.MockIP-GSM850473.merge.rmdup.bam -t H2Aub1.ChIPSeq-GSM850471.merge.rmdup.bam -f BAM -B -g mm -n /home/huangwb8/Test/test_chip_3/output/peaks/H2Aub1.ChIPSeq-GSM850471.merge.rmdup
# ARGUMENTS LIST:
# name = /home/huangwb8/Test/test_chip_3/output/peaks/H2Aub1.ChIPSeq-GSM850471.merge.rmdup
# format = BAM
# ChIP-seq file = ['H2Aub1.ChIPSeq-GSM850471.merge.rmdup.bam']
# control file = ['Control.MockIP-GSM850473.merge.rmdup.bam']
# effective genome size = 1.87e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 5.00e-02
# The maximum gap between significant sites is assigned as the read length/tag size.
# The minimum length of peaks is assigned as the predicted fragment length "d".
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is off
# Paired-End mode is off

INFO @ Wed, 09 Oct 2019 21:46:40: #1 read tag files... INFO @ Wed, 09 Oct 2019 21:46:40: #1 read treatment tags... INFO @ Wed, 09 Oct 2019 21:46:49: 1000000 INFO @ Wed, 09 Oct 2019 21:46:57: 2000000 INFO @ Wed, 09 Oct 2019 21:47:06: 3000000 INFO @ Wed, 09 Oct 2019 21:47:14: 4000000 INFO @ Wed, 09 Oct 2019 21:47:24: 5000000 INFO @ Wed, 09 Oct 2019 21:47:33: 6000000 INFO @ Wed, 09 Oct 2019 21:47:41: 7000000 INFO @ Wed, 09 Oct 2019 21:47:49: 8000000 INFO @ Wed, 09 Oct 2019 21:47:56: 9000000 INFO @ Wed, 09 Oct 2019 21:48:02: 10000000 INFO @ Wed, 09 Oct 2019 21:48:09: 11000000 INFO @ Wed, 09 Oct 2019 21:48:16: 12000000 INFO @ Wed, 09 Oct 2019 21:48:21: 13000000 INFO @ Wed, 09 Oct 2019 21:48:26: 14000000 INFO @ Wed, 09 Oct 2019 21:48:26: #1.2 read input tags... INFO @ Wed, 09 Oct 2019 21:48:30: 1000000 INFO @ Wed, 09 Oct 2019 21:48:34: 2000000 INFO @ Wed, 09 Oct 2019 21:48:37: 3000000 INFO @ Wed, 09 Oct 2019 21:48:41: 4000000 INFO @ Wed, 09 Oct 2019 21:48:45: 5000000 INFO @ Wed, 09 Oct 2019 21:48:49: 6000000 INFO @ Wed, 09 Oct 2019 21:48:53: 7000000 INFO @ Wed, 09 Oct 2019 21:48:57: 8000000 INFO @ Wed, 09 Oct 2019 21:49:01: 9000000 INFO @ Wed, 09 Oct 2019 21:49:04: #1 tag size is determined as 48 bps INFO @ Wed, 09 Oct 2019 21:49:04: #1 tag size = 48.0 INFO @ Wed, 09 Oct 2019 21:49:04: #1 total tags in treatment: 14005406 INFO @ Wed, 09 Oct 2019 21:49:04: #1 user defined the maximum tags... INFO @ Wed, 09 Oct 2019 21:49:04: #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s) INFO @ Wed, 09 Oct 2019 21:49:04: #1 tags after filtering in treatment: 14005404 INFO @ Wed, 09 Oct 2019 21:49:04: #1 Redundant rate of treatment: 0.00 INFO @ Wed, 09 Oct 2019 21:49:04: #1 total tags in control: 9869716 INFO @ Wed, 09 Oct 2019 21:49:04: #1 user defined the maximum tags... INFO @ Wed, 09 Oct 2019 21:49:04: #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s) INFO @ Wed, 09 Oct 2019 21:49:04: #1 tags after filtering in control: 9869713 INFO @ Wed, 09 Oct 2019 21:49:04: #1 Redundant rate of control: 0.00 INFO @ Wed, 09 Oct 2019 21:49:04: #1 finished! INFO @ Wed, 09 Oct 2019 21:49:04: #2 Build Peak Model... INFO @ Wed, 09 Oct 2019 21:49:04: #2 looking for paired plus/minus strand peaks... INFO @ Wed, 09 Oct 2019 21:49:06: #2 number of paired peaks: 5543 INFO @ Wed, 09 Oct 2019 21:49:06: start model_add_line... INFO @ Wed, 09 Oct 2019 21:49:06: start X-correlation... INFO @ Wed, 09 Oct 2019 21:49:06: end of X-cor INFO @ Wed, 09 Oct 2019 21:49:06: #2 finished! INFO @ Wed, 09 Oct 2019 21:49:06: #2 predicted fragment length is 125 bps INFO @ Wed, 09 Oct 2019 21:49:06: #2 alternative fragment length(s) may be 125 bps INFO @ Wed, 09 Oct 2019 21:49:06: #2.2 Generate R script for model : /home/huangwb8/Test/test_chip_3/output/peaks/H2Aub1.ChIPSeq-GSM850471.merge.rmdup_model.r INFO @ Wed, 09 Oct 2019 21:49:06: #3 Call peaks... INFO @ Wed, 09 Oct 2019 21:49:06: #3 Pre-compute pvalue-qvalue table... INFO @ Wed, 09 Oct 2019 21:49:57: #3 In the peak calling step, the following will be performed simultaneously: INFO @ Wed, 09 Oct 2019 21:49:57: #3 Write bedGraph files for treatment pileup (after scaling if necessary)... /home/huangwb8/Test/test_chip_3/output/peaks/H2Aub1.ChIPSeq-GSM850471.merge.rmdup_treat_pileup.bdg INFO @ Wed, 09 Oct 2019 21:49:57: #3 Write bedGraph files for control lambda (after scaling if necessary)... /home/huangwb8/Test/test_chip_3/output/peaks/H2Aub1.ChIPSeq-GSM850471.merge.rmdup_control_lambda.bdg INFO @ Wed, 09 Oct 2019 21:49:57: #3 Pileup will be based on sequencing depth in control. INFO @ Wed, 09 Oct 2019 21:49:57: #3 Call peaks for each chromosome... INFO @ Wed, 09 Oct 2019 21:52:00: #4 Write output xls file... /home/huangwb8/Test/test_chip_3/output/peaks/H2Aub1.ChIPSeq-GSM850471.merge.rmdup_peaks.xls INFO @ Wed, 09 Oct 2019 21:52:00: #4 Write peak in narrowPeak format file... /home/huangwb8/Test/test_chip_3/output/peaks/H2Aub1.ChIPSeq-GSM850471.merge.rmdup_peaks.narrowPeak INFO @ Wed, 09 Oct 2019 21:52:00: #4 Write summits bed file... /home/huangwb8/Test/test_chip_3/output/peaks/H2Aub1.ChIPSeq-GSM850471.merge.rmdup_summits.bed INFO @ Wed, 09 Oct 2019 21:52:00: Done!

taoliu commented 5 years ago

Hi @huangwb8, first, please do not use -n, the name of the run, to specify output directories. Instead, use --outdir. Do this:

macs2 callpeak --tempdir /data/ \
-c Control*bam -t $id -f BAM -B -g mm \
-n $(basename $id ".bam") \
--outdir $ws/output/peaks/

Regarding your question, please check the README to understand the output files from MACS2. The ".bed" file output is for peak "summits" which is the highest 1bp within a peak. The purpose of this file has been indicated in the filename "xxx_summits.bed".

huangwb8 commented 5 years ago

OK, I got it. Thanks a lot!

Esteban-Escobar commented 4 years ago

@taoliu Hi, I'm currently doing an analysis in MACS2 with m6A-seq data and i had the same doubt, but if it is normal that the lenght of the peak summit is of 1, why would you recommend it to search motifs at binding sites. For example, I want to use meme to search motifs from the summits.bed results, previously extracting the sequences from the locations indicated in that file using bedtools bedintersect. But like i mentioned before is just 1 of lenght, so my question here is What would you recommend me to do? or do i take the locations indicated in the .narrowPeak file or .xls file (outputs of MACS2).

taoliu commented 4 years ago

@Esteban-Escobar To use MEME to search for motif at peak summit, which is assumed to be the binding site, we usually search for an x00bps (e.g. 200bps) window around the summit.

awk -v OFS="\t" '{print $1,$2-100,$2+100,$4,$5}' xxx_summits.bed > xxx_200bp_around_summits.bed