macs3-project / MACS

MACS -- Model-based Analysis of ChIP-Seq
https://macs3-project.github.io/MACS/
BSD 3-Clause "New" or "Revised" License
712 stars 268 forks source link

AssertionError: slocal can't be smaller than d #407

Open kmavrommatis opened 4 years ago

kmavrommatis commented 4 years ago

Hi, this question has been asked previously but I have not managed to find a satisfactory solution

I have a CHIPseq dataset sequenced with paired end reads. I aligned the reads with bowtie2 for Input and Treatment and got an average insert size of ~250 for all samples, according to picard collectinsertsize metrics. using plotFingerprint from deeptools indicates that the enrichment of these samples is sufficiently good. When using macs2 on this data using as input format BAMPE, it detects fragment size that is longer than 3kb and exits with error.

Note that using macs v2.2.7.1 predicts different fragment size length than macs v2.1.0 i.e. v2.2.7.1

INFO  @ Wed, 08 Jul 2020 21:41:51: 30906652 fragments have been read. 
INFO  @ Wed, 08 Jul 2020 21:42:15: #1 mean fragment size is determined as 3943.0 bp from treatment 
INFO  @ Wed, 08 Jul 2020 21:42:15: #1 note: mean fragment size in control is 5923.6 bp -- value ignored 
INFO  @ Wed, 08 Jul 2020 21:42:15: #1 fragment size = 3943.0 
INFO  @ Wed, 08 Jul 2020 21:42:15: #1  total fragments in treatment: 25461742 
INFO  @ Wed, 08 Jul 2020 21:42:15: #1 user defined the maximum fragments... 
INFO  @ Wed, 08 Jul 2020 21:42:15: #1 filter out redundant fragments by allowing at most 1 identical fragment(s) 
INFO  @ Wed, 08 Jul 2020 21:43:32: #1  fragments after filtering in treatment: 25444635 
INFO  @ Wed, 08 Jul 2020 21:43:32: #1  Redundant rate of treatment: 0.00 
INFO  @ Wed, 08 Jul 2020 21:43:32: #1  total fragments in control: 30906652 
INFO  @ Wed, 08 Jul 2020 21:43:32: #1 user defined the maximum fragments... 
INFO  @ Wed, 08 Jul 2020 21:43:32: #1 filter out redundant fragments by allowing at most 1 identical fragment(s) 
INFO  @ Wed, 08 Jul 2020 21:45:06: #1 finished! 
after filtering in control: 30890631 
INFO  @ Wed, 08 Jul 2020 21:45:06: #1  Redundant rate of control: 0.00 
INFO  @ Wed, 08 Jul 2020 21:45:06: #1 finished! 
INFO  @ Wed, 08 Jul 2020 21:45:06: #2 Build Peak Model... 
INFO  @ Wed, 08 Jul 2020 21:45:06: #2 Skipped... 
INFO  @ Wed, 08 Jul 2020 21:45:06: #3 Call peaks... 
Traceback (most recent call last):
  File "/usr/local/bin/macs2", line 4, in <module>
    __import__('pkg_resources').run_script('MACS2==2.2.7.1', 'macs2')
  File "/usr/local/lib/python3.8/site-packages/pkg_resources/__init__.py", line 667, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1464, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.8/site-packages/MACS2-2.2.7.1-py3.8-linux-x86_64.egg/EGG-INFO/scripts/macs2", line 653, in <module>
    main()
  File "/usr/local/lib/python3.8/site-packages/MACS2-2.2.7.1-py3.8-linux-x86_64.egg/EGG-INFO/scripts/macs2", line 51, in main
    run( args )
  File "/usr/local/lib/python3.8/site-packages/MACS2-2.2.7.1-py3.8-linux-x86_64.egg/MACS2/callpeak_cmd.py", line 256, in run
    peakdetect.call_peaks()
  File "MACS2/PeakDetect.pyx", line 107, in MACS2.PeakDetect.PeakDetect.call_peaks
  File "MACS2/PeakDetect.pyx", line 179, in MACS2.PeakDetect.PeakDetect.__call_peaks_w_control
AssertionError: 1000 can't be smaller than 3943.0146484375!

while 2.1.0

INFO  @ Wed, 08 Jul 2020 17:27:44: #1 mean fragment size is determined as 2165 bp from treatment 
INFO  @ Wed, 08 Jul 2020 17:27:44: #1 note: mean fragment size in control is 2916 bp -- value ignored 
INFO  @ Wed, 08 Jul 2020 17:27:44: #1  total fragments in trINFO  @ Wed, 08 Jul 2020 17:27:44: #1  total fragments in treatment: 25461742 
INFO  @ Wed, 08 Jul 2020 17:27:44: #1 user defined the maximum fragments... 
INFO  @ Wed, 08 Jul 2020 17:27:44: #1 filter out redundant fragments by allowing at most 1 identical fragment(s) 
INFO  @ Wed, 08 Jul 2020 17:27:44: #1  fragments after filtering in treatment: 25447791 
INFO  @ Wed, 08 Jul 2020 17:27:44: #1  Redundant rate of treatment: 0.00 
INFO  @ Wed, 08 Jul 2020 17:27:44: #1  total fragments in control: 30906652 
INFO  @ Wed, 08 Jul 2020 17:27:44: #1 user defined the maximum fragments... 
INFO  @ Wed, 08 Jul 2020 17:27:44: #1 filter out redundant fragments by allowing at most 1 identical fragment(s) 
INFO  @ Wed, 08 Jul 2020 17:27:44: #2 Use 2165 as fragment length 
INFO  @ Wed, 08 Jul 2020 17:27:44: #3 Call peaks... 
: #1  Redundant rate of control: 0.00 
INFO  @ Wed, 08 Jul 2020 17:27:44: #1 finished! 
INFO  @ Wed, 08 Jul 2020 17:27:44: #2 Build Peak Model... 
INFO  @ Wed, 08 Jul 2020 17:27:44: #2 Skipped... 
INFO  @ Wed, 08 Jul 2020 17:27:44: #2 Use 2165 as fragment length 
INFO  @ Wed, 08 Jul 2020 17:27:44: #3 Call peaks... 
Traceback (most recent call last):
  File "/usr/local/bin/macs2", line 559, in <module>
    run( args )
  File "/usr/local/lib/python2.7/dist-packa    run( args )
  File "/usr/local/lib/python2.7/dist-packages/MACS2/callpeak.py", line 261, in run
    peakdetect.call_peaks()
  File "cPeakDetect.pyx", line 105, in MACS2.cPeakDetect.PeakDetect.call_peaks (MACS2/cPeakDetect.c:1696)
  File "cPeakDetect.pyx", line 177, in MACS2.cPeakDetect.PeakDetect.__call_peaks_w_control (MACS2/cPeakDetect.c:2208)
AssertionError: slocal can't be smaller than d!

The command line used was

 macs2 callpeak --name AB157_INPUT0p3nM_AB155_H3K27me3 --SPMR --bdg --bw 200 --call-summits -f BAMPE -g hs --keep-dup 1 --mfold 5 80 --nomodel -q 0.05 --control /chip_seq_qc/samtools_index/AB157_INPUT0p3nM_mulmaprem.mdup_blkrem.bam --treatment /chip_seq_qc/samtools_index/AB155_H3K27me3_mulmaprem.mdup_blkrem.bam

For now, to avoid this error I have ended up specifying the format as BAM, i.e. ignoring the paired read.

How is the insert size calculated in BAMPE data and why is there such difference between macs2 and picard estimates? is there any suggestion how to bypass the problem by using the PE data, I could set slocal to a large value, or 0 but that would decrease the quality of the result, correct?

Thanks in advance for your help

taoliu commented 4 years ago

@kmavrommatis The way for MACS to decide fragment length is to read the TLEN field of the BAM file. Could you check by:

samtools view /chip_seq_qc/samtools_index/AB155_H3K27me3_mulmaprem.mdup_blkrem.bam | cut -f 9 | awk '$1>=0{print}' > tlen.txt

When MACS reads BAMPE file, it will ignore the rightmost end with a negative TLEN because the location of the rightmost end can be inferred by adding the leftmost end with TLEN. After that, calculate an average value from tlen.txt.