macs3-project / MACS

MACS -- Model-based Analysis of ChIP-Seq
https://macs3-project.github.io/MACS/
BSD 3-Clause "New" or "Revised" License
698 stars 268 forks source link

Bug: MACS2 segfault on bedgraph output into non-existant folder #461

Open EricR86 opened 3 years ago

EricR86 commented 3 years ago

Describe the bug MACS2 segfaults. Output looks like the following:

INFO  @ Mon, 29 Mar 2021 16:07:56: #3   --SPMR is requested, so pileup will be normalized by sequencing depth in million reads. 
InNFO  @ Mon, 29 Mar 2021 16:07:56: #3 Call peaks for each chromosome... 
Segmentation fault

To Reproduce Run callpeak with the --name option specifying a directory that does not yet exist. e.g. macs2 callpeak --treatment chip_TA_H3K4me1.tagAlign.gz --control chip_TA_control.tagAlign.gz --format BED --name peak_output/chip_TA --gsize 2.8e9 --pvalue 1e-2 --nomodel --extsize 250 --keep-dup all --bdg --SPMR --shift 0 where the subfolder peak_output does not exist.

Expected behavior No segfault

Screenshots Here's a core dump where it shows calling into writing out a bedGraph, likely to a non-existant path.

#0  __vfprintf_internal (s=0x0, format=0x7fa124b1ca46 "%s\t%d\t%d\t%.5f\n", ap=ap@entry=0x7ffdc4b17ba0, mode_flags=2) at vfprintf-internal.c:1328
#1  0x00007fa1a8113023 in ___fprintf_chk (fp=<optimized out>, flag=<optimized out>, format=<optimized out>) at fprintf_chk.c:33
#2  0x00007fa124abf2f3 in __pyx_f_5MACS2_2IO_12CallPeakUnit_20CallerFromAlignments___write_bedGraph_for_a_chromosome ()
   from /home/ericr/miniconda3/envs/macs2/lib/python3.9/site-packages/MACS2/IO/CallPeakUnit.cpython-39-x86_64-linux-gnu.so
#3  0x00007fa124ae9bed in __pyx_f_5MACS2_2IO_12CallPeakUnit_20CallerFromAlignments___chrom_call_peak_using_certain_criteria ()
   from /home/ericr/miniconda3/envs/macs2/lib/python3.9/site-packages/MACS2/IO/CallPeakUnit.cpython-39-x86_64-linux-gnu.so
#4  0x00007fa124af94f9 in __pyx_f_5MACS2_2IO_12CallPeakUnit_20CallerFromAlignments_call_peaks ()
   from /home/ericr/miniconda3/envs/macs2/lib/python3.9/site-packages/MACS2/IO/CallPeakUnit.cpython-39-x86_64-linux-gnu.so
#5  0x00007fa124ade304 in __pyx_pw_5MACS2_2IO_12CallPeakUnit_20CallerFromAlignments_9call_peaks ()
   from /home/ericr/miniconda3/envs/macs2/lib/python3.9/site-packages/MACS2/IO/CallPeakUnit.cpython-39-x86_64-linux-gnu.so
#6  0x000055d931010ed5 in cfunction_call (func=0x7fa124a711d0, args=<optimized out>, kwargs=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613883019772/work/Objects/methodobject.c:539
#7  0x00007fa124b4749f in __Pyx_PyObject_Call () from /home/ericr/miniconda3/envs/macs2/lib/python3.9/site-packages/MACS2/PeakDetect.cpython-39-x86_64-linux-gnu.so
#8  0x00007fa124b5b85d in __pyx_pw_5MACS2_10PeakDetect_10PeakDetect_5__call_peaks_w_control ()
   from /home/ericr/miniconda3/envs/macs2/lib/python3.9/site-packages/MACS2/PeakDetect.cpython-39-x86_64-linux-gnu.so
#9  0x00007fa124b4a867 in __pyx_pw_5MACS2_10PeakDetect_10PeakDetect_3call_peaks ()
   from /home/ericr/miniconda3/envs/macs2/lib/python3.9/site-packages/MACS2/PeakDetect.cpython-39-x86_64-linux-gnu.so
#10 0x000055d930ff7b26 in _PyObject_MakeTpCall (tstate=0x55d9319060c0, callable=0x7fa124ec74c0, args=<optimized out>, nargs=<optimized out>, keywords=<optimized out>)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613883019772/work/Include/internal/pycore_pyerrors.h:14
#11 0x000055d93105243e in _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=0x55d931eb7540, callable=0x7fa124ec74c0, tstate=0x55d9319060c0)
    at /home/conda/feedstock_root/build_artifacts/python-split_1613883019772/work/Include/cpython/abstract.h:1161

System (please complete the following information):

Additional context This problem does not exist on non-bed output because presumably there is no immediate call into the c-extension and python catches the missing path instead (where an Exception is raised).

taoliu commented 3 years ago

Really strange error. Could you try not to use --name peak_output/chip_TA in macs2 callpeak --treatment chip_TA_H3K4me1.tagAlign.gz --control chip_TA_control.tagAlign.gz --format BED --name peak_output/chip_TA --gsize 2.8e9 --pvalue 1e-2 --nomodel --extsize 250 --keep-dup all --bdg --SPMR --shift 0 to specify the output directory? The formal way is to use --outdir option as --outdir peak_output --name chip_TA? If the error is still there, could you share me the data files so that I can try to debug? If you can minimize the data file to reproduce the error, it would be even better. Thanks!

EricR86 commented 3 years ago

try not to use --name peak_output/chip_TA

Yes if you don't use the peak_output/ portion in the name option, the segfault does not happen. The segfault also doesn't happen if the peak_output directory already exists. So while maybe this is not the necessarily intended way of using the option, it's still a case where a segfault seems guaranteed to happen.

In terms of datasets I don't think it matters as long as the directory specified from the --name option is missing and the --bdg option is there. I used public datasets from ENCODE (the DOHH2 cell line, unfiltered aligned reads).