macs3-project / MACS

MACS -- Model-based Analysis of ChIP-Seq
https://macs3-project.github.io/MACS/
BSD 3-Clause "New" or "Revised" License
698 stars 268 forks source link

Errors out at peak calling #122

Open blikzen opened 8 years ago

blikzen commented 8 years ago

Have you guys run into this before? Know what's causing the issue, or how to work around it?

command im using:

macs2 callpeak -t $chipfile -c $controlfile -B --nomodel --shift $shift --SPMR -g hs -q 0.01 --broad -f BAMPE -n $chipfile.macs2

INFO @ Wed, 16 Mar 2016 22:01:13: #3 Call peaks... Traceback (most recent call last): File "/usr/local/bin/macs2", line 5, in pkg_resources.run_script('MACS2==2.1.1.20160309', 'macs2') File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 528, in run_script self.require(requires)[0].run_script(script_name, ns) File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1394, in run_script execfile(script_filename, namespace, namespace) File "/usr/local/lib/python2.7/dist-packages/MACS2-2.1.1.20160309-py2.7-linux-x86_64.egg/EGG-INFO/scripts/macs2", line 617, in main() File "/usr/local/lib/python2.7/dist-packages/MACS2-2.1.1.20160309-py2.7-linux-x86_64.egg/EGG-INFO/scripts/macs2", line 57, in main run( args ) File "/usr/local/lib/python2.7/dist-packages/MACS2-2.1.1.20160309-py2.7-linux-x86_64.egg/MACS2/callpeak_cmd.py", line 264, in run peakdetect.call_peaks() File "PeakDetect.pyx", line 105, in MACS2.PeakDetect.PeakDetect.call_peaks (MACS2/PeakDetect.c:1704) File "PeakDetect.pyx", line 177, in MACS2.PeakDetect.PeakDetect.call_peaks_w_control (MACS2/PeakDetect.c:2216) AssertionError: slocal can't be smaller than d! Traceback (most recent call last): File "/usr/local/bin/macs2", line 5, in pkg_resources.run_script('MACS2==2.1.1.20160309', 'macs2') File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 528, in run_script self.require(requires)[0].run_script(script_name, ns) File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1394, in run_script execfile(script_filename, namespace, namespace) File "/usr/local/lib/python2.7/dist-packages/MACS2-2.1.1.20160309-py2.7-linux-x86_64.egg/EGG-INFO/scripts/macs2", line 617, in main() File "/usr/local/lib/python2.7/dist-packages/MACS2-2.1.1.20160309-py2.7-linux-x86_64.egg/EGG-INFO/scripts/macs2", line 57, in main run( args ) File "/usr/local/lib/python2.7/dist-packages/MACS2-2.1.1.20160309-py2.7-linux-x86_64.egg/MACS2/callpeak_cmd.py", line 264, in run peakdetect.call_peaks() File "PeakDetect.pyx", line 105, in MACS2.PeakDetect.PeakDetect.call_peaks (MACS2/PeakDetect.c:1704) File "PeakDetect.pyx", line 259, in MACS2.PeakDetect.PeakDetect.call_peaks_w_control (MACS2/PeakDetect.c:3344) File "CallPeakUnit.pyx", line 1431, in MACS2.IO.CallPeakUnit.CallerFromAlignments.call_broadpeaks (MACS2/IO/CallPeakUnit.c:18287) File "CallPeakUnit.pyx", line 1488, in MACS2.IO.CallPeakUnit.CallerFromAlignments.call_broadpeaks (MACS2/IO/CallPeakUnit.c:17664) File "CallPeakUnit.pyx", line 1567, in MACS2.IO.CallPeakUnit.CallerFromAlignments.__chrom_call_broadpeak_using_certain_criteria (MACS2/IO/CallPeakUnit.c:18413) File "CallPeakUnit.pyx", line 475, in MACS2.IO.CallPeakUnit.CallerFromAlignments.__pileup_treat_ctrl_a_chromosome (MACS2/IO/CallPeakUnit.c:6465) ValueError: cannot resize this array: it does not own its data

taoliu commented 8 years ago

Haven't seen such error before. Do you mind provide me a minimum dataset to reproduce such error?

blikzen commented 8 years ago

Sure, we are having some issues with a server upgrade at the moment. I'll see if I can reproduce it, then forward whatever I have along.

blikzen commented 8 years ago

I think this issue is being caused by an memory access error. I'm loading a rather large genome index to shared memory for alignments, and this problem only seems to happen when I don't purge it or have enough additional swap prior to commencing with the next steps in the pipeline. So I'm thinking a heap is being assigned or written to by numpy that's still associated with the aligner's shared memory object, or something funky like that.

However, I get a new error now, still at peak calling, where the output is "slocal can't be smaller than d". As I understand it slocal is an estimation of local 1Kb signal bias, and d is fragment length. When I look at the fragment length estimation calculated by macs2, it's something ridiculous like 12Kb, even when I manually stipulate the shift and extsize values. I've also tried playing around with the -llocal option as well with similar results. I think it may have to do with the bam files I'm using, so I'm testing a few different parameters on a new set of data which previous analyses indicate are good. So I'll let you know when that finishes. If you have any suggestions in the mean time, specifically on how to troubleshoot this slocal error, I would be very grateful.

Thanks, Marty

blikzen commented 8 years ago

Ok, so an analysis with identical parameters as before but with the new bam files completed without error, so it has to be the bam file I was using previously. Whew! Thought I had broke macs2 for a second. :D

mpallocc commented 8 years ago

Dear Tao Liu,

we are experiencing the very same error, since we updated from 2014 to 2015 version. Current version that we are trying with is 2.1.1.20160309, with no luck.

We tried different versions of Numpy (1.9, 1.10), with no luck.

The PYTHON experts at our facility think that this line may be causing some trouble, but I am not fully sure about that (I am no numeric python expert): https://github.com/taoliu/MACS/blame/master/MACS2/IO/CallPeakUnit.pyx#L475

Our analysis have been very limited for several weeks. I noticed that not every BAM causes this issue, and it is sort of nondeterministic (seems to happen easily on larger BAMs).

Here is a public data BAM to be tested: https://bioinformatics.cineca.it//cast/files/2/1926/5757//file1.align.bam

Thank you in advance for your help, Matteo

taoliu commented 8 years ago

@mpallocc Sorry, I just saw your comment! I will test it and let you know.

taoliu commented 8 years ago

@mpallocc I haven't seen errors while MACS2 calls peaks from the file1.align.bam your provided, with default parameters. I got 38K peaks and the header of xls file looks like:

# This file is generated by MACS version 2.1.1.20160309
# Command line: callpeak -t file1.align.bam
# ARGUMENTS LIST:
# name = NA
# format = AUTO
# ChIP-seq file = ['file1.align.bam']
# control file = None
# effective genome size = 2.70e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 5.00e-02
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 10000 bps
# Broad region calling is off
# Paired-End mode is off

# tag size is determined as 42 bps
# total tags in treatment: 34410327
# tags after filtering in treatment: 28451060
# maximum duplicate tags at the same position in treatment = 1
# Redundant rate in treatment: 0.17
# d = 178
# alternative fragment length(s) may be 178 bps

My guess is that, perhaps in some cases you included many non-standard chromosomes in your BAM file and if the data on certain chromosome such as chr19_gl000209_random is too few, MACS2 may have issue to handle it. I will try to fix this potential problem. But at the meantime, a quick fix for your data analysis is to filter out all non-standard chromosomes.

mpallocc commented 8 years ago

Dear @taoliu ,

thank you so much for your help. I am trying right now what you suggested. I was so convinced that it was a bug on the code side that I didn't check deepily the issues on my data, since it was a bit non-deterministic. I'll keep you updated on this.

taoliu commented 8 years ago

@mpallocc Hope the commit 24a1eab solve the issue

mpallocc commented 8 years ago

Dear Tao,

thank you! I will try to test this on the cluster as soon as possible. In the meantime, unfortunately removing the chrUn* from my sample didn't help :(

blikzen commented 8 years ago

for alignment can you try using a non-patched primary assembly then proceed with peak calling? e.g., ftp://ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

On 05/19/2016 11:33 AM, mpallocc wrote:

Dear Tao,

thank you! I will try to test this on the cluster as soon as possible. In the meantime, unfortunately removing the chrUn* from my sample didn't help :(

— You are receiving this because you modified the open/close state. Reply to this email directly or view it on GitHub https://github.com/taoliu/MACS/issues/122#issuecomment-220380389

blikzen commented 8 years ago

i believe the twobit hg assembly files from ucsc are also unpatched, if you prefer that.

On 05/19/2016 11:47 AM, Marty wrote:

for alignment can you try using a non-patched primary assembly then proceed with peak calling? e.g., ftp://ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

On 05/19/2016 11:33 AM, mpallocc wrote:

Dear Tao,

thank you! I will try to test this on the cluster as soon as possible. In the meantime, unfortunately removing the chrUn* from my sample didn't help :(

— You are receiving this because you modified the open/close state. Reply to this email directly or view it on GitHub https://github.com/taoliu/MACS/issues/122#issuecomment-220380389

taoliu commented 8 years ago

@mpallocc Have you tested the new code? If you have any question, please let me know. Since I can't reproduce the error you saw, I purely depend on users' feedback including you :)

mpallocc commented 8 years ago

Dear Tao,

I just finished my tests, thank you for your patience.

Finally, macs2 doesn't crash anymore! This is a massive relief for us :)

During the execution, during the peak calling step, a warning is raised:

_INFO @ Mon, 23 May 2016 21:50:53: #3 Call peaks for each chromosome... Exception ValueError: 'cannot resize this array: it does not own its data' in 'MACS2.IO.CallPeakUnit.clean_up_ndarray' ignored Exception ValueError: 'cannot resize this array: it does not own its data' in 'MACS2.IO.CallPeakUnit.clean_up_ndarray' ignored Exception ValueError: 'cannot resize this array: it does not own its data' in 'MACS2.IO.CallPeakUnit.clean_up_ndarray' ignored Exception ValueError: 'cannot resize this array: it does not own its data' in 'MACS2.IO.CallPeakUnit.clean_up_ndarray' ignored Exception ValueError: 'cannot resize this array: it does not own its data' in 'MACS2.IO.CallPeakUnit.clean_up_ndarray' ignored Exception ValueError: 'cannot resize this array: it does not own its data' in 'MACS2.IO.CallPeakUnit.clean_up_ndarray' ignored INFO @ Mon, 23 May 2016 22:00:13: #4 Write output xls file... file1.peaks_macs2.macs2peaks.xls

But the code does not crash and I compared the peak sets found from previous analyses, I have an overlap from 95% (Transcription Factor) to 100% (Histone Modifications), that is likely due to other MACS2 updates from the end of 2015.

I don't know whether we should worry or not about the numpy exception.. anyways, THANK YOU!

taoliu commented 8 years ago

@blikzen Thanks for your feedback! Good to know! It seems that I have to do more checks to eliminate such warnings... If you have a small bam file that I can reproduce such error/warning, please share with me.

taoliu commented 8 years ago

Reopen since user reported the 'owndata' warning still exists.

blikzen commented 8 years ago

@taoliu no problem, unfortunately i dont still have the bam file that was causing the issue. i can try to replicate it as i think it was either an issue with patched alignments or unsorted bam files.

mpallocc commented 8 years ago

Dear Tao and Bliz,

I think it's really hard to reproduce the problem, since with the very same bam files I had no problem with my own PC, while it crashed on the cluster on the previous versions. I think It should be related to multiple numpy processes/threads and memory usage on a several nodes. I still think it's not input-specific related, that's why I wrote "kind of non-deterministic".

lajoimat commented 8 years ago

@taoliu I am experiencing the same errors with MACS2/2.1.1.20160309 and it does not depends on the input bam files, as different samples will crash at each run on the HPC cluster (python/2.7.11, numpy/1.6.2). As mpalloc noted, the problem seems related to ressource sharing, since it occurs only when multiple MACS2 processes run on the same node. I've switched back to MACS2/2.1.0.20140616 and this error does not happen.

mforde84 commented 8 years ago

Figured it out. The issue is caused by root file permissions on the default temp directory (/tmp/) specified in MACS2. You can either mess around with the ownership and permissions for the root temp directory, or preferably set the temp directory to a path owned by the user launching the MACS2 process using the --tempdir [path] command line option. Just make sure that you're resulting temp location has enough space...

I'm blikzen by the way.

iromeo commented 6 years ago

@lajoimat, @taoliu I also get this error on some files with MACS2/2.1.1.20160309 on HPC cluster ( python 2.7.13, numpy 1.12.1). Haven't tried MACS2/2.1.0.20140616 yet. It cannot be related to tmpdir, I force custom tmp dir using --tempdir option

macs2 callpeak --tempdir /tmp/3444241.mgt2.cluster -t GSM1057025_hg19.bam -c GSM1057030_hg19_input.bam -f BAM -g hs -n GSM1057025_hg19_q0.1 -q 0.1 -B

Update: @taoliu Library built from current master works ok.