luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

Running Octopus within a memory-controlled environment results in Std::bad_alloc with code base after commit 4d4b0f4 #86

Closed: ahstram closed this issue 4 years ago

ahstram commented 5 years ago

Hi there,

I am running Octopus in a Sun Grid Engine (SGE) compute environment, where I am required to set a memory limit for each job beforehand. I have told SGE to allow up to 100G of memory per Octopus job, but I get the following error message when running either commit 0446f85 or 783b728 with this Octopus command:

$ time /path/to/octopus/0.6.3-beta+dev/bin/octopus --threads 8 -R /path/to/hs37d5.fa -I normal.bam plasma.bam tumor.bam --bamout /path/to/work --bamout-type FULL --normal-sample normal_sample_name -t region.bed --working-directory /path/to/job/directory --annotations forest --disable-call-filtering -o /path/to/out.vcf

[2019-10-15 10:24:52] ------------------------------------------------------------------------
[2019-10-15 10:24:52] octopus v0.6.3-beta (develop )
[2019-10-15 10:24:52] Copyright (c) 2015-2019 University of Oxford
[2019-10-15 10:24:52] ------------------------------------------------------------------------
[2019-10-15 10:24:59] An unclassified error has occurred:
[2019-10-15 10:24:59]
[2019-10-15 10:24:59] Std::bad_alloc.
[2019-10-15 10:24:59]
[2019-10-15 10:24:59] To help resolve this error submit an error report.
[2019-10-15 10:24:59] ------------------------------------------------------------------------
6.44user 1.30system 0:07.87elapsed 98%CPU (0avgtext+0avgdata 1538684maxresident)k
36920inputs+8outputs (27major+388663minor)pagefaults 0swaps

The above command runs fine with commit 4d4b0f4.

I can run the above command with commits 0446f85 and 783b728 outside of the SGE environment, but that is not practical for running more than one region at a time.

I suspect that a very large memory allocation was introduced recently, and that it is causing the "bad alloc" messages because it does not play nicely with SGE's memory limit.
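One way to test this suspicion outside SGE is to cap the address space of a child process and see whether an oversized allocation fails in the same way. The `time` output above reports roughly 1.5 GB maximum resident memory, far below 100G, which is at least consistent with a large reservation being refused. The following is only a minimal Python sketch (Linux): it assumes the SGE limit behaves like a virtual-memory (address-space) cap, which this thread does not confirm. `run_with_address_space_limit` is a hypothetical helper name, and the probe command is a stand-in workload, not Octopus.

```python
import resource
import subprocess
import sys


def run_with_address_space_limit(cmd, limit_bytes):
    """Run cmd in a child process whose virtual address space is capped.

    Assumption: the scheduler's memory limit acts like RLIMIT_AS.
    """
    def apply_limit():
        # Executed in the child just before exec; the parent is unaffected.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = limit_bytes
        if hard != resource.RLIM_INFINITY:
            # The soft limit may not exceed an existing hard limit.
            soft = min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(cmd, preexec_fn=apply_limit,
                          capture_output=True, text=True)


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Stand-in workload (not Octopus): request 4 GiB under a 1 GiB cap.
    probe = [sys.executable, "-c", "bytearray(4 * 1024 ** 3)"]
    result = run_with_address_space_limit(probe, 1 * GIB)
    print("exit code:", result.returncode)
    print(result.stderr.strip())
```

The same helper could wrap the real Octopus command line with a limit matching the SGE setting. If the failure reproduces under the cap but not without it, that would point to a single large reservation rather than gradual memory growth.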

I've attached a debug.log file generated by adding "--debug" to the affected command. The --trace log was not informative in this case.

debug.log

Can you please advise?

Thanks,

Alex

dancooke commented 5 years ago

This should be fixed in 2d21a08b26c3f491f606538d556dcc01d126aae0. However, I'm heavily committing to the develop branch at the moment testing various changes. I would recommend rolling back to 6f1869dc79d45898ac96dedc9b730a804427162e for the time being - this worked pretty well in my tests.

ahstram commented 5 years ago

@dancooke got it, I've rolled back to 6f1869d.

Any reason why the bug would occur in 783b728 if it was fixed in 2d21a08? It looks like 783b728 is a descendant of 2d21a08... I just double-checked that the issue does indeed happen with 783b728.
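The ancestry between two commits can be checked directly instead of being read off the log. This is a minimal sketch, assuming a local clone of the repository and `git` available on the PATH. `is_ancestor` is a hypothetical helper that wraps `git merge-base --is-ancestor`, which exits with 0 when the first commit is an ancestor of the second and 1 when it is not.

```python
import subprocess


def is_ancestor(repo_dir, ancestor, descendant):
    """Return True if `ancestor` is reachable from `descendant`.

    repo_dir is the path to a local clone (an assumption of this sketch).
    """
    result = subprocess.run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor",
         ancestor, descendant],
        stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True)
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Any other exit status means git itself failed (e.g. unknown commit).
    raise RuntimeError(result.stderr.strip())
```

For the commits discussed here, the call would look like `is_ancestor("/path/to/octopus", "2d21a08", "783b728")`, where the clone path is a placeholder.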

dancooke commented 5 years ago

ah ok, must be another issue - I was getting a similar bug before that I fixed in 2d21a08. I'll have another look.

dancooke commented 4 years ago

I think this issue should be resolved (as of e8f0a9bc2abd56421d218c298a6c69d9aecc8e98) - please re-open if not.

ahstram commented 4 years ago

Great! I'll give it a test shortly. Should I deploy e8f0a9b, or is there a more recent revision, such as 3e14470, which has done well in your tests & you would recommend?