luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License
302 stars 38 forks source link

"terminate called without an active exception" error #92

Closed vlrieg closed 4 years ago

vlrieg commented 4 years ago

Describe the bug I am trying to run octopus on the cluster on a single chromosome and get the following output and error consistently at ~10% of the way through the analysis:

[2019-11-14 13:13:53] <INFO> octopus v0.7.0 (develop 3acdff8b)
[2019-11-14 13:13:53] <INFO> Copyright (c) 2015-2019 University of Oxford
[2019-11-14 13:13:53] <INFO> ------------------------------------------------------------------------
[2019-11-14 13:13:56] <WARN> The polyclone calling model is still in development. Do not use for production work!
[2019-11-14 13:13:56] <INFO> Done initialising calling components in 2s
[2019-11-14 13:13:56] <INFO> Detected 1 sample: "ERR775189"
[2019-11-14 13:13:56] <INFO> Invoked calling model: polyclone
[2019-11-14 13:13:56] <INFO> Processing 1,021,663bp with a single thread (80 cores detected)
[2019-11-14 13:13:56] <INFO> Writing filtered calls to "/gpfs/fs1/data/wraycompute/val/octopus-trial/octopus-ERR775189.vcf.gz"
[2019-11-14 13:13:56] <INFO> ------------------------------------------------------------------------
[2019-11-14 13:13:56] <INFO>      current      |                   |     time      |     estimated
[2019-11-14 13:13:56] <INFO>      position     |     completed     |     taken     |     ttc
[2019-11-14 13:13:56] <INFO> ------------------------------------------------------------------------
[2019-11-14 13:15:39] <INFO>  LT635612:10419             1.0%           1m 42s            2h 49m
[2019-11-14 13:19:29] <INFO>  LT635612:20652             2.0%           5m 32s            6h 15m
[2019-11-14 13:22:12] <INFO>  LT635612:30900             3.0%           8m 16s            5h 20m
[2019-11-14 13:22:14] <INFO>  LT635612:32477             3.2%           8m 18s             5h 7m
[2019-11-14 13:22:19] <INFO>  LT635612:33814             3.3%           8m 23s            4h 56m
[2019-11-14 13:22:19] <INFO>  LT635612:35077             3.4%           8m 23s            4h 43m
[2019-11-14 13:22:20] <INFO>  LT635612:37845             3.7%           8m 24s            4h 21m
[2019-11-14 13:22:23] <INFO>  LT635612:39584             3.9%           8m 27s            4h 13m
[2019-11-14 13:22:25] <INFO>  LT635612:40660             4.0%           8m 29s             4h 5m
[2019-11-14 13:22:38] <INFO>  LT635612:41682             4.1%           8m 42s             4h 3m
[2019-11-14 13:24:56] <INFO>  LT635612:52109             5.1%          10m 59s            3h 55m
[2019-11-14 13:26:12] <INFO>  LT635612:62960             6.2%          12m 16s            3h 29m
[2019-11-14 13:28:10] <INFO>  LT635612:73191             7.2%          14m 13s            3h 22m
[2019-11-14 13:30:15] <INFO>  LT635612:83409             8.2%          16m 19s            3h 19m
[2019-11-14 13:32:28] <INFO>  LT635612:93709             9.2%          18m 31s            3h 17m
[2019-11-14 13:33:01] <INFO> LT635612:104507            10.2%           19m 5s            2h 59m
terminate called without an active exception
/var/spool/slurmd/job12649585/slurm_script: line 12: 17910 Aborted                 (core dumped) octopus -R ${ref} -I ${file} -t chrom1.txt -C polyclone -o octopus-${accession}.vcf.gz --target-working-memory 50GB

Command Command line to run octopus:

$ octopus -R ${ref} -I ${file} -t chrom1.txt -C polyclone -o octopus-${accession}.vcf.gz --target-working-memory 50GB

Desktop (please complete the following information):

Additional context I tried running with different amounts of memory (up to 100GB) and specifying the number of cores (e.g. sbatch --mem=50G -c32 ./run-octopus.sh) in case that was the problem, but the "terminate called without an active exception" error makes me think it might be a bug? I have only tried running a single sample so far.

dancooke commented 4 years ago

Hi Valerie, thanks for the bug report. Could you please try re-building Octopus with sanitizers and re-running with the sanitized binary? If you're using the Python installer, then you can simply add --sanitize to the options and use the generated binary octopus-sanitize (rather than octopus). For example:

$ octopus/scripts/install.py --install-dependencies --sanitize
$ octopus/bin/octopus-sanitize -R ${ref} -I ${file} -t chrom1.txt -C polyclone -o octopus-${accession}.vcf.gz --target-working-memory 50GB

If you could post whatever output is produced then that would be very helpful. Note that the sanitized binary will run slower than the normal binary.

vlrieg commented 4 years ago

Hi Dan,

I did as you suggested but the output doesn't look very different:

[2019-11-15 14:21:18] <INFO> ------------------------------------------------------------------------
[2019-11-15 14:21:18] <INFO> octopus v0.7.0 (develop 3acdff8b)
[2019-11-15 14:21:18] <INFO> Copyright (c) 2015-2019 University of Oxford
[2019-11-15 14:21:18] <INFO> ------------------------------------------------------------------------
[2019-11-15 14:21:22] <WARN> The polyclone calling model is still in development. Do not use for production work!
[2019-11-15 14:21:22] <INFO> Done initialising calling components in 4s
[2019-11-15 14:21:22] <INFO> Detected 1 sample: "ERR775189"
[2019-11-15 14:21:23] <INFO> Invoked calling model: polyclone
[2019-11-15 14:21:23] <INFO> Processing 1,021,663bp with a single thread (56 cores detected)
[2019-11-15 14:21:23] <INFO> Writing filtered calls to "/gpfs/fs1/data/wraycompute/val/octopus-trial/octopus-ERR775189.vcf.gz"
[2019-11-15 14:21:23] <INFO> ------------------------------------------------------------------------
[2019-11-15 14:21:23] <INFO>      current      |                   |     time      |     estimated   
[2019-11-15 14:21:23] <INFO>      position     |     completed     |     taken     |     ttc         
[2019-11-15 14:21:23] <INFO> ------------------------------------------------------------------------
[2019-11-15 14:28:21] <INFO>  LT635612:10419             1.0%           6m 58s           11h 30m
[2019-11-15 14:47:55] <INFO>  LT635612:20652             2.0%          26m 31s             1d 7h
[2019-11-15 15:01:34] <INFO>  LT635612:30900             3.0%          40m 11s             1d 3h
[2019-11-15 15:01:44] <INFO>  LT635612:32477             3.2%          40m 21s             1d 1h
[2019-11-15 15:02:07] <INFO>  LT635612:33814             3.3%          40m 44s           24h 59m
[2019-11-15 15:02:07] <INFO>  LT635612:35077             3.4%          40m 44s           23h 55m
[2019-11-15 15:02:12] <INFO>  LT635612:37845             3.7%          40m 48s            22h 4m
[2019-11-15 15:02:26] <INFO>  LT635612:39584             3.9%           41m 3s           21h 22m
[2019-11-15 15:02:36] <INFO>  LT635612:40660             4.0%          41m 13s           20h 41m
[2019-11-15 15:03:40] <INFO>  LT635612:41682             4.1%          42m 17s           20h 32m
[2019-11-15 15:04:55] <INFO>  LT635612:42712             4.2%          43m 32s           20h 29m
[2019-11-15 15:06:46] <INFO>  LT635612:43801             4.3%          45m 22s           20h 45m
[2019-11-15 15:06:46] <INFO>  LT635612:44979             4.4%          45m 23s            20h 4m
[2019-11-15 15:08:05] <INFO>  LT635612:46032             4.5%          46m 41s            20h 4m
[2019-11-15 15:18:43] <INFO>  LT635612:56250             5.5%          57m 20s            19h 8m
[2019-11-15 15:25:06] <INFO>  LT635612:66489             6.5%            1h 3m           17h 14m
[2019-11-15 15:25:48] <INFO>  LT635612:67638             6.6%            1h 4m            17h 6m
[2019-11-15 15:26:23] <INFO>  LT635612:68762             6.7%            1h 4m           16h 56m
[2019-11-15 15:27:11] <INFO>  LT635612:69825             6.8%            1h 5m           16h 50m
[2019-11-15 15:27:31] <INFO>  LT635612:71134             7.0%            1h 6m           16h 37m
[2019-11-15 15:28:50] <INFO>  LT635612:72160             7.1%            1h 7m           16h 40m
[2019-11-15 15:30:28] <INFO>  LT635612:73191             7.2%            1h 9m           16h 47m
[2019-11-15 15:31:38] <INFO>  LT635612:74249             7.3%           1h 10m           16h 47m
[2019-11-15 15:32:11] <INFO>  LT635612:75350             7.4%           1h 10m           16h 38m
[2019-11-15 15:33:41] <INFO>  LT635612:76416             7.5%           1h 12m           16h 43m
[2019-11-15 15:34:59] <INFO>  LT635612:77439             7.6%           1h 13m           16h 45m
[2019-11-15 15:36:16] <INFO>  LT635612:78472             7.7%           1h 14m           16h 47m
[2019-11-15 15:43:29] <INFO>  LT635612:88981             8.7%           1h 22m           15h 49m
[2019-11-15 15:44:26] <INFO>  LT635612:90012             8.8%           1h 23m           15h 47m
[2019-11-15 15:45:08] <INFO>  LT635612:91062             8.9%           1h 23m           15h 42m
[2019-11-15 15:46:16] <INFO>  LT635612:92271             9.0%           1h 24m           15h 42m
[2019-11-15 15:48:08] <INFO>  LT635612:93361             9.1%           1h 26m           15h 51m
[2019-11-15 15:49:32] <INFO>  LT635612:94383             9.2%           1h 28m           15h 54m
[2019-11-15 15:50:23] <INFO>  LT635612:95417             9.3%           1h 29m           15h 51m
[2019-11-15 15:50:51] <INFO>  LT635612:96642             9.5%           1h 29m           15h 43m
[2019-11-15 15:51:37] <INFO>  LT635612:98476             9.6%           1h 30m           15h 38m
[2019-11-15 15:51:37] <INFO>  LT635612:99801             9.8%           1h 30m           15h 26m
[2019-11-15 15:51:37] <INFO> LT635612:101836            10.0%           1h 30m           15h 12m
[2019-11-15 15:51:37] <INFO> LT635612:103437            10.1%           1h 30m               15h
terminate called without an active exception
/var/spool/slurmd/job12649983/slurm_script: line 12: 32209 Aborted                 (core dumped) octopus-sanitize -R ${ref} -I ${file} -t chrom1.txt -C polyclone -o octopus-${accession}.vcf.gz --target-working-memory 50GB

I'm not sure if it's relevant, but I did have to remove the -Werror tag from two or three CMakeLists.txt files to get it to compile originally.

dancooke commented 4 years ago

Okay thanks. In that case would it be possible for you to send me the BAM file? If you can reproduce the issue on a subregion then a subBAM would be fine. I'd recommend trying to call a sub-region around the last log point until the problem is reproduced, for example, maybe start with

$ octopus -R ${ref} -I ${file} -t chrom1.txt -C polyclone -o octopus-${accession}.vcf.gz --target-working-memory 50GB -T LT635612:103000-106000

and widen the region until the problem appears. Thanks.

vlrieg commented 4 years ago

I just sent you an email with a link to download this BAM file.

I also tried running a couple other samples that had been processed similarly but not exactly the same way as that first one (duplicates removed vs duplicates removed & realigned around indels). These ones also quit with the same output but at different points along the chromosome (one at ~11%, another at ~20%). I can send along those others if it would be helpful but they're pretty big so I'll hold off for now.

dancooke commented 4 years ago

Perfect, thanks. The bug should be fixed in 8994904a128e7c2eb2069821e7407d220c8c5e6f.

BTW, there's no need to preprocess BAM files for Octopus.

vlrieg commented 4 years ago

Thank you Dan! This run just finished successfully! :)

BTW, there's no need to preprocess BAM files for Octopus.

We had already preprocessed our BAMs with the expectation of using the GATK pipeline for variant calling. Since the Octopus documentation said already-processed BAMs should still work okay, I decided to get things up and running with Octopus using the files I had on hand.