luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License
301 stars 37 forks source link

Cancer model core dumps with std::out_of_range error for _Map_base::at #178

Closed jbedo closed 2 years ago

jbedo commented 3 years ago

Describe the bug Cancer model core dumps with std::out_of_range error for _Map_base::at. Unfortunately I can't share this data; any suggestions on capturing debug info that might help you track this down?

[2021-05-12 08:05:11] <INFO> ------------------------------------------------------------------------
[2021-05-12 08:05:11] <INFO> octopus v0.7.4
[2021-05-12 08:05:11] <INFO> Copyright (c) 2015-2021 University of Oxford
[2021-05-12 08:05:11] <INFO> ------------------------------------------------------------------------
[2021-05-12 08:09:05] <INFO> Done initialising calling components in 3m 54s
[2021-05-12 08:09:06] <INFO> Detected 3 samples: "blood" "l-ovary" "r-ovary"
[2021-05-12 08:09:06] <INFO> Invoked calling model: cancer
[2021-05-12 08:09:06] <INFO> Processing 181,538,259bp with 24 threads (56 cores detected)
[2021-05-12 08:09:06] <INFO> Writing filtered calls to "/nix/store/y4w5281d3hqcz4j1rw64ngxjlcav6sv6-bionix-octopus-callSomatic"
[2021-05-12 08:10:10] <INFO> ------------------------------------------------------------------------
[2021-05-12 08:10:10] <INFO>      current      |                   |     time      |     estimated
[2021-05-12 08:10:10] <INFO>      position     |     completed     |     taken     |     ttc
[2021-05-12 08:10:10] <INFO> ------------------------------------------------------------------------
[2021-05-12 08:11:11] <INFO>    chr5:3509156             0.5%            1m 1s            3h 22m
[2021-05-12 08:11:43] <INFO>     chr5:746001             1.0%           1m 33s            2h 33m
[2021-05-12 08:12:20] <INFO>     chr5:243580             1.5%           2m 10s            2h 22m
[2021-05-12 08:12:57] <INFO>    chr5:3810806             2.0%           2m 47s            2h 16m
[2021-05-12 08:13:33] <INFO>    chr5:5916862             2.5%           3m 23s            2h 12m
[2021-05-12 08:14:11] <INFO>    chr5:6743310             3.0%            4m 1s             2h 9m
[2021-05-12 08:14:47] <INFO>    chr5:6232278             3.5%           4m 37s             2h 7m
[2021-05-12 08:15:19] <INFO>    chr5:8144926             4.0%            5m 9s             2h 3m
[2021-05-12 08:15:51] <INFO>    chr5:7229330             4.5%           5m 41s                2h
[2021-05-12 08:16:20] <INFO>    chr5:7443333             5.0%           6m 10s            1h 57m
[2021-05-12 08:16:41] <EROR> Encountered a problem whilst calling chr5:7299733-7496937()
[2021-05-12 08:16:43] <EROR> An unclassified error has occurred:
[2021-05-12 08:16:43] <EROR>
[2021-05-12 08:16:43] <EROR>     _Map_base::at.
[2021-05-12 08:16:43] <EROR>
[2021-05-12 08:16:43] <EROR> To help resolve this error submit an error report.
[2021-05-12 08:16:43] <FATL> Encountered error in task writer thread. Calling terminate
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
/nix/store/13i375llk80lry9g29szivsz9jap7d3p-stdenv-linux/setup: line 1290: 29068 Aborted                 (core dumped) octopus -R ref.fa -I *.bam -o $out --bamout $evidence --threads=$NIX_BUILD_CORES --fast --max-genotypes 1000 -T chr5 -N $normal --annotations AF

Version

$ octopus --version
octopus version 0.7.4
Target: x86_64 Linux 3.10.0-957.5.1.el7.x86_64
SIMD extension: AVX2
Compiler: GNU 9.3.0
Boost: 1_69

Command Command line to install octopus: Using Nix package.

Command line to run octopus:

$ octopus -R ref.fa -I *.bam -o $out --bamout $evidence --threads=$NIX_BUILD_CORES --fast --max-genotypes 1000 -T chr5 -N $normal --annotations AF

Additional context Add any other context about the problem here, e.g.

dancooke commented 3 years ago

Thanks for the bug report. Could you try running on the problem region with a single thread and generate a debug log:

$ octopus -R ref.fa -I *.bam -o $out --bamout $evidence --fast --max-genotypes 1000 -T chr5:7299733-7496937 -N $normal --annotations AF --debug
jbedo commented 3 years ago

Apologies for the long delay, we had a storage failure and I didn't have compute time to spare to track this down until now.

I'm actually having trouble reproducing this exact error. I am however consistently generating segfaults that look like this:

builder for '/nix/store/5bykn0gi4dyncpg9dpnn81far91447jj-bionix-octopus-callSomatic.drv' failed with exit code 139; last 10 log lines:
  [2021-06-15 22:46:16] <INFO>      position     |     completed     |     taken     |     ttc
  [2021-06-15 22:46:16] <INFO> ------------------------------------------------------------------------
  [2021-06-15 22:47:16] <INFO>     chr9:550055             0.5%              59s            3h 17m
  [2021-06-15 22:47:54] <INFO>    chr9:3065110             1.0%           1m 37s            2h 41m
  [2021-06-15 22:48:32] <INFO>    chr9:1108165             1.5%           2m 15s            2h 28m
  [2021-06-15 22:49:16] <INFO>    chr9:1018931             2.0%           2m 59s            2h 26m
  [2021-06-15 22:49:59] <INFO>    chr9:3730990             2.5%           3m 42s            2h 24m
  /nix/store/13i375llk80lry9g29szivsz9jap7d3p-stdenv-linux/setup: line 1290: 39190 Segmentation fault      (core dumped) octopus -R ref.fa -I *.bam -o $out --bamout $evidence --threads=$NIX_BUILD_CORES --fast --max-genotypes 1000 -T chr9 -N $normal --annotations AF
  srun: error: milton-med-008: task 0: Exited with exit code 139
  salloc: Relinquishing job allocation 1613185
builder for '/nix/store/8kb9sh9j67vbiikdj66pdfzqaw6p1npk-bionix-octopus-callSomatic.drv' failed with exit code 134; last 10 log lines:
  [2021-06-15 22:56:05] <INFO>   chr12:2170464             1.5%            2m 3s            2h 15m
  [2021-06-15 22:56:43] <INFO>    chr12:961733             2.0%           2m 41s            2h 11m
  [2021-06-15 22:57:27] <INFO>   chr12:1144233             2.5%           3m 25s            2h 13m
  [2021-06-15 22:58:14] <INFO>   chr12:4503738             3.0%           4m 12s            2h 16m
  [2021-06-15 22:59:00] <INFO>   chr12:5467059             3.5%           4m 58s            2h 16m
  [2021-06-15 22:59:49] <INFO>   chr12:6163683             4.0%           5m 47s            2h 18m
  free(): invalid next size (fast)
  /nix/store/13i375llk80lry9g29szivsz9jap7d3p-stdenv-linux/setup: line 1290: 42798 Aborted                 (core dumped) octopus -R ref.fa -I *.bam -o $out --bamout $evidence --threads=$NIX_BUILD_CORES --fast --max-genotypes 1000 -T chr12 -N $normal --annotations AF
  srun: error: milton-med-008: task 0: Exited with exit code 134
  salloc: Relinquishing job allocation 1613194
builder for '/nix/store/2b8i8879qbq54s7cp0w1an0f82hf2nby-bionix-octopus-callSomatic.drv' failed with exit code 134; last 10 log lines:
  [2021-06-15 23:27:12] <INFO>  chr17:79069719            95.1%          42m 46s            2m 12s
  [2021-06-15 23:27:53] <INFO>  chr17:80051379            96.1%          43m 27s            1m 45s
  [2021-06-15 23:28:40] <INFO>  chr17:79823782            97.1%          44m 13s            1m 19s
  [2021-06-15 23:29:27] <INFO>  chr17:80672915            98.1%           45m 1s               52s
  [2021-06-15 23:30:17] <INFO>  chr17:80905142            99.1%          45m 51s               25s
  [2021-06-15 23:33:32] <EROR> Encountered a problem whilst calling chr17:82999972-83163929()
  free(): invalid next size (fast)
  /nix/store/13i375llk80lry9g29szivsz9jap7d3p-stdenv-linux/setup: line 1290: 13672 Aborted                 (core dumped) octopus -R ref.fa -I *.bam -o $out --bamout $evidence --threads=$NIX_BUILD_CORES --fast --max-genotypes 1000 -T chr17 -N $normal --annotations AF
  srun: error: milton-sml-018: task 0: Exited with exit code 134
  salloc: Relinquishing job allocation 1613181
builder for '/nix/store/fkcpbx9wk6hw4izfkszlwv8sp8npggs0-bionix-octopus-callSomatic.drv' failed with exit code 134; last 10 log lines:
  [2021-06-16 02:15:39] <INFO>   chr6:44223934            26.5%          27m 52s            1h 17m
  [2021-06-16 02:16:14] <INFO>   chr6:47662002            27.0%          28m 26s            1h 16m
  [2021-06-16 02:16:42] <INFO>   chr6:46919042            27.5%          28m 55s            1h 16m
  [2021-06-16 02:17:04] <INFO>   chr6:46970072            28.0%          29m 17s            1h 15m
  [2021-06-16 02:17:30] <INFO>   chr6:48154109            28.5%          29m 43s            1h 14m
  [2021-06-16 02:17:55] <INFO>   chr6:48211700            29.0%           30m 7s            1h 13m
  free(): invalid next size (fast)
  /nix/store/13i375llk80lry9g29szivsz9jap7d3p-stdenv-linux/setup: line 1290: 14614 Aborted                 (core dumped) octopus -R ref.fa -I *.bam -o $out --bamout $evidence --threads=$NIX_BUILD_CORES --fast --max-genotypes 1000 -T chr6 -N $normal --annotations AF

I'm currently trying to generate a debug log for you that captures one of these, but since it appears a bit random rather than triggered by a specific region it's taking a while. Other than the debug log, would anything else help?

jbedo commented 3 years ago

Here's some logs. stderr.log.gz octopus_debug_100000.log.gz

I had to limit the debug log to the last 100k lines as it was too large. If you need the whole log let me know.

jbedo commented 2 years ago

This is probably #228.