luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

An unclassified error has occurred: Map::at #99

Closed DiDeoxy closed 4 years ago

DiDeoxy commented 4 years ago

Describe the bug

I am trying to joint genotype two files for a test (they are both ~1x). I get the following output:

[2019-12-11 16:30:10] <INFO> ------------------------------------------------------------------------
[2019-12-11 16:30:10] <INFO> octopus v0.6.3-beta (release/0.6.3-beta 8b0e2968)
[2019-12-11 16:30:10] <INFO> Copyright (c) 2015-2019 University of Oxford
[2019-12-11 16:30:10] <INFO> ------------------------------------------------------------------------
[2019-12-11 16:30:12] <EROR> An unclassified error has occurred:
[2019-12-11 16:30:12] <EROR> 
[2019-12-11 16:30:12] <EROR>     Map::at.
[2019-12-11 16:30:12] <EROR> 
[2019-12-11 16:30:12] <EROR> To help resolve this error submit an error report.
[2019-12-11 16:30:12] <INFO> ------------------------------------------------------------------------

Version

$ octopus --version
octopus version 0.6.3-beta (release/0.6.3-beta 8b0e2968)
Target: x86_64 Linux 5.0.0-1023-azure
Compiler: GNU 8.3.0
Boost: 1_71

Command

Command line to run octopus:

#!/bin/bash -ue
octopus \
  -R Homo_sapiens_assembly38.fasta \
  -I 0_NA12878.bam 1_NA12878.bam \
  -c 0_NA12878.bcf 1_NA12878.bcf \
  -o cohort.bcf \
  -g off \
  -a off \
  --repeat-candidate-generator off \
  --sequence-error-model PCR-FREE.NOVASEQ \
  --forest ~/octopus/resources/forests/germline.v0.6.3-beta.forest \
  --threads 10 \
  --debug

dancooke commented 4 years ago

Are you able to try the latest develop branch version? Note that some options have changed, so your command will become:

octopus \
  -R Homo_sapiens_assembly38.fasta \
  -I 0_NA12878.bam 1_NA12878.bam \
  -c 0_NA12878.bcf 1_NA12878.bcf \
  -o cohort.bcf \
  --disable-denovo-variant-discovery \
  --sequence-error-model PCR-FREE.NOVASEQ \
  --threads 10 \
  --debug

I've removed the --forest option since the v0.6.3 forest is not compatible with the develop branch version.

DiDeoxy commented 4 years ago

I will give it a try

DiDeoxy commented 4 years ago

I can't download the forests for the develop branch when I build. So I can't run the develop version of Octopus using docker and nextflow.

dancooke commented 4 years ago

How are you building? The forest availability shouldn't affect building. I realised after my first response that the forest you have won't be compatible with the develop branch version (see edited post), but you can still run your command without the forest to see if the bug you got previously has been resolved already.

DiDeoxy commented 4 years ago

Okay, will do so.

Build command:

FROM maxh/base_conda:0.1

RUN sudo apt-get update \
    && sudo apt-get install -y build-essential

RUN git clone -b develop https://github.com/luntergroup/octopus.git \
    && octopus/scripts/install.py \
      --install-dependencies \
      --download-forests

ENV PATH=/home/$USERNAME/octopus/bin:$PATH

dancooke commented 4 years ago

Ah ok, so just remove the --download-forests option from the install.py command and it should work.

DiDeoxy commented 4 years ago

It builds, I just wanted to run with the forest. It'll take about an hour to reprocess the data with the changed command. I will let you know what happens.

DBS-Max commented 4 years ago

Hey, so, first of all, Octopus takes 5x as long to run and reads 10x as much data on the develop branch without RF compared to 0.6.3-beta with RF. The joint genotyping stage for two samples is still running (no crash so far) after 13hrs.

Commands:

octopus \
      -R Homo_sapiens_assembly38.fasta \
      -I 1_NA12878.bam \
      -t octopus_intervals \
      -o 1_NA12878.bcf \
      --sequence-error-model PCR-FREE.NOVASEQ \
      --threads 8

octopus \
      -R Homo_sapiens_assembly38.fasta \
      -I 0_NA12878.bam \
      -t octopus_intervals \
      -o 0_NA12878.bcf \
      --sequence-error-model PCR-FREE.NOVASEQ \
      --threads 8

octopus \
      -R Homo_sapiens_assembly38.fasta \
      -I 1_NA12878.bam 0_NA12878.bam \
      -c 1_NA12878.bcf 0_NA12878.bcf \
      -o cohort.bcf \
      --disable-denovo-variant-discovery \
      --sequence-error-model PCR-FREE.NOVASEQ \
      --threads 8 \
      --debug

DBS-Max commented 4 years ago

Correction: nextflow was lying to me. The joint genotyping stage crashed after about 2hrs with the following error message:

WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
[2019-12-12 03:12:32] <INFO> ------------------------------------------------------------------------
[2019-12-12 03:12:32] <INFO> octopus v0.7.0 (develop 0aa9e574)
[2019-12-12 03:12:32] <INFO> Copyright (c) 2015-2019 University of Oxford
[2019-12-12 03:12:32] <INFO> ------------------------------------------------------------------------
[2019-12-12 03:14:28] <WARN> The population calling model is still in development. Do not use for production work!
[2019-12-12 03:14:28] <INFO> Done initialising calling components in 1m 55s
[2019-12-12 03:14:28] <INFO> Detected 2 samples: "0_NA12878" "1_NA12878"
[2019-12-12 03:14:28] <INFO> Invoked calling model: population
[2019-12-12 03:14:28] <INFO> Processing 3,217,346,917bp with 8 threads (32 cores detected)
[2019-12-12 03:14:28] <INFO> Writing filtered calls to "/data/nextflow_temp/fa/93dd715cc7d9e1f1bb9cc8479ced11/cohort.bcf"
[2019-12-12 03:14:30] <WARN> Running in parallel mode can make debug log difficult to interpret
[2019-12-12 05:32:13] <INFO> -------------------------------------------------------------------------------------
[2019-12-12 05:32:13] <INFO>            current             |                   |     time      |     estimated   
[2019-12-12 05:32:13] <INFO>            position            |     completed     |     taken     |     ttc         
[2019-12-12 05:32:13] <INFO> -------------------------------------------------------------------------------------
[2019-12-12 05:32:13] <EROR> Encountered a problem whilst calling chr1:0-65856340
[2019-12-12 05:32:13] <EROR> Encountered a problem whilst calling chr1:65856340-125179584
[2019-12-12 05:32:13] <EROR> Encountered a problem whilst calling chr1:125179584-200450439
[2019-12-12 05:32:13] <EROR> Encountered a problem whilst calling chr1:200450439-248956422
[2019-12-12 05:32:13] <EROR> Encountered a problem whilst calling chr2:0-67460839
[2019-12-12 05:32:13] <EROR> Encountered a problem whilst calling chr2:67460839-136232273
[2019-12-12 05:32:13] <EROR> Encountered a problem whilst calling chr2:136232273-205131746
[2019-12-12 05:32:13] <EROR> Encountered a problem whilst calling chr2:205131746-242193529
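
When a log like the one above is long, it can help to pull out just the regions that failed. The following is a minimal sketch, assuming the "Encountered a problem whilst calling" wording shown in this log; `failed_regions` is a hypothetical helper name, not part of octopus.

```shell
#!/bin/bash
# Hypothetical helper (not part of octopus): print the region from every
# "Encountered a problem whilst calling <region>" line read on stdin.
failed_regions() {
  sed -n 's/.*Encountered a problem whilst calling \([^[:space:]]*\).*/\1/p'
}

# Demo on three lines copied from the log above.
# Prints chr1:0-65856340 and chr2:0-67460839, one per line.
printf '%s\n' \
  '[2019-12-12 03:14:28] <INFO> Invoked calling model: population' \
  '[2019-12-12 05:32:13] <EROR> Encountered a problem whilst calling chr1:0-65856340' \
  '[2019-12-12 05:32:13] <EROR> Encountered a problem whilst calling chr2:0-67460839' \
  | failed_regions
```

On a saved log this would be run as `failed_regions < octopus.log`, where `octopus.log` is a placeholder file name.
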
dancooke commented 4 years ago

Are you able to provide the BAM files you're using?

DBS-Max commented 4 years ago

Possibly, what's the best way to get them to you?

dancooke commented 4 years ago

If you give me your email I can give you access to a Dropbox folder. Send me an email to dcooke@well.ox.ac.uk if you don't want to post your email on here.

DBS-Max commented 4 years ago

The files were generated by sampling from a samtools fastq conversion (samtools fastq -1 NA12878_1.fastq -2 NA12878_2.fastq -) of the CRAM file here: ftp://ftp-trace.ncbi.nlm.nih.gov/1000genomes/ftp/1000G_2504_high_coverage/data/ERR3239334/

I didn't re-sort the reads out of genomic (coordinate) order before I sampled them. I am currently trying out sorting the CRAM by read name and then extracting, using samtools sort -n NA12878.final.cram | samtools fastq -1 NA12878_1.fastq -2 NA12878_2.fastq -, and then sampling. I'll let you know how that goes.
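
A quick way to check whether two extracted FASTQ files still list mates in the same order is to compare their read names record by record. This is a minimal sketch, not something from the thread: `fastq_in_sync` is a hypothetical helper, and it assumes uncompressed four-line FASTQ records, read names that may end in /1 or /2, and header lines without tab characters.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if two FASTQ files contain the same
# read names in the same order (ignoring a trailing /1 or /2 and any
# comment after the first space in the header line).
fastq_in_sync() {
  paste "$1" "$2" | awk -F '\t' '
    # Reduce a FASTQ header line to the bare read name.
    function name(h) {
      sub(/^@/, "", h)        # drop the leading "@"
      sub(/ .*$/, "", h)      # drop any comment after the name
      sub(/\/[12]$/, "", h)   # drop the mate suffix
      return h
    }
    # Header lines are lines 1, 5, 9, ... of each file.
    NR % 4 == 1 && name($1) != name($2) { bad = 1; exit }
    END { exit bad + 0 }
  '
}

# Usage with the file names from the commands above:
# fastq_in_sync NA12878_1.fastq NA12878_2.fastq && echo "mates are in sync"
```

If the files differ in length, paste pads the shorter one with empty fields, so the name comparison fails and the function returns a non-zero status.
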

dancooke commented 4 years ago

Can you upload the sampled BAM files rather than the fastqs? Presumably they shouldn't be large if they're just 1x.

DBS-Max commented 4 years ago

done.

dancooke commented 4 years ago

Closing, as the original problem is fixed in develop. Please open a new issue for the other problem if it persists. I suspect it may be a memory issue. I'm trying to reproduce it and will open an issue if I can.