luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

Error during trio variant calling and a question about --very-fast and --fast #202

Closed (rabiafidan closed 2 years ago)

rabiafidan commented 2 years ago

Hi,

I am running Octopus on a trio. When I call variants from the autosomes, X, or Y separately, all three runs work fine. But when I call variants from autosomes+X in a single run, I get the following error right after Call Set Refinement filtering starts:

[2021-07-27 23:38:07] <INFO>  chrX:151316968            99.9%            6h 5m                 -
[2021-07-27 23:38:20] <INFO>               -             100%            6h 5m                 -
[2021-07-27 23:40:28] <INFO> Starting Call Set Refinement (CSR) filtering
[2021-07-27 23:40:28] <INFO> Removed 48 temporary files
[2021-07-27 23:40:28] <EROR> A program error has occurred:
[2021-07-27 23:40:28] <EROR> 
[2021-07-27 23:40:28] <EROR>     Encountered an exception during calling 'VCF file
[2021-07-27 23:40:28] <EROR>     /foo/octopus-temp-8/trio1_autosomes_X_vfast.unfiltered.vcf
[2021-07-27 23:40:28] <EROR>     is too big'. This means there is a bug and your results are
[2021-07-27 23:40:28] <EROR>     untrustworthy.
[2021-07-27 23:40:28] <EROR> 
[2021-07-27 23:40:28] <EROR> To help resolve this error run in debug mode and send the log file to
[2021-07-27 23:40:28] <EROR> https://github.com/luntergroup/octopus/issues.
[2021-07-27 23:40:28] <INFO> ------------------------------------------------------------------------

Dropping --very-fast didn't change the error either. I have run in debug mode as suggested and can send you the log file.

Version

octopus version 0.7.4
Target: x86_64 Linux 5.4.0-72-generic
SIMD extension: AVX2
Compiler: GNU 9.3.0
Boost: 1_74

Command to install Octopus

mamba create -c conda-forge -c bioconda -n envFoo snakemake octopus

Command to run Octopus

octopus -R GRCh38_full_analysis_set_plus_decoy_hla.fa -I crams/HG00405.final.cram crams/HG00403.final.cram crams/HG00404.final.cram -M HG00404 -F HG00403 -o octopus_calls/trio1_autosomes_X_vfast.vcf --very-fast --sequence-error-model PCR-FREE.NOVASEQ --read-linkage PAIRED --threads 95 -T chr1 to chrX 

Reference I used

wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa

Cram files

wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR324/ERR3241666/HG00404.final.cram
wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR398/ERR3988761/HG00405.final.cram
wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR324/ERR3241665/HG00403.final.cram

Question

I have looked at the documentation but could not find any information on what the --very-fast and --fast options change. Could you elaborate on that, or point me to where I can find it if I missed it? We are specifically interested in changes to the local reassembly step. We don't want to compromise this step, but we would very much like to decrease the run time if it is safe to do so.

Thank you so much!

dancooke commented 2 years ago

This is a duplicate of #177 - if you can't install from source to get the fixed version then you can avoid the error by writing compressed VCF output (e.g. -o octopus_calls/trio1_autosomes_X_vfast.vcf.gz).
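
For anyone landing here with the same error, the workaround above can be sketched as follows. This is only the command from the original report with the -o path changed to end in .vcf.gz; nothing else is altered. The RUN_OCTOPUS switch is a hypothetical convenience used only in this sketch so the command can be inspected before it is run.

```shell
# Workaround sketch: identical to the reported command except for the -o path,
# which now ends in .vcf.gz so that the output is written as compressed VCF.
out="octopus_calls/trio1_autosomes_X_vfast.vcf.gz"

# Build the argument list with POSIX `set --` so it can be printed before running.
set -- \
  -R GRCh38_full_analysis_set_plus_decoy_hla.fa \
  -I crams/HG00405.final.cram crams/HG00403.final.cram crams/HG00404.final.cram \
  -M HG00404 -F HG00403 \
  -o "$out" \
  --very-fast \
  --sequence-error-model PCR-FREE.NOVASEQ \
  --read-linkage PAIRED \
  --threads 95 \
  -T chr1 to chrX

# Print the full command for inspection.
echo "octopus $*"

# RUN_OCTOPUS is a hypothetical switch for this sketch; set it to 1 to execute.
if [ "${RUN_OCTOPUS:-0}" = "1" ]; then
  octopus "$@"
fi
```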

--very-fast is equivalent to --fast with the addition of the --disable-inactive-flank-scoring option. Note that both --fast and --very-fast disable local reassembly. I'll update the docs to make this clearer when I get a moment.
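
A small sketch of how the description above maps onto flag choices, assuming nothing beyond what is stated in this comment. The speed_flags helper and its mode names are hypothetical and exist only for illustration. The "default" mode passes neither shortcut, which is the choice to make if local reassembly must not be disabled.

```shell
# Hypothetical helper: print the speed-related flags for a given mode.
# The mapping follows the comment above, not the Octopus documentation.
speed_flags() {
  case "$1" in
    # --very-fast is described as --fast plus --disable-inactive-flank-scoring.
    very-fast) echo "--fast --disable-inactive-flank-scoring" ;;
    # --fast on its own; per the comment, this also disables local reassembly.
    fast)      echo "--fast" ;;
    # Neither shortcut: no speed flags are added.
    default)   echo "" ;;
    *)         echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Show the flags each mode would add to an octopus command line.
for mode in very-fast fast default; do
  printf '%s: %s\n' "$mode" "$(speed_flags "$mode")"
done
```

Since both shortcuts disable local reassembly, the only mode in this sketch that leaves it untouched is "default".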

rabiafidan commented 2 years ago

Thanks! And sorry for the duplicate.