luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

Segmentation fault when reference genome contains '.' #238

Open bricoletc opened 2 years ago

bricoletc commented 2 years ago

Describe the bug

Thanks for the great tool! If the reference FASTA contains one or more '.' characters in place of a nucleotide, octopus fails with a segfault. It runs successfully on the same FASTA and BAM once each '.' is replaced by a valid base, e.g. 'a'.

If '.' shouldn't be supported, it would be useful to report a clear error when it occurs rather than just segfaulting. Alternatively, octopus could refuse to call or report any variants at positions with '.' in the reference.

I can provide the FASTA and BAM I used to identify this if you'd like.

$ octopus --version
octopus version 0.7.4
Target: x86_64 Linux 5.4.0-72-generic
SIMD extension: AVX2
Compiler: GNU 9.3.0
Boost: 1_74

Command line to run octopus:

$ octopus -I induced_ref_mapped.bam -R induced_ref.fa --organism-ploidy 1 --threads 4

dancooke commented 2 years ago

Thanks for the bug report. Though there is no official FASTA/Q specification, I don't believe a '.' would be considered a valid base by any well-established tool. However, I agree Octopus should validate its input better - I'll look to report a more helpful error message.
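
As a stopgap, a reference can be checked for unexpected characters before it is passed to a caller. The following is a minimal Python sketch, not Octopus code: the function name `find_invalid_bases` and the accepted alphabet (IUPAC nucleotide codes, case-insensitive) are assumptions made for illustration, and a given tool may accept a narrower or wider set.

```python
import io

# Assumption: accept the IUPAC nucleotide codes, case-insensitively.
# Characters such as '.' or '-' are treated as invalid here.
VALID_BASES = frozenset("ACGTUNRYKMSWBDHV")


def find_invalid_bases(fasta_lines, valid=VALID_BASES):
    """Yield (contig, 1-based position, character) for each unexpected character.

    `fasta_lines` is any iterable of text lines, e.g. an open file handle.
    """
    contig, pos = None, 0
    for line in fasta_lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Header line: the contig name is the first whitespace-delimited token.
            parts = line[1:].split()
            contig = parts[0] if parts else ""
            pos = 0
            continue
        for ch in line:
            pos += 1
            if ch.upper() not in valid:
                yield contig, pos, ch


if __name__ == "__main__":
    # Small in-memory FASTA used only to demonstrate the check.
    example = io.StringIO(">chr1\nACGT.ACG\nacgtn\n>chr2\nAC-T\n")
    for contig, pos, ch in find_invalid_bases(example):
        print(f"{contig}:{pos} invalid character {ch!r}")
```

To check a file on disk, the same function can be given an open handle, e.g. `with open("induced_ref.fa") as fh: problems = list(find_invalid_bases(fh))`. Reporting the contig and position makes it straightforward to decide whether to replace the offending characters or to exclude those regions.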