lh3 / minimap2

A versatile pairwise aligner for genomic and spliced nucleotide sequences
https://lh3.github.io/minimap2

Segmentation fault when setting a large gap penalty #79

Closed simon-l3 closed 6 years ago

simon-l3 commented 6 years ago

I was trying to locate relatively unique regions in two very closely related bacterial genomes and needed to allow less divergence than -x asm5 does. I then encountered a segmentation fault when I set the gap penalties to either -O60 -E4 or -O61 -E3.

The commands that generated the segmentation fault:

./minimap2 -c -A1 -B29 -O61 -E3 GCF_000195955.2_ASM19595v2_genomic.fna.gz GCF_000758245.1_ASM75824v1_genomic.fna.gz
./minimap2 -c -A1 -B29 -O60 -E4 GCF_000195955.2_ASM19595v2_genomic.fna.gz GCF_000758245.1_ASM75824v1_genomic.fna.gz

While this one is OK:

./minimap2 -c -A1 -B29 -O60 -E3 GCF_000195955.2_ASM19595v2_genomic.fna.gz GCF_000758245.1_ASM75824v1_genomic.fna.gz

minimap2 version: 2.5 and 2.6

lh3 commented 6 years ago

This is because 1+(61+3)*2>127, which triggers integer overflow in the Smith-Waterman alignment. This is a bug, but a fix will be complicated and might even be impossible. I will have a look at a later time. For now, make sure A+(O+E)*2<=127.
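For anyone who wants to check a parameter set before running, the constraint above can be sketched as a small script. This is only an illustration of the arithmetic A+(O+E)*2<=127 quoted in this comment, not minimap2's own check; the names `max_cell_score` and `scoring_fits_int8` are hypothetical, and the reading of 127 as the largest signed 8-bit value is an assumption.

```python
# Assumption: 127 is the largest value of a signed 8-bit integer,
# which is the limit quoted in the comment above.
INT8_MAX = 127


def max_cell_score(a: int, o: int, e: int) -> int:
    """Return A + (O + E) * 2 for match score A, gap open O, gap extension E."""
    return a + (o + e) * 2


def scoring_fits_int8(a: int, o: int, e: int) -> bool:
    """True if the scoring satisfies A + (O + E) * 2 <= 127."""
    return max_cell_score(a, o, e) <= INT8_MAX


if __name__ == "__main__":
    # The three (A, O, E) combinations reported in this issue.
    for a, o, e in [(1, 61, 3), (1, 60, 4), (1, 60, 3)]:
        verdict = "ok" if scoring_fits_int8(a, o, e) else "too large"
        print(f"-A{a} -O{o} -E{e}: {max_cell_score(a, o, e)} ({verdict})")
```

The two crashing combinations both evaluate to 129, while the working one evaluates to exactly 127, which matches the behaviour reported above.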

lh3 commented 6 years ago

Looking at the code, I realize this issue is an intrinsic limitation of the Smith-Waterman algorithm minimap2 is using. It is not resolvable. I will let minimap2 throw an error when the scoring system breaks the algorithm.

lh3 commented 6 years ago

ab345e6 throws an error for inappropriate scoring. Sorry that I am unable to fix the issue and can only let minimap2 abort. Thanks for your report anyway.