samtools / htslib

C library for high-throughput sequencing data formats

CRAM load_ref_portion() fails on some Mistletoe references #1734

Closed daviesrob closed 4 months ago

daviesrob commented 5 months ago

Even though the Mistletoe genome has been chopped into references less than 2^31 bases long, it seems they're still long enough to trigger an arithmetic overflow in load_ref_portion().

The fai entry for the first reference is:

OY728119.1      2143528264      58      80      81

which gives these values for offset and len in the code linked above:

(gdb) p e->line_length ? e->offset + (start-1)/e->bases_per_line * e->line_length + (start-1) % e->bases_per_line : start-1
$7 = 58

(gdb) p (e->line_length ? e->offset + (end-1)/e->bases_per_line * e->line_length + (end-1) % e->bases_per_line : end-1) - offset + 1
$9 = -2124644929

The len value goes on to cause a malloc() failure, which is happily caught but causes the CRAM writer to drop into embed_ref mode when it shouldn't have to.
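
For illustration, the arithmetic can be reproduced outside htslib with a minimal sketch. The `fai_entry` struct and the helpers `file_pos()`, `span_len()` and `wrap_to_int32()` below are hypothetical names, not htslib API; only the formula is taken from the gdb expressions above. The sketch assumes that len ends up in a 32-bit signed integer, which is consistent with the value gdb printed but is not confirmed here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the .fai fields used in the gdb expressions. */
typedef struct {
    int64_t length;          /* number of bases in the reference */
    int64_t offset;          /* file offset of the first base */
    int32_t bases_per_line;  /* bases on each full line */
    int32_t line_length;     /* bytes per line, including the newline */
} fai_entry;

/* File offset of 1-based position pos, computed entirely in 64 bits. */
static int64_t file_pos(const fai_entry *e, int64_t pos) {
    if (e->line_length == 0)
        return pos - 1;      /* same fallback as the gdb expression */
    assert(e->bases_per_line > 0);
    return e->offset
         + (pos - 1) / e->bases_per_line * (int64_t)e->line_length
         + (pos - 1) % e->bases_per_line;
}

/* Bytes needed to cover [start, end], newlines included. */
static int64_t span_len(const fai_entry *e, int64_t start, int64_t end) {
    return file_pos(e, end) - file_pos(e, start) + 1;
}

/* Value obtained if v is squeezed into a 32-bit two's-complement integer.
 * Done arithmetically to avoid implementation-defined conversions. */
static int64_t wrap_to_int32(int64_t v) {
    int64_t low = (int64_t)(uint32_t)v;   /* v modulo 2^32 */
    return low > INT32_MAX ? low - (INT64_C(1) << 32) : low;
}
```

With the entry above (`{2143528264, 58, 80, 81}`), `file_pos()` for position 1 gives 58 and `span_len()` over the whole reference gives 2170322367 bytes, because each 80-base line occupies 81 bytes in the file. That is larger than INT32_MAX even though the base count is not, and `wrap_to_int32()` maps it to -2124644929, the value gdb reported.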