samtools / htslib

C library for high-throughput sequencing data formats

Fix bug where bin number could overflow when looking for max_off #1595

Closed daviesrob closed 1 year ago

daviesrob commented 1 year ago

When searching for max_off, hts_itr_query() and hts_itr_multi_bam() look for a bin to the right of the end of the region. For whole chromosomes, the region end is HTS_POS_MAX, which is far beyond the maximum position the index can represent. The bin calculation overflowed, leading to either a negative bin number or an incorrect positive one, depending on the number of levels in the index. Negative bin numbers simply wasted time while the search loop counted up to zero, but incorrect positive ones could cause the iterator to finish too early.

Fix by catching the out-of-bounds case and setting max_off to UINT64_MAX, which should be used for bins beyond the end of the indexable range.
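The shape of such a guard can be sketched as follows. This is a minimal illustration, not the code from this PR: the helper names `sketch_index_max_pos()` and `sketch_bin_right_of_end()` are hypothetical, and the sketch assumes the usual binning layout in which level `n_lvls` holds `8^n_lvls` leaf bins of `2^min_shift` bases each, and that `end` is at least 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not htslib API): one past the largest position an
 * index with this min_shift / n_lvls can represent under the assumed layout. */
static int64_t sketch_index_max_pos(int min_shift, int n_lvls)
{
    assert(n_lvls >= 0 && n_lvls <= 10 && min_shift >= 0 && min_shift <= 30);
    return INT64_C(1) << (min_shift + 3 * n_lvls);
}

/* Hypothetical helper (not htslib API): compute the leaf bin just to the
 * right of `end` (end >= 1 assumed).
 * Returns 1 and stores the bin in *bin_out when `end` is inside the
 * indexable range; returns 0 when it is not, in which case the caller
 * would use UINT64_MAX as max_off instead of searching for a bin. */
static int sketch_bin_right_of_end(int64_t end, int min_shift, int n_lvls,
                                   int *bin_out)
{
    assert(end >= 1);
    if (end > sketch_index_max_pos(min_shift, n_lvls))
        return 0;                       /* out of bounds: no bin to compute */

    /* First bin number on the deepest level: (8^n_lvls - 1) / 7 */
    int64_t first = ((INT64_C(1) << (3 * n_lvls)) - 1) / 7;

    /* With the range check above, this sum fits comfortably in an int.
     * It can be one past the last bin when end is at the very limit, so
     * the caller still needs its own "ran off the right-hand side" check. */
    int64_t bin = first + ((end - 1) >> min_shift) + 1;
    *bin_out = (int) bin;
    return 1;
}
```

With `min_shift` 14 and five levels, the sketch accepts any `end` up to 2^29 and rejects a whole-chromosome end such as HTS_POS_MAX before any narrowing conversion or shift of a negative value can happen.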

Luckily, in practice this didn't cause too much harm, at least for the default min_shift value. Indexes with up to six levels overflowed to negative bin numbers. With seven levels the resulting bin referenced a region starting at about 17Gbases, and with eight levels one starting at about 257Gbases, so it's unlikely that searches on real data were affected. The fix is trivial though, and it avoids some shifts of negative values, so it is worth doing.
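The figures above can be reproduced with a small stand-alone sketch of the arithmetic. It is not htslib code: all `sketch_` names are hypothetical, `SKETCH_POS_MAX` is assumed to mirror the definition of HTS_POS_MAX, the default `min_shift` is taken to be 14, and the narrowing to a 32-bit `int` is modelled explicitly as two's-complement wraparound (which is what typical platforms do, although the C standard leaves that conversion implementation-defined).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumption: mirrors HTS_POS_MAX, i.e. (INT_MAX << 32) | INT_MAX. */
#define SKETCH_POS_MAX ((((int64_t) INT_MAX) << 32) | INT_MAX)

/* First bin number on level n_lvls: (8^n_lvls - 1) / 7 */
static int64_t sketch_bin_first(int n_lvls)
{
    assert(n_lvls >= 0 && n_lvls <= 20);
    return ((INT64_C(1) << (3 * n_lvls)) - 1) / 7;
}

/* Model storing a 64-bit value in a 32-bit int with wraparound. */
static int32_t sketch_wrap32(int64_t v)
{
    uint32_t u = (uint32_t) v;          /* well-defined: reduces modulo 2^32 */
    if (u > (uint32_t) INT32_MAX)
        return (int32_t) ((int64_t) u - INT64_C(4294967296));
    return (int32_t) u;
}

/* The unguarded calculation: bin to the right of `end`, kept in an int. */
static int32_t sketch_overflowed_bin(int64_t end, int min_shift, int n_lvls)
{
    int64_t wide = sketch_bin_first(n_lvls) + ((end - 1) >> min_shift) + 1;
    return sketch_wrap32(wide);
}

/* Start position covered by a bin, or -1 if the bin number is negative. */
static int64_t sketch_bin_start(int32_t bin, int min_shift, int n_lvls)
{
    if (bin < 0)
        return -1;
    int lvl = n_lvls;
    while (lvl > 0 && bin < sketch_bin_first(lvl))
        lvl--;                          /* find the level this bin is on */
    /* Each bin on level lvl spans 2^(min_shift + 3 * (n_lvls - lvl)) bases. */
    return (int64_t) (bin - sketch_bin_first(lvl))
           << (min_shift + 3 * (n_lvls - lvl));
}
```

Under those assumptions, `sketch_overflowed_bin(SKETCH_POS_MAX, 14, n)` gives -126391 for five levels and -93623 for six. For seven levels it gives bin 168521, which `sketch_bin_start()` places at 17179869184 (2^34, about 17Gbases), and for eight levels bin 2265673, starting at 257698037760 (about 257Gbases).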