Additionally with this data file samtools goes into an infinite(?) loop when given [a particular location / query interval]: […]
This can be traced to the do … while loop in hts_itr_query(), which loops perhaps forever through negative bin numbers. Possibly this code should check for a beg value massively beyond the maximum position covered by the index's bins and sidestep most of its processing.
This PR checks start positions of query intervals against the maximum position representable in the index's geometry, to avoid negative bin numbers and the resulting infinite loops in the do...while loop. It's effectively the beg equivalent of #1595.
I considered implementing the “massively beyond” bit by e.g. adding 256 as various parts of the code do for somewhat different reasons. However tracing the guarded code in hts_itr_query() and hts_itr_multi_bam() (and looking at the checks for data getting into the index at all) suggests that the result would be a finished / unchanged iterator for beg immediately past maxpos anyway — so having the bound exactly at maxpos is fine.
Introduces hts_bin_maxpos() and hts_idx_maxpos(), and uses them wherever the maxpos calculation appears. I've left the latter private, at least for now.
This also changes the existing end checks to <= as end is exclusive -- note it is used as end-1 in the code guarded by the checks. In practice this probably won't make much difference to anything.
As noted on https://github.com/samtools/samtools/pull/2032 (which includes test queries exhibiting the problem):
This PR checks start positions of query intervals against the maximum position representable in the index's geometry, to avoid negative bin numbers and the resulting infinite loops in the
do...while
loop. It's effectively thebeg
equivalent of #1595.I considered implementing the “massively beyond” bit by e.g. adding 256 as various parts of the code do for somewhat different reasons. However tracing the guarded code in
hts_itr_query()
andhts_itr_multi_bam()
(and looking at the checks for data getting into the index at all) suggests that the result would be a finished / unchanged iterator forbeg
immediately pastmaxpos
anyway — so having the bound exactly atmaxpos
is fine.Introduces
hts_bin_maxpos()
andhts_idx_maxpos()
, and uses them wherever themaxpos
calculation appears. I've left the latter private, at least for now.This also changes the existing
end
checks to<=
asend
is exclusive -- note it is used asend-1
in the code guarded by the checks. In practice this probably won't make much difference to anything.