Closed by pd3 1 year ago
The difference is due to the index on the file with no data records claiming that there is only one reference, while the one with data records says there are two. In the no-data case, trying to look up the index entry causes hts_itr_query() to return NULL thanks to this check on tid, triggering this assertion in _reader_seek().
The initial count of the number of references is made in idx_calc_n_lvls_ids(), which does not check the headers for IDX= values. Once you start adding data records, the problem is fixed up by hts_idx_push(), which expands the index if it gets a tid value outside the expected range.
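To make the sequence above concrete, here is a minimal toy model in C. It is not htslib code: toy_idx, toy_init(), toy_push() and toy_query() are hypothetical names invented for this sketch, and the only behaviour modelled is "size the index from the number of contig lines, then grow it when a record arrives with an out-of-range tid".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy index, NOT htslib's hts_idx_t. */
typedef struct {
    int n;        /* number of reference slots currently allocated */
    int *n_recs;  /* records seen per slot */
} toy_idx;

/* Size the index from the number of contig header lines only,
 * ignoring their IDX= values (the behaviour described above).
 * Assumes n_contig_lines > 0. */
static toy_idx *toy_init(int n_contig_lines) {
    toy_idx *idx = malloc(sizeof(*idx));
    assert(idx != NULL);
    idx->n = n_contig_lines;
    idx->n_recs = calloc((size_t)n_contig_lines, sizeof(int));
    assert(idx->n_recs != NULL);
    return idx;
}

/* Adding a record grows the index when tid is outside the current range.
 * Assumes tid >= 0. */
static void toy_push(toy_idx *idx, int tid) {
    if (tid >= idx->n) {
        int *tmp = realloc(idx->n_recs, (size_t)(tid + 1) * sizeof(int));
        assert(tmp != NULL);
        for (int i = idx->n; i <= tid; i++)
            tmp[i] = 0;            /* new slots start with no records */
        idx->n_recs = tmp;
        idx->n = tid + 1;
    }
    idx->n_recs[tid]++;
}

/* Lookup: returns 0 if tid is in range, -1 otherwise. */
static int toy_query(const toy_idx *idx, int tid) {
    return (tid >= 0 && tid < idx->n) ? 0 : -1;
}

static void toy_free(toy_idx *idx) {
    free(idx->n_recs);
    free(idx);
}
```

With one contig line carrying IDX=1, toy_init(1) gives a single slot, so toy_query(idx, 1) fails until a toy_push(idx, 1) has widened the index to two slots.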
I'm not sure if this counts as an indexing error, or if hts_itr_query() should be more generous when it's given a tid with no corresponding index entry.
As a side note: the idx->bidx[tid] == NULL test in hts_itr_query() only works on refs with no data because idx_read_core() makes bidx[] entries for everything, even if there's nothing in the index for a given reference. The same wouldn't be true for an index you've just built though: there you only have a bidx entry if some data existed. At the moment, trying to look up a reference with no data on an index you've just built would fail.
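The loaded-versus-built difference can be sketched with another toy model. Again this is an illustration only: mini_idx, mini_load(), mini_build() and mini_lookup() are made-up names, and the single int per table is a placeholder for whatever the real per-reference bin table holds.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-reference bin tables, not htslib code. */
typedef struct {
    int n;       /* number of references */
    int **bidx;  /* bidx[tid] is NULL when no table was allocated */
} mini_idx;

/* Allocate an index with n > 0 references and no tables yet. */
static mini_idx *mini_alloc(int n) {
    mini_idx *idx = malloc(sizeof(*idx));
    assert(idx != NULL);
    idx->n = n;
    idx->bidx = malloc((size_t)n * sizeof(*idx->bidx));
    assert(idx->bidx != NULL);
    for (int i = 0; i < n; i++)
        idx->bidx[i] = NULL;
    return idx;
}

/* "Loaded from disk": every reference gets a (possibly empty) table. */
static mini_idx *mini_load(int n) {
    mini_idx *idx = mini_alloc(n);
    for (int i = 0; i < n; i++) {
        idx->bidx[i] = calloc(1, sizeof(int));
        assert(idx->bidx[i] != NULL);
    }
    return idx;
}

/* "Just built": a table exists only where some data was seen. */
static mini_idx *mini_build(int n, const int *has_data) {
    mini_idx *idx = mini_alloc(n);
    for (int i = 0; i < n; i++) {
        if (has_data[i]) {
            idx->bidx[i] = calloc(1, sizeof(int));
            assert(idx->bidx[i] != NULL);
        }
    }
    return idx;
}

/* Lookup modelled on a "bidx[tid] == NULL means failure" test. */
static int mini_lookup(const mini_idx *idx, int tid) {
    if (tid < 0 || tid >= idx->n || idx->bidx[tid] == NULL)
        return -1;
    return 0;
}

static void mini_free(mini_idx *idx) {
    for (int i = 0; i < idx->n; i++)
        free(idx->bidx[i]);
    free(idx->bidx);
    free(idx);
}
```

In this model a lookup on a reference with no data succeeds on the loaded index but fails on the freshly built one, which is the asymmetry described above.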
This is related to https://github.com/samtools/htslib/issues/1533 and concerns BCFs with edited headers and missing data records. Consider this example
Note the header line contains the field IDX=1, which makes it behave as if the BCF was edited and the first chromosome with IDX=0 was removed.
Prior to commit d64e710 this command fails with
with the fix applied, it works
However, if there are no data records, hts_itr_query returns an error and the program fails.
This problem does not appear when the chromosome tid block has no gaps, i.e. starts with IDX=0.
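As a rough illustration of why gaps matter, the sketch below compares two ways of deciding how many reference slots an index needs: counting contig lines versus taking the largest IDX= value plus one. Both helper names are hypothetical and nothing here is taken from htslib. The two counts agree only when the IDX= values form a gap-free block starting at 0.

```c
#include <assert.h>
#include <stddef.h>

/* Number of references if you simply count the contig header lines. */
static int refs_by_count(const int *idx_vals, size_t n) {
    assert(idx_vals != NULL || n == 0);
    return (int)n;
}

/* Number of slots needed so that every IDX= value is a valid tid. */
static int refs_by_max_idx(const int *idx_vals, size_t n) {
    assert(idx_vals != NULL || n == 0);
    int max = -1;
    for (size_t i = 0; i < n; i++) {
        if (idx_vals[i] > max)
            max = idx_vals[i];
    }
    return max + 1;
}
```

For a header with a single contig at IDX=1 the first count is 1 and the second is 2, so tid 1 falls outside an index sized by the first method. For IDX values 0 and 1 both counts are 2 and the mismatch disappears.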