samtools / htslib

C library for high-throughput sequencing data formats

Add faidx_seq_len64(), fai_adjust_region() interfaces and faidx tests #1519

Closed daviesrob closed 1 year ago

daviesrob commented 1 year ago

Add faidx_seq_len64() as a replacement for faidx_seq_len() that can return the correct length of sequences longer than INT_MAX.

Make faidx_seq_len() clamp its return value at INT_MAX for sequences longer than that, which is probably less bad than letting the value overflow.
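For illustration, the clamping behaviour could look roughly like the sketch below. This is not the htslib code: `clamp_len_to_int()` is a hypothetical stand-alone helper, and treating a negative length as an error value is an assumption made for the example.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not part of htslib): narrow a 64-bit sequence
 * length to int without overflowing.  Lengths above INT_MAX are reported
 * as INT_MAX; negative values are passed through as -1 (an error). */
static int clamp_len_to_int(int64_t len)
{
    if (len < 0)
        return -1;              /* assumed error convention */
    if (len > INT_MAX)
        return INT_MAX;         /* clamp instead of wrapping around */
    return (int) len;           /* safe: value fits in an int */
}
```

A caller that gets exactly INT_MAX back cannot tell whether the sequence is that long or longer, which is why the 64-bit faidx_seq_len64() is the one to prefer for large references.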

Add a fai_adjust_region() function that can be used to ensure that a given range does not extend beyond the end of the requested sequence. The interface is designed so that the output of fai_parse_region() can be passed straight to it. This essentially exposes the internal faidx_adjust_position() function, which is currently used to enforce the same limits in the faidx_fetch_seq64() and faidx_fetch_qual64() interfaces. The new function lets callers apply the limits in advance, so they know what will actually be retrieved.
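As a rough picture of what adjusting a region means, here is a simplified, self-contained sketch. It is not the htslib implementation: `clamp_region_sketch()`, the `ADJ_BEG`/`ADJ_END` flags, and the bitmask return convention are all assumptions for the example, and it takes the sequence length directly rather than looking it up in an index. Coordinates are taken to be 0-based with an exclusive end.

```c
#include <stdint.h>

/* Hypothetical flags for this sketch only: which coordinate was changed. */
enum { ADJ_BEG = 1, ADJ_END = 2 };

/* Hypothetical helper (not part of htslib): clamp the half-open region
 * [*beg, *end) so that it lies within a sequence of length seq_len.
 * Returns a bitmask of ADJ_BEG / ADJ_END for the values that were
 * modified, 0 if nothing changed, or -1 on invalid input. */
static int clamp_region_sketch(int64_t seq_len, int64_t *beg, int64_t *end)
{
    int changed = 0;

    if (seq_len < 0 || !beg || !end)
        return -1;

    int64_t b = *beg, e = *end;

    if (b < 0) b = 0;                 /* start cannot precede the sequence */
    if (b > seq_len) b = seq_len;     /* start cannot pass the end */
    if (e > seq_len) e = seq_len;     /* end cannot pass the end */
    if (e < b) e = b;                 /* keep the region non-negative in size */

    if (b != *beg) changed |= ADJ_BEG;
    if (e != *end) changed |= ADJ_END;

    *beg = b;
    *end = e;
    return changed;
}
```

With this sketch, asking for positions 10 to 500 of a 100-base sequence gives back 10 to 100 and reports that only the end moved. The real function's exact return values and edge-case handling should be taken from htslib/faidx.h rather than from this example.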

While writing this, I discovered that faidx didn't really have any tests of its own, although some were run as side-effects of other tests. The second (rather bigger) commit adds some dedicated faidx tests.