zaeleus / noodles

Bioinformatics I/O libraries in Rust
MIT License

Explicitly state the indexing of alignment start and end #281

Closed: mbhall88 closed this issue 1 month ago

mbhall88 commented 1 month ago

It would be great if you could explicitly state in the docs whether (SAM record) alignment_start() and alignment_end() are 1-based or 0-based and whether they are inclusive or exclusive. Given they are core::Position I assume that means they're 1-based, but it isn't obvious if the end is exclusive or inclusive? Looking at the code it seems end is inclusive?

zaeleus commented 1 month ago

> Given they are core::Position I assume that means they're 1-based, but it isn't obvious if the end is exclusive or inclusive?

Yes, by definition, core::Position represents a 1-based coordinate.

HTS specifications define two coordinate systems: 0-based, using left-closed, right-open intervals; and 1-based, using closed intervals.[^1] All positions in the noodles public API use the 1-based coordinate system, so an end position such as alignment_end() is inclusive.
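To make the difference between the two systems concrete, here is a minimal sketch in plain Rust that uses only the standard library. It does not use noodles: `ClosedInterval` and its methods are hypothetical names made up for illustration. It models a 1-based, closed interval and shows how it maps to the 0-based, half-open form.

```rust
use std::num::NonZeroUsize;
use std::ops::Range;

/// A 1-based, closed (inclusive) interval.
/// Hypothetical type for illustration only; it is not part of noodles.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ClosedInterval {
    start: NonZeroUsize, // 1-based, inclusive
    end: NonZeroUsize,   // 1-based, inclusive
}

impl ClosedInterval {
    /// Returns `None` if either bound is 0 (invalid in a 1-based system)
    /// or if `start > end`.
    fn new(start: usize, end: usize) -> Option<Self> {
        let start = NonZeroUsize::new(start)?;
        let end = NonZeroUsize::new(end)?;
        (start <= end).then_some(Self { start, end })
    }

    /// Number of positions covered. Both ends are inclusive, hence the `+ 1`.
    fn len(&self) -> usize {
        self.end.get() - self.start.get() + 1
    }

    /// The same span as a 0-based, half-open range `[start - 1, end)`.
    /// Only the start shifts; the inclusive 1-based end equals the
    /// exclusive 0-based end.
    fn to_half_open(&self) -> Range<usize> {
        (self.start.get() - 1)..self.end.get()
    }
}
```

Under these assumptions, `ClosedInterval::new(8, 13)` covers six positions (8 through 13 inclusive) and corresponds to the 0-based range `7..13`. Forgetting the `+ 1` in the length, or shifting both bounds during conversion, are the usual off-by-one mistakes.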

Given this is a common area of misunderstanding (see also #226), I added a note to all methods that return positions. Thanks for the report!

[^1]: Sequence Alignment/Map Format Specification (2023-11-16) § 1.2 "Terminologies and Concepts"