zaeleus / noodles

Bioinformatics I/O libraries in Rust
MIT License

Documentation is opaque on 0 vs 1 indexing, and whether regions are right-inclusive or not. #226

Status: Closed (vsbuffalo closed 1 month ago)

vsbuffalo commented 7 months ago

Thank you for writing noodles! Really exciting to have great Rust interfaces to so many bioinformatic formats.

I've noticed that it's currently hard to figure out from the docs:

  1. Whether all readers convert to a common position/range system (i.e. 1-indexed, right-inclusive), or whether this is left to the user. For example, 1-based, right-inclusive formats like GFF/GTF need different downstream processing than 0-based, right-exclusive formats like BED.
  2. Relatedly, whether we can count on core::Position and core::Region to be standardized in this way.
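To make the difference between the two conventions concrete, here is a minimal sketch of the conversion a user would otherwise have to do by hand. It uses only the standard library; the function names `bed_to_one_based_closed` and `one_based_closed_to_bed` are hypothetical and are not part of noodles.

```rust
/// Convert a BED-style interval (0-based, end-exclusive) into a
/// GFF/GTF-style interval (1-based, end-inclusive).
/// Hypothetical helper for illustration; not a noodles function.
fn bed_to_one_based_closed(start: u64, end: u64) -> Option<(u64, u64)> {
    // An empty or inverted half-open interval has no closed equivalent.
    if start >= end {
        return None;
    }
    // Only the start shifts by one; the exclusive end already equals
    // the inclusive 1-based end.
    Some((start + 1, end))
}

/// Convert a 1-based, end-inclusive interval back into a
/// 0-based, end-exclusive one.
/// Hypothetical helper for illustration; not a noodles function.
fn one_based_closed_to_bed(start: u64, end: u64) -> Option<(u64, u64)> {
    // Position 0 does not exist in a 1-based system.
    if start == 0 || start > end {
        return None;
    }
    Some((start - 1, end))
}
```

Under these assumptions, the first 100 bases of a sequence are `(0, 100)` in BED terms and `(1, 100)` in GFF terms, and both describe an interval of length 100.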

I would be happy to help contribute to documentation on this issue, if helpful.

zaeleus commented 7 months ago

I believe the type system resolves any coordinate system ambiguity.

All positions in noodles are normalized to be 1-based and are wrapped by core::Position, as you mentioned. The 1-based coordinate system, by definition, uses closed (or inferred unbounded) intervals. This is noted in core::region::Interval.

The documentation shows the return types of positional values. In your examples, gff::Record::start and bed::Record::start_position define positions as 1-based.
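As a rough illustration of how a type can carry the coordinate convention, here is a standard-library-only sketch. `OneBasedPosition` and `ClosedInterval` are hypothetical stand-ins written for this example; they are not the noodles types, and the actual core::Position API may differ.

```rust
use std::num::NonZeroUsize;

/// Hypothetical 1-based position (not the noodles type).
/// Wrapping NonZeroUsize makes position 0 unrepresentable.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct OneBasedPosition(NonZeroUsize);

impl OneBasedPosition {
    /// Build from a value that is already 1-based; 0 is rejected.
    fn new(n: usize) -> Option<Self> {
        NonZeroUsize::new(n).map(Self)
    }

    /// Build from a 0-based offset, as a BED reader would have to.
    fn from_zero_based(n: usize) -> Option<Self> {
        n.checked_add(1).and_then(Self::new)
    }

    fn get(self) -> usize {
        self.0.get()
    }
}

/// Hypothetical closed interval [start, end], both ends inclusive.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ClosedInterval {
    start: OneBasedPosition,
    end: OneBasedPosition,
}

impl ClosedInterval {
    fn new(start: OneBasedPosition, end: OneBasedPosition) -> Option<Self> {
        if start <= end {
            Some(Self { start, end })
        } else {
            None
        }
    }

    /// Number of positions covered; the +1 accounts for the inclusive end.
    fn length(&self) -> usize {
        self.end.get() - self.start.get() + 1
    }

    fn contains(&self, position: OneBasedPosition) -> bool {
        self.start <= position && position <= self.end
    }
}
```

With a design like this, a function that returns `OneBasedPosition` documents its coordinate system in its signature, which is the kind of guarantee the reply above attributes to the type system.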