zaeleus / noodles

Bioinformatics I/O libraries in Rust
MIT License
515 stars 53 forks source link

mutliple `Position` definitions #191

Closed brentp closed 1 year ago

brentp commented 1 year ago

Hi, I am trying to use Position and find that there are multiple versions defined so I get errors like this one:

 if record.position() >= p {
   |               -----------------    ^ expected `Position`, found `noodles::noodles_core::Position`
   |               |
   |               expected because this is `noodles::noodles_vcf::record::Position`

Is there a reason why there is noodles::core::Position and noodles::vcf::record::Position ? Or is there a simple way to compare them?

I simply want to check if the record is strictly before a given region/position.

zaeleus commented 1 year ago

noodles-vcf has its own position wrapper because VCF positions are overloaded to flag telomeres, which includes 0. § 1.6.1 "Fixed fields" (VCFv4.4 (2023-01-27)):

  1. POS -- position: [...] Telomeres are indicated by using positions 0 or N+1, where N is the length of the corresponding chromosome or contig.

I recommend performing a vcf::record::Position to core::Position conversion, i.e., Position::try_from(usize::from(vcf_position)).

See also https://github.com/zaeleus/noodles/issues/89#issuecomment-1151227983.

zaeleus commented 1 year ago

While I prefer seeing the explicit conversion, I ended up implementing PartialEq<core::Position> and PartialOrd<core::Position> in https://github.com/zaeleus/noodles/compare/5147665a06...8a7b6a29f9 to allow vcf::record::Position with core::Position comparisons, like in your case.

I wonder if replacing VCF positions with Option<core::Position> is more natural, as it better supports the common case. Telomeric breakends would then have to be handled as None = 0 and a lookup for N+1. I'll also have to explore how other implementations even handle indexing and querying records that are position 0.

brentp commented 1 year ago

Thank you! I don't grok all the design decisions, but as a user a single Position type seems attractive.