zaeleus / noodles

Bioinformatics I/O libraries in Rust
MIT License
482 stars 52 forks source link

Should record::data::field::tag::other be public? #197

Closed jamestwebber closed 1 year ago

jamestwebber commented 1 year ago

I am using noodles for some utilities that process BAM files based on non-standard tags (they are added by other tools). It's easy enough to construct the tag with something like

let cb = Tag::try_from([b'C', b'B']).unwrap();

But I thought "hey, surely I can make this const" with

const cb: Tag = Tag::Other(Other([b'C', b'B']));

But this isn't currently possible because other is not public. Maybe it should be, to allow this? Or there's a better way to accomplish this. Or maybe this is just unnecessary--this isn't a real issue in my current code, just thought it would be cleaner.

zaeleus commented 1 year ago

If this is for access, you can use [u8; 2] instead, e.g.,

const CB: [u8; 2] = [b'C', b'B'];
let _ = record.data().get(&CB);

The inner field of tag::Other is not public because it's a newtype that constrains the alphabet: /[A-Za-z][A-Za-z0-9]/. For example, tag::Other([b'1', b'N']), tag::Other([0xff, 0xff]), etc., are invalid.

jamestwebber commented 1 year ago

Oh I didn't realize that was possible, that's perfect then.

jamestwebber commented 1 year ago

Interesting twist I found: using [u8; 2] works on record.data().get() but it doesn't work if you build a HashMap first (which I was doing needlessly as I only need one tag):

let _ = record.data().get(&CB); // works great

let data: HashMap<_, _> = record.data().iter().collect();
let _ = data.get(&CB); // always None

edit: just realized that I do want to access multiple tags in a different library, so this is still an open problem for me (but I suspect there's a solution without modifying noodles).

I'm not sure why a HashMap<Tag, &Value> is okay with me passing [u8; 2] to get but doesn't return the value. Perhaps the HashMap requirements are not fulfilled?

The key may be any borrowed form of the map's key type, but Hash and Eq on the borrowed form must match those for the key type.

jamestwebber commented 1 year ago

🎉 you're a magician

zaeleus commented 1 year ago

sam::record::Data is a linear map and looks for equivalence instead of using the tag hash. Tag was being incorrectly hashed with the enum variant. Nice catch, thanks!

This is fixed in noodles 0.49.0 / noodles-sam 0.40.0. You caught me right before the snapshot!

nh13 commented 11 months ago

How do we avoid the .parse().unwrap() on? I'd like to have the whole expression be constant.

    data.insert(
        "x1".parse().unwrap(),
        noodles::sam::record::data::field::Value::from(value),
    );
zaeleus commented 11 months ago

How do we avoid the .parse().unwrap() on?

Tracking in #204.