jeff-k / bio-seq

Bit packed and well-typed biological sequences
MIT License

Why different use statements for 4-letter DNA and IUPAC DNA #2

Closed troelsarvin closed 2 years ago

troelsarvin commented 2 years ago

The "use" statements seem to work differently for four-letter DNA and IUPAC; see the following two examples. I don't understand why the "use" statement pattern differs. Is there something I've misunderstood?

For DNA:

use bio_seq::*;
use bio_seq::codec::Dna;
fn main() {
    let input_string = "AAAA";
    let dna_seq = Seq::<Dna>::from_str(input_string).unwrap();
    println!("dna_seq: {}", dna_seq);
}

For IUPAC:

use bio_seq::*;
use bio_seq::codec::iupac::Iupac;
fn main() {
    let input_string = "AAAA";
    let iupac_seq = Seq::<Iupac>::from_str(input_string).unwrap();
    println!("iupac_seq: {}", iupac_seq);
}

Note the difference in particular between: use bio_seq::codec::Dna; and use bio_seq::codec::iupac::Iupac;

jeff-k commented 2 years ago

Thanks for pointing this out. In the version published on crates.io, the bio_seq::codec::dna::Dna struct is re-exported as bio_seq::codec::Dna. I've changed this to be consistent with the other encoders, so it should now be imported as:

use bio_seq::codec::dna::Dna;

This is because I'm planning to have multiple built-in encodings for DNA sequences, like perhaps bio_seq::codec::ascii::Dna for the 8-bit bytestring encoding.
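As a rough sketch of why the module path matters once there is more than one type called Dna: the layout below is a hypothetical stand-in, not the crate's actual source. The `ascii` module, the `WIDTH` constants, and the variant values are assumptions made up for illustration.

```rust
// Hypothetical stand-in for a codec module tree; not bio-seq's real code.
mod codec {
    pub mod dna {
        /// Stand-in for a 2-bit nucleotide encoding (values are illustrative).
        #[allow(dead_code)]
        #[derive(Debug, Clone, Copy, PartialEq)]
        pub enum Dna {
            A = 0b00,
            C = 0b01,
            G = 0b10,
            T = 0b11,
        }
        /// Bits used per symbol in this stand-in encoding.
        pub const WIDTH: u32 = 2;
    }

    pub mod ascii {
        /// Stand-in for an 8-bit bytestring encoding: one ASCII byte per base.
        #[derive(Debug, Clone, Copy, PartialEq)]
        pub struct Dna(pub u8);
        /// Bits used per symbol in this stand-in encoding.
        pub const WIDTH: u32 = 8;
    }
}

fn main() {
    // Both types are named `Dna`; only the module path tells them apart.
    let packed = codec::dna::Dna::G;
    let byte = codec::ascii::Dna(b'G');
    println!("{:?} uses {} bits", packed, codec::dna::WIDTH);
    println!("{:?} uses {} bits", byte, codec::ascii::WIDTH);
}
```

With a layout like this, a bare codec::Dna would be ambiguous to a reader, which is the motivation for spelling out the submodule in the import.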

On the other hand, since 2-bit Dna is probably going to be the most popular encoding, I'm open to the idea of privileging this struct. Maybe it could be exported at the root level, like bio_seq::Dna.
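A minimal sketch of what such a root-level export could look like, assuming a simplified module tree: the unit structs and the `describe` helper are hypothetical placeholders, not the crate's real definitions. It only illustrates that `pub use` makes one type reachable under a shorter path without creating a second type.

```rust
// Hypothetical, simplified module tree; not bio-seq's real code.
mod codec {
    pub mod dna {
        #[derive(Debug, PartialEq)]
        pub struct Dna;
    }
    pub mod iupac {
        #[derive(Debug, PartialEq)]
        pub struct Iupac;
    }
}

// Root-level re-export: `Dna` is now also nameable from the crate root,
// while `Iupac` still requires its full module path.
pub use codec::dna::Dna;

// Hypothetical helper that asks for the type by its long path.
fn describe(_codec: &codec::dna::Dna) -> &'static str {
    "2-bit DNA"
}

fn main() {
    let short = Dna; // via the re-export
    let long = codec::dna::Dna; // via the full path
    // Both paths name the same type, so the values compare equal
    // and the short form is accepted where the long form is expected.
    assert_eq!(short, long);
    println!("{}", describe(&short));
    let _iupac = codec::iupac::Iupac;
}
```

The trade-off is the asymmetry raised in this issue: the privileged type gets a shorter import than the other codecs.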

I'm very open to feedback if you have any input!

I'll leave this issue open until I update the README and publish a new release on crates.io.