zaeleus / noodles

Bioinformatics I/O libraries in Rust
MIT License
498 stars 52 forks source link

Add async alignment writer #292

Closed mbhall88 closed 2 months ago

mbhall88 commented 2 months ago

See #286 for more details.

Checklist

Currently, I cannot compile the async Writer due to

error[E0308]: mismatched types
   --> noodles-util/src/alignment/async/io/writer.rs:80:63
    |
80  |             Self::Cram(writer) => writer.write_record(header, record).await,
    |                                          ------------         ^^^^^^ expected `Record`, found `&dyn Record`
    |                                          |
    |                                          arguments to this method are incorrect
    |
    = note: expected struct `noodles_cram::Record`
            found reference `&dyn noodles_sam::alignment::Record`
note: method defined here
   --> /Users/michael/Projects/noodles/noodles-cram/src/async/io/writer.rs:181:18
    |
181 |     pub async fn write_record(
    |                  ^^^^^^^^^^^^

Which is coming from the fact that cram::async::io::Writer::write_record takes a cram::Record type rather than a &dyn sam::alignment::Record trait.

https://github.com/zaeleus/noodles/blob/174ad3c98092728a6be3feabb346619ebd14a5ad/noodles-cram/src/async/io/writer.rs#L184

Is that a change that should be made in a separate PR?

In addition, I also noticed some slight differences in function naming schemes between SAM/BAM and CRAM. For example, the SAM and BAM async Writers have a write_header function, however, the reciprocal CRAM function is write_file_header - not sure if this was intended though. Also, the CRAM async builder has the function build_with_writer rather than the build_from_writer. I appreciate these are minor things, but thought I would raise them in case you want to move this library towards consistent names across modules/crates. I guess for backwards compatibility we could add the [#[deprecated](https://doc.rust-lang.org/reference/attributes/diagnostics.html#the-deprecated-attribute) attribute if you want to introduce the consistent named version.

zaeleus commented 2 months ago
    = note: expected struct `noodles_cram::Record`
            found reference `&dyn noodles_sam::alignment::Record`

Is that a change that should be made in a separate PR?

For now, convert &dyn sam::alignment::Record to a cram::Record using cram::Record::try_from_alignment_record. I can normalize the interfaces after this patch.

In addition, I also noticed some slight differences in function naming schemes between SAM/BAM and CRAM. For example, the SAM and BAM async Writers have a write_header function, however, the reciprocal CRAM function is write_file_header - not sure if this was intended though.

Yes, this is intentional. I/O is modeled after the physical structure for the format (see § 5 "File structure" (2024-03-21) for a visual representation). The file header is named to differentiate itself from the many other headers in the format, e.g., the container header, the slice header, the block header, etc. I did add a convenience method to the blocking reader in 2da4eae513c3de8b7ecd64ff0246fa61c56a58bb, which simply needs to be synchronized with the async reader later. Use the currently available methods to write a default file definition, and use the SAM header to write the file header, e.g.,

https://github.com/zaeleus/noodles/blob/174ad3c98092728a6be3feabb346619ebd14a5ad/noodles-cram/examples/cram_write_async.rs#L73-L74

Also, the CRAM async builder has the function build_with_writer rather than the build_from_writer.

Good catch; it's definitely a typo. Use the methods that are currently available. I can also fix this after this patch.

mbhall88 commented 2 months ago

Further to the above, when adding the async Writer example I realised I had missed adding Writer::finish. Again, the individual BAM and CRAM methods that are analogous are called shutdown instead of finish. I took the name finish to mirror the sync alignment API. Not sure if these too should be harmonised?

Aside from that, I think this PR is done and ready for review.