zaeleus / noodles

Bioinformatics I/O libraries in Rust
MIT License
512 stars 53 forks source link

Unified async alignment reader/writer #286

Closed mbhall88 closed 2 months ago

mbhall88 commented 4 months ago

I use the noodles_util::alignment readers and writers a lot. How easy would it be to implement an aync version of this transparent reader/writer?

If this is something you think would be reasonably straightforward I would be willing to draft a PR with some guidance on where to start.

zaeleus commented 4 months ago

If this is something you think would be reasonably straightforward I would be willing to draft a PR with some guidance on where to start.

Yes, give it a go! It's a loaded issue, consisting of four parts:

I recommend focusing on each individually, submitting a patch for review per part. It is a fair amount of work/code, but don't feel obligated to complete it all.

For the first part, I normalized the alignment format async reader methods, so you shouldn't have any problems outside of util. I also added an async variant format reader and builder to use as a reference.

Here are some instructions on how to proceed with the reader:

  1. Add the alignment format dependencies (SAM, BAM, and CRAM) to the async feature in noodles-util/Cargo.toml. These use the optional dependency feature syntax.

    https://github.com/zaeleus/noodles/blob/782d892196a30e856560e060c390ee20de774859/noodles-util/Cargo.toml#L23-L28

  2. Add the modules.

    • noodles-util/src/alignment/async.rs (example)
    • noodles-util/src/alignment/async/io.rs (example)
    • noodles-util/src/alignment/async/io/reader.rs (example)
  3. Implement Reader<R>. This will not use the I/O trait approach like the blocking version. Since well know all the format variants, use an enum instead. Feel free to start with the following scaffold.

//! Async alignment reader.

/// An async alignment reader.
pub enum Reader<R> {
    /// SAM.
    Sam(sam::r#async::io::Reader<R>),
    /// BAM.
    Bam(bam::r#async::io::Reader<R>),
    /// CRAM.
    Cram(cram::r#async::io::Reader<R>),
}

impl<R> Reader<R>
where
    R: AsyncBufRead + Unpin,
{
    /// Reads the SAM header.
    pub async fn read_header(&mut self) -> io::Result<sam::Header> {
        todo!()
    }

    /// Returns an iterator over records starting from the current stream position.
    pub fn records<'r, 'h: 'r>(
        &'r mut self,
        header: &'h sam::Header,
    ) -> impl Stream<Item = io::Result<Box<dyn sam::alignment::Record>>> + 'r {
        todo!()
    }
}

Let me know if you have any further questions or issues.

mbhall88 commented 3 months ago

I wonder if it also makes sense to add an asynchronous version of an indexed reader? Given the implementations on SAM/BAM/CRAM are !Send I guess we probably need that auto trait before we could do that? And I hope I understand this correctly, we might also want Sync as currently it is !Sync, though my understanding is it doesn't have to be Sync to be used in async code correct?

zaeleus commented 2 months ago

I wonder if it also makes sense to add an asynchronous version of an indexed reader?

Tracked in #296.