zaeleus / noodles

Bioinformatics I/O libraries in Rust
MIT License

sam/io/writer/record: Increase visibility of SAM record CIGAR string serializer #247

Closed: zaeleus closed this issue 7 months ago

zaeleus commented 7 months ago

@zaeleus thanks for all your work on this crate. What are the chances we could do the same for noodles::sam::alignment::record_buf::Cigar and add a constructor with the signature

```rust
pub fn new(src: Vec<u8>) -> Self { ... }
```

And also expose a method

```rust
pub fn as_bytes(&self) -> &[u8] { ... }
```

on both ...::record_buf::Cigar and ...::record::Cigar?

It would be useful generally, and specifically for someone like me who wants to extract the CIGAR string and update the MC tag in BAM/SAM records.

Originally posted by @theJasonFan in https://github.com/zaeleus/noodles/issues/233#issuecomment-2027767346


The request here is to provide a method that serializes alignment record CIGAR operations to SAM CIGAR strings. This would allow the serialized form to be used as, e.g., a string value in record data, such as the MC (mate CIGAR) tag.
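For reference, a SAM CIGAR string concatenates each operation's length with its one-character code (M, I, D, N, S, H, P, =, X), and is written as `*` when the CIGAR is unavailable. The following is a minimal sketch of that text format only. `Kind` and `cigar_to_string` are hypothetical, illustration-only names, not noodles types or functions.

```rust
use std::fmt::Write;

/// Hypothetical CIGAR operation kinds (illustration only, not noodles types).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    Match,
    Insertion,
    Deletion,
    Skip,
    SoftClip,
    HardClip,
    Pad,
    SequenceMatch,
    SequenceMismatch,
}

impl Kind {
    /// One-character code used for this kind in a SAM CIGAR string.
    fn code(self) -> char {
        match self {
            Kind::Match => 'M',
            Kind::Insertion => 'I',
            Kind::Deletion => 'D',
            Kind::Skip => 'N',
            Kind::SoftClip => 'S',
            Kind::HardClip => 'H',
            Kind::Pad => 'P',
            Kind::SequenceMatch => '=',
            Kind::SequenceMismatch => 'X',
        }
    }
}

/// Serializes (kind, length) operations to a SAM CIGAR string.
/// An empty operation list is written as "*" (CIGAR unavailable).
fn cigar_to_string(ops: &[(Kind, usize)]) -> String {
    if ops.is_empty() {
        return String::from("*");
    }

    let mut s = String::new();
    for &(kind, len) in ops {
        // Writing into a String cannot fail, so unwrap is safe here.
        write!(s, "{}{}", len, kind.code()).unwrap();
    }
    s
}

fn main() {
    let ops = [
        (Kind::SoftClip, 2),
        (Kind::Match, 4),
        (Kind::Deletion, 1),
        (Kind::Match, 3),
    ];
    println!("{}", cigar_to_string(&ops)); // prints 2S4M1D3M
}
```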

@theJasonFan, the alignment record CIGAR buffer (sam::alignment::record_buf::Cigar) does not use a byte buffer for its internal representation, so your proposed methods won't quite work here. I think the better solution is to increase the visibility of the SAM record CIGAR field serializer, which will allow any implementation of alignment::record::Cigar to be serialized to a SAM CIGAR string.
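The design point here is that a serializer written against a CIGAR abstraction works for every backing representation, whereas an `as_bytes` accessor only makes sense for byte-backed types. The sketch below models that idea with a hypothetical `CigarOps` trait and two hypothetical representations: a structured op buffer and BAM-style packed `u32` values (`len << 4 | op_index`, indexing into `MIDNSHP=X`). None of these names are the noodles API; the real entry point is the `write_cigar` function shown in the next comment.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for "any CIGAR implementation".
/// Each yielded op is (SAM op code byte, length).
trait CigarOps {
    fn ops(&self) -> Box<dyn Iterator<Item = (u8, u32)> + '_>;
}

/// Representation 1: a structured buffer of ops (no byte string inside).
struct OpBuf(Vec<(u8, u32)>);

impl CigarOps for OpBuf {
    fn ops(&self) -> Box<dyn Iterator<Item = (u8, u32)> + '_> {
        Box::new(self.0.iter().copied())
    }
}

/// Representation 2: BAM-style packed values, `len << 4 | op_index`.
struct Packed(Vec<u32>);

const CODES: &[u8; 9] = b"MIDNSHP=X";

impl CigarOps for Packed {
    fn ops(&self) -> Box<dyn Iterator<Item = (u8, u32)> + '_> {
        // Assumes valid input: an op index above 8 would panic here.
        Box::new(self.0.iter().map(|&n| (CODES[(n & 0xf) as usize], n >> 4)))
    }
}

/// A single serializer for every representation; writes "*" when empty.
fn write_cigar<W: Write, C: CigarOps + ?Sized>(w: &mut W, cigar: &C) -> io::Result<()> {
    let mut wrote_any = false;
    for (code, len) in cigar.ops() {
        write!(w, "{}", len)?;
        w.write_all(&[code])?;
        wrote_any = true;
    }
    if !wrote_any {
        w.write_all(b"*")?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let structured = OpBuf(vec![(b'M', 4)]);
    let packed = Packed(vec![4 << 4]); // 4M

    let mut a = Vec::new();
    let mut b = Vec::new();
    write_cigar(&mut a, &structured)?;
    write_cigar(&mut b, &packed)?;

    // Both representations serialize to the same SAM CIGAR string.
    assert_eq!(a, b);
    println!("{}", String::from_utf8_lossy(&a)); // prints 4M

    Ok(())
}
```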

zaeleus commented 7 months ago

This is now available in noodles 0.68.0 / noodles-sam 0.56.0, e.g.,

```rust
// cargo add noodles@0.68.0 --features sam

use std::io;

use noodles::sam::{
    alignment::{
        record::{
            cigar::{op::Kind, Op},
            data::field::Tag,
        },
        record_buf::{data::field::Value, Cigar, Data},
    },
    io::writer::record::write_cigar,
};

fn main() -> io::Result<()> {
    let mut buf = Vec::new();
    let cigar: Cigar = [Op::new(Kind::Match, 4)].into_iter().collect();
    write_cigar(&mut buf, &cigar)?;

    let mut data = Data::default();
    data.insert(Tag::MATE_CIGAR, Value::String(buf.into()));

    eprintln!("{data:?}");
    // => Data([(Tag("MC"), String("4M"))])

    Ok(())
}
```

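The original motivation was updating the MC tag, which holds the mate's CIGAR string. For a read pair, that amounts to serializing each read's CIGAR and storing it on the other read. The sketch below shows only that swap, without noodles: `Read`, `cigar_string`, and `set_mate_cigars` are hypothetical names, and aux data is modeled as a plain `HashMap` keyed by two-byte tags. With noodles you would serialize using `write_cigar` as above and insert the result under `Tag::MATE_CIGAR`.

```rust
use std::collections::HashMap;

/// Hypothetical minimal record: CIGAR as (length, op code) pairs plus
/// string-valued aux data keyed by two-character tags.
struct Read {
    cigar: Vec<(u32, char)>,
    data: HashMap<[u8; 2], String>,
}

/// Serializes ops to a SAM CIGAR string ("*" if there are none).
fn cigar_string(ops: &[(u32, char)]) -> String {
    if ops.is_empty() {
        return "*".to_string();
    }
    ops.iter().map(|(len, op)| format!("{len}{op}")).collect()
}

/// Sets each mate's MC tag to the other mate's serialized CIGAR.
fn set_mate_cigars(r1: &mut Read, r2: &mut Read) {
    let c1 = cigar_string(&r1.cigar);
    let c2 = cigar_string(&r2.cigar);
    r1.data.insert(*b"MC", c2);
    r2.data.insert(*b"MC", c1);
}

fn main() {
    let mut r1 = Read { cigar: vec![(4, 'M')], data: HashMap::new() };
    let mut r2 = Read { cigar: vec![(2, 'S'), (2, 'M')], data: HashMap::new() };
    set_mate_cigars(&mut r1, &mut r2);
    println!("{:?} {:?}", r1.data[b"MC"], r2.data[b"MC"]); // prints "2S2M" "4M"
}
```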
theJasonFan commented 7 months ago

That was quick. Amazing! Thanks!