rust-bio / rust-htslib

This library provides HTSlib bindings and a high level Rust API for reading and writing BAM files.
MIT License

[help] Getting tags is slow #438

Closed: Liripo closed this issue 2 months ago

Liripo commented 2 months ago

Hello, I tested with the following code. Processing 40,000 reads took 5 minutes. Am I using the library incorrectly?

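// Imports assumed here; the original post did not show them.
// `anyhow::Result` is a guess at the `Result` alias in use.
use anyhow::Result;
use rust_htslib::bam::{self, Read};
use std::fs::File;
use std::io::Write;

pub fn read_bam_rs(bam_file: &str, csv_out: &str) -> Result<()> {
    let mut bam = bam::Reader::from_path(bam_file).expect("Error opening bam.");
    // Unbuffered file handle: every writeln! below goes straight to the OS.
    let mut csv_writer = File::create(csv_out)?;
    writeln!(csv_writer, "CB,GX,gx,UB")?;

    for r in bam.records() {
        let record = r.expect("Error reading BAM file.");
        // Skip unmapped and secondary alignments.
        if record.is_unmapped() || record.is_secondary() {
            continue;
        }
        // Get tags; `?` returns early if a tag is missing on a record.
        let cb = record.aux(b"CB")?;
        let gx_value = record.aux(b"GX")?;
        let gx = record.aux(b"gx")?;
        let ub = record.aux(b"UB")?;
        writeln!(csv_writer, "{:?},{:?},{:?},{:?}", cb, gx_value, gx, ub)?;
    }
    Ok(())
}

jch-13 commented 2 months ago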

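Are you building the example in release mode (cargo run --release or cargo build --release)? Wrapping the output File in a BufWriter might also speed things up considerably, since an unbuffered File issues a separate write to the operating system for every writeln! call.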
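
For illustration, here is a minimal sketch of the BufWriter suggestion using only the standard library. The function name write_rows and the plain string rows are hypothetical stand-ins for the BAM loop in the question. Only the BufWriter::new(File::create(...)) wrapping and the final flush are the point.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Writes a header plus one CSV line per row through a `BufWriter`.
/// `write_rows` and the `[&str; 4]` row type are hypothetical placeholders
/// for the per-record tag values in the original loop.
pub fn write_rows<P: AsRef<Path>>(csv_out: P, rows: &[[&str; 4]]) -> io::Result<()> {
    // Wrapping the File means each writeln! appends to an in-memory buffer
    // instead of issuing a separate write to the operating system.
    let mut csv_writer = BufWriter::new(File::create(csv_out)?);
    writeln!(csv_writer, "CB,GX,gx,UB")?;

    for [cb, gx_value, gx, ub] in rows {
        writeln!(csv_writer, "{},{},{},{}", cb, gx_value, gx, ub)?;
    }

    // Flush explicitly so that a write error is reported here rather than
    // being ignored when the BufWriter is dropped.
    csv_writer.flush()
}
```

In the code from the question, the equivalent change would be to wrap File::create(csv_out)? in BufWriter::new(...) and leave the rest of the loop as it is.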

Liripo commented 2 months ago

Thanks for your help, it's much faster.