rust-bio / rust-htslib

This library provides HTSlib bindings and a high level Rust API for reading and writing BAM files.
MIT License

BCF reader partial unpacking for site-only queries #411

Open bguo068 opened 1 year ago

Thank you for the wonderful bindings to the HTSlib C library. They make HTSlib so much easier to work with.

Is there a way to unpack only the site-level (shared) fields, so that site information can be read quickly when the BCF file contains 200k samples?

As an experiment, I cloned the repo and changed BCF_UN_ALL to BCF_UN_SHR in the call htslib::bcf_unpack(record.inner_mut(), htslib::BCF_UN_ALL as i32);. It worked and ran very quickly. However, it might be unsafe if genotype data is later accessed from the resulting record, because the per-sample fields were never unpacked.

Are there any suggestions or plans to provide a Rust interface for partial unpacking?

https://github.com/rust-bio/rust-htslib/blob/3008a131f241b423d041c756fc96410f6412e3d8/src/bcf/mod.rs#L210C6-L223

    fn read(&mut self, record: &mut record::Record) -> Option<Result<()>> {
        match unsafe { htslib::bcf_read(self.inner, self.header.inner, record.inner) } {
            0 => {
                unsafe {
                    // Always unpack record.
                    htslib::bcf_unpack(record.inner_mut(), htslib::BCF_UN_ALL as i32);
                }
                record.set_header(Rc::clone(&self.header));
                Some(Ok(()))
            }
            -1 => None,
            _ => Some(Err(Error::BcfInvalidRecord)),
        }
    }
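
One way to address the safety concern above is to track which parts of a record have been unpacked and to refuse genotype access until the per-sample fields are available. The sketch below is illustrative only: MockRecord, AccessError, and site_positions are hypothetical names and are not part of rust-htslib. The flag values mirror the BCF_UN_* definitions in htslib's vcf.h, and the `unpacked` bitmask imitates the field of the same name that htslib keeps on each record. No real BCF file is read.

```rust
// Flag values as defined in htslib's vcf.h.
const BCF_UN_STR: u32 = 1; // up to ALT (inclusive)
const BCF_UN_FLT: u32 = 2; // up to FILTER
const BCF_UN_INFO: u32 = 4; // up to INFO
const BCF_UN_SHR: u32 = BCF_UN_STR | BCF_UN_FLT | BCF_UN_INFO; // all shared (site) fields
const BCF_UN_FMT: u32 = 8; // FORMAT and per-sample fields
const BCF_UN_ALL: u32 = BCF_UN_SHR | BCF_UN_FMT; // everything

/// Hypothetical error for touching data that was never unpacked.
#[derive(Debug, PartialEq)]
enum AccessError {
    FormatNotUnpacked,
}

/// Hypothetical stand-in for a BCF record; NOT rust-htslib's `Record`.
struct MockRecord {
    pos: i64,
    genotypes: Vec<i32>,
    unpacked: u32, // bitmask of BCF_UN_* flags applied so far
}

impl MockRecord {
    fn new(pos: i64, genotypes: Vec<i32>) -> Self {
        Self { pos, genotypes, unpacked: 0 }
    }

    /// Record that more fields are unpacked. Flags only accumulate,
    /// so a later call can upgrade a site-only record to a full one.
    fn unpack(&mut self, which: u32) {
        self.unpacked |= which;
    }

    /// Site-level accessor: always allowed in this sketch.
    fn pos(&self) -> i64 {
        self.pos
    }

    /// Per-sample accessor: fails unless the FORMAT part was unpacked.
    fn genotypes(&self) -> Result<&[i32], AccessError> {
        if (self.unpacked & BCF_UN_FMT) == 0 {
            return Err(AccessError::FormatNotUnpacked);
        }
        Ok(&self.genotypes)
    }
}

/// Site-only scan: unpack the shared fields and collect positions,
/// never touching per-sample data.
fn site_positions(records: &mut [MockRecord]) -> Vec<i64> {
    records
        .iter_mut()
        .map(|r| {
            r.unpack(BCF_UN_SHR);
            r.pos()
        })
        .collect()
}
```

With this shape, a site-only pass calls unpack(BCF_UN_SHR) and pays nothing for the 200k sample columns, while a later unpack(BCF_UN_FMT) or unpack(BCF_UN_ALL) on the same record makes genotypes() succeed. Whether a real API should return an error or unpack lazily on first genotype access is a design choice left open here.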