zaeleus / noodles

Bioinformatics I/O libraries in Rust
MIT License

Index BCF Query "invalid chrom" #181

Closed by tshauck 1 year ago

tshauck commented 1 year ago

Hi,

Thanks for building this package, it's been very nice to use.

I wanted to report that I've started getting:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: InvalidInput, error: "invalid chrom" }', src/main.rs:19:29
stack backtrace:

when attempting to iterate through the query results for a BCF file. Here's a minimal example that uses the vcf from here.

use std::{path::PathBuf, str::FromStr};

use noodles::bcf;

fn main() {
    let path = PathBuf::from("PRJNA784038_illumina.bcf");
    let file = std::fs::File::open(&path).unwrap();

    // Reading the header populates the reader's string maps.
    let mut reader = bcf::Reader::new(file);
    let header = reader.read_header().unwrap();

    // Load the CSI index stored next to the BCF file.
    let index = noodles::csi::read(path.with_extension("bcf.csi")).unwrap();

    // Query the whole reference sequence.
    let region = noodles::core::Region::from_str("NC_045512.2").unwrap();

    let query = reader.query(&header, &index, &region).unwrap();

    for result in query {
        // Panics here with "invalid chrom".
        let record = result.unwrap();
        println!("{:?}", record);
    }
}

I think the root cause is a change between 0.41 and 0.42, namely this commit: https://github.com/zaeleus/noodles/commit/b56d31c23cff7a3aa078846430ff99a0a2fa2760. As far as I can tell, the logic change is that calling Reader::from resets string_maps to the default string maps, discarding the ones that read_header populated on the original reader. The chrom that should exist in string_maps is therefore missing, and the get fails.
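To make the suspected mechanism concrete, here is a minimal sketch of the pattern. It is not the noodles code: `Reader`, `StringMaps`, `requery`, and `chrom_name` are hypothetical stand-ins, and `read_header` takes the contig names directly instead of parsing anything.

```rust
// Hypothetical stand-in for the reader's string maps (not the noodles type).
#[derive(Default)]
struct StringMaps {
    contigs: Vec<String>,
}

// Hypothetical, simplified reader: an inner source plus state set by `read_header`.
struct Reader<R> {
    inner: R,
    string_maps: StringMaps,
}

impl<R> From<R> for Reader<R> {
    // A reader built from a bare inner source starts with default (empty) state.
    fn from(inner: R) -> Self {
        Self {
            inner,
            string_maps: StringMaps::default(),
        }
    }
}

impl<R> Reader<R> {
    // Pretend to parse a header by recording its contig names.
    fn read_header(&mut self, contig_names: &[&str]) {
        self.string_maps.contigs = contig_names.iter().map(|s| s.to_string()).collect();
    }

    // The problematic pattern: re-wrap the inner source in a fresh reader.
    // The populated `string_maps` is dropped along the way.
    fn requery(self) -> Reader<R> {
        Reader::from(self.inner)
    }

    // Resolve a record's chrom ID to a name using the string maps.
    fn chrom_name(&self, id: usize) -> Result<&str, String> {
        self.string_maps
            .contigs
            .get(id)
            .map(String::as_str)
            .ok_or_else(|| String::from("invalid chrom"))
    }
}
```

In this sketch, a lookup succeeds on the original reader after `read_header`, but the same lookup on the reader returned by `requery` fails because its contig list is empty.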

E.g. here you see string_maps with data...

image

But then once we're inside read_record, string_maps.contigs() is empty...

image

Please let me know if I can clarify/fix anything or if it's just user error :)

zaeleus commented 1 year ago

Ah, yes, the string maps aren't being passed to the new reader when constructing Query. In c0b504dc8dbcb1eed645d95a66342f17abc6d0c5, I moved the reader state into the iterator to work around this. Thanks both for reporting and investigating!
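For readers following along, the general idea of keeping the state with the iterator can be sketched as below. This is only an illustration under simplifying assumptions, not the contents of the commit: `Query` here is a hypothetical type, `chrom_ids` stands in for the record stream, and `contigs` stands in for the string maps.

```rust
// Hypothetical query iterator that carries the state it needs,
// instead of building a new reader with default (empty) state.
struct Query<'a, I> {
    chrom_ids: I,           // stand-in for the raw record stream
    contigs: &'a [String],  // stand-in for the header's string maps
}

impl<'a, I> Iterator for Query<'a, I>
where
    I: Iterator<Item = usize>,
{
    type Item = Result<&'a str, String>;

    fn next(&mut self) -> Option<Self::Item> {
        // Stop when the underlying stream is exhausted.
        let id = self.chrom_ids.next()?;

        // Resolve the chrom ID against the state held by the iterator.
        Some(
            self.contigs
                .get(id)
                .map(String::as_str)
                .ok_or_else(|| String::from("invalid chrom")),
        )
    }
}
```

Because the iterator holds the contig names itself, a valid chrom ID resolves no matter how the underlying source was wrapped; only an ID that is genuinely absent from the header yields "invalid chrom".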