zaeleus / noodles

Bioinformatics I/O libraries in Rust
MIT License

bgzf MultiThreadedReader Missing Error Message #246

Closed asgray closed 5 months ago

asgray commented 5 months ago

I had a file compressed with gzip (compressed size 1.1 MB) that I tried to read with bgzf::MultithreadedReader. It didn't work, but the program didn't panic or warn me about anything; it just hung. Relevant code:

let in_f = File::open(maskfile).expect("Could Not Open Input File");
let reader = bgzf::MultithreadedReader::with_worker_count(5, in_f);
let mut writer = bgzf::Writer::new(stdout());
for line in reader.lines().filter_map(|res| res.ok()) {
    // ... program hangs here; nothing in the loop is executed
}

The same code on a smaller gzipped file (compressed size 4.6 KB) executed successfully. Recompressing both files with bgzip solved the problem. Ideally the reader should detect when the compression is in the wrong format, or at least cause the program to panic.
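For context, plain gzip and BGZF can usually be told apart from the first 18 bytes of the file: a BGZF block is a gzip member with the FEXTRA flag set and a "BC" extra subfield of length 2. The following is a minimal sketch of such a check, not how noodles validates blocks. The function name looks_like_bgzf is hypothetical, and it assumes the "BC" subfield is the first one in the extra field, as bgzip writes it.

```rust
/// Hypothetical helper (not part of noodles): reports whether `header`
/// starts like a BGZF block header.
///
/// Assumption: the "BC" subfield is the first subfield in the gzip extra
/// field. A stricter check would walk every subfield.
fn looks_like_bgzf(header: &[u8]) -> bool {
    header.len() >= 18
        // ID1, ID2 (gzip magic), CM = 8 (deflate), FLG = 4 (FEXTRA set)
        && header[..4] == [0x1f, 0x8b, 0x08, 0x04]
        // SI1 = 'B', SI2 = 'C', SLEN = 2 (little-endian)
        && header[12..16] == [b'B', b'C', 0x02, 0x00]
}
```

Calling this on the first bytes of the input before constructing the reader would let a program reject a plain gzip file with a clear message instead of relying on errors from the line iterator.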

zaeleus commented 5 months ago

BGZF blocks are always validated, and the decoder will return an error if validation fails. In your example, the filter_map is throwing away all the errors, which is why it seems like nothing is happening.
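To illustrate why discarding errors can look like a hang, here is a minimal sketch using only the standard library. It assumes the line iterator keeps yielding an error on every call, which is consistent with the behavior reported above but is not confirmed in this thread. The failing_lines function is a hypothetical stand-in for reader.lines(), not the noodles reader itself.

```rust
use std::io;

/// Hypothetical stand-in for `reader.lines()` on an invalid input:
/// every call to `next()` yields the same kind of error, forever.
fn failing_lines() -> impl Iterator<Item = io::Result<String>> {
    std::iter::repeat_with(|| {
        Err(io::Error::new(io::ErrorKind::InvalidData, "invalid BGZF header"))
    })
}

/// Surfaces the first item as a `Result`, so an error reaches the caller.
fn first_line() -> io::Result<Option<String>> {
    failing_lines().next().transpose()
}

/// `map_while(Result::ok)` stops at the first error, so this terminates.
/// `filter_map(Result::ok)` on the same iterator would skip each error and
/// ask for the next item, never reaching the loop body and never finishing.
fn count_lines_until_error() -> usize {
    failing_lines().map_while(Result::ok).count()
}
```

Note that map_while still hides the error; it only guarantees termination. Propagating the result is the option that reports what went wrong.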

If you handle the result yourself, you'll get the invalid data I/O error, e.g.,

for result in reader.lines() {
    let line = result?;
    // ...
}

Error: Custom { kind: InvalidData, error: "invalid BGZF header" }

asgray commented 5 months ago

I see, thanks!