rust-bio / rust-htslib

This library provides HTSlib bindings and a high level Rust API for reading and writing BAM files.
MIT License

Panic on omitted trailing FORMAT fields #407

Open fennerm opened 1 year ago

fennerm commented 1 year ago

I'm running into a panic when attempting to parse a VCF with rust_htslib::bcf. I can't share the real VCF but here's a minimal example:

##fileformat=VCFv4.3
##contig=<ID=chr1,length=10000>
##INFO=<ID=FOO,Number=1,Type=Integer,Description="Some field">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=ABC,Number=1,Type=String,Description="Some string field">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SAMPLE1
chr1    1234    .   t   a   .   .   FOO=1   GT:ABC  .

The problem is that the sample column contains only a single "." even though the FORMAT column defines two fields (GT and ABC). Per the VCF spec, I think this is valid:

Trailing fields can be dropped, with the exception of the GT field, which should always be present if specified in the FORMAT field.
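The spec rule above can be sketched in plain Rust: pair each FORMAT key with its sample value, fill any dropped trailing fields with the missing value ".", and reject a sample that drops GT. `pad_sample` is a hypothetical helper (it is not part of rust-htslib or HTSlib), and the sketch assumes colon-separated FORMAT and sample strings like the ones in the example record. It only illustrates what a spec-compliant reader would be expected to report for `GT:ABC` with a sample of `.`.

```rust
/// Pair each FORMAT key with its sample value, filling omitted trailing
/// fields with the VCF missing value ".".
/// Returns None if the sample has more values than FORMAT keys, or if GT
/// is listed in FORMAT but was dropped from the sample.
fn pad_sample<'a>(format: &'a str, sample: &'a str) -> Option<Vec<(&'a str, &'a str)>> {
    let keys: Vec<&'a str> = format.split(':').collect();
    let values: Vec<&'a str> = sample.split(':').collect();

    // More values than keys is malformed, not a case of dropped fields.
    if values.len() > keys.len() {
        return None;
    }

    // GT may not be dropped: if FORMAT lists it, the sample must reach it.
    if let Some(gt_idx) = keys.iter().position(|k| *k == "GT") {
        if gt_idx >= values.len() {
            return None;
        }
    }

    // Any key past the end of the sample values is a dropped trailing field.
    Some(
        keys.iter()
            .enumerate()
            .map(|(i, k)| (*k, values.get(i).copied().unwrap_or(".")))
            .collect(),
    )
}
```

For the record above, `pad_sample("GT:ABC", ".")` yields `Some(vec![("GT", "."), ("ABC", ".")])`, i.e. ABC should read as missing rather than cause an error.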

Panic message:

thread panicked at 'chunk size must be non-zero', /Users/fennerm/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rust-htslib-0.44.1/src/bcf/record.rs:1490:18

Relevant line in code: https://github.com/rust-bio/rust-htslib/blob/master/src/bcf/record.rs#L1490

// Call that hits the panic (ABC is the trailing field omitted from SAMPLE1):
let val = record.format_shared_buffer(b"ABC", &mut buffer).string();
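The panic text itself comes from Rust's standard library: `slice::chunks` asserts that its chunk size is non-zero. One plausible but unconfirmed reading of record.rs:1490 is that the per-sample width is computed by dividing the length of the buffer HTSlib fills by the number of samples, and that this division yields 0 when the trailing ABC value is omitted. The sketch below assumes exactly that. `split_per_sample` is a hypothetical helper, not rust-htslib's code; it shows how guarding the zero-width case returns empty (missing) per-sample values instead of panicking.

```rust
/// Split a flat buffer holding values for all samples into one slice per
/// sample. `slice::chunks` panics with "chunk size must be non-zero" when
/// the computed width is 0, so that case is handled explicitly.
fn split_per_sample(buf: &[u8], n_samples: usize) -> Vec<&[u8]> {
    if n_samples == 0 {
        return Vec::new();
    }
    let width = buf.len() / n_samples;
    if width == 0 {
        // No bytes for any sample: report each sample as an empty
        // (missing) value instead of calling chunks(0).
        return vec![&buf[..0]; n_samples];
    }
    buf.chunks(width).collect()
}
```

Whether an empty slice or an explicit "missing" marker is the right result for the library is a design decision for the maintainers; the point here is only that a zero width must not reach `chunks`.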

fennerm commented 1 year ago

Will try to debug a bit and submit a PR.

fennerm commented 1 year ago

Took an initial look but couldn't figure out the root cause. Jotting down my notes: