zaeleus / noodles

Bioinformatics I/O libraries in Rust
MIT License

Should writing Strings to INFO be escaped #300

Closed by holtgrewe 2 months ago

holtgrewe commented 2 months ago

Hi, when I write a String value that contains a semicolon (;) to an INFO field, the resulting file can no longer be imported.

Should I escape this in my code using noodles-vcf or should noodles do it?

zaeleus commented 2 months ago

Thanks for the report.

The output is correct. Semicolons (;) in info string values are percent-encoded to %3B (e.g., "a;b" => "a%3Bb"). However, vcf::Record does not decode them when reading, so writing such a record back out effectively double-encodes the value.
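To illustrate the double encoding described above, here is a minimal standalone sketch. It assumes the percent-encoded character set from the VCF 4.3 specification (`%`, `:`, `;`, `=`, `,`, CR, LF, TAB). The functions `percent_encode`, `percent_decode`, and `hex_val` are hypothetical helpers written for this example. They are not part of the noodles API, and the set of characters noodles actually encodes may differ.

```rust
// Hypothetical helpers for illustration only; not the noodles implementation.

/// Percent-encodes the characters that VCF 4.3 reserves inside field values.
fn percent_encode(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '%' => out.push_str("%25"),
            ':' => out.push_str("%3A"),
            ';' => out.push_str("%3B"),
            '=' => out.push_str("%3D"),
            ',' => out.push_str("%2C"),
            '\r' => out.push_str("%0D"),
            '\n' => out.push_str("%0A"),
            '\t' => out.push_str("%09"),
            _ => out.push(c),
        }
    }
    out
}

/// Returns the numeric value of an ASCII hex digit, if it is one.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Decodes `%XX` escapes; malformed escapes are passed through unchanged.
fn percent_decode(s: &str) -> String {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}
```

With these helpers, `percent_encode("a;b")` yields `"a%3Bb"`. If a reader hands that string back without decoding it, encoding it again turns the `%` into `%25` and yields `"a%253Bb"`. Calling `percent_decode` on read restores `"a;b"`, so a read/write round trip leaves the value unchanged.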

The buffered record reader (Reader::read_record_buf and Reader::record_bufs) can be used as a workaround for now.

zaeleus commented 1 month ago

This is now fixed in noodles 0.81.0 / noodles-vcf 0.64.0.