Closed tshauck closed 10 months ago
I've been reworking the alignment formats, and I actually very recently fixed this in SAM: https://github.com/zaeleus/noodles/commit/92c51198fac5dc6710fcabc4ddda929f9ff9a6e0. The current reader fails when optional fields aren't present, but I haven't started reworking the variant formats yet.
The first 8 fields are required, so I don't think it's valid to return success if a line feed is reached.
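To illustrate the point about required fields, here is a minimal sketch, not the noodles implementation, of a line splitter that returns an error when the line ends before all 8 required columns have been read. The names `REQUIRED_FIELDS`, `ParseError`, and `split_required` are hypothetical and only serve the example; the input is assumed to be a single line with its trailing line feed already removed.

```rust
/// Number of mandatory VCF columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
const REQUIRED_FIELDS: usize = 8;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ParseError {
    /// The line ended after `found` fields, before all required fields were read.
    MissingField { found: usize },
}

/// Splits one line (without its trailing line feed) into the 8 required
/// fields plus whatever follows them (FORMAT and samples), if anything.
fn split_required(line: &str) -> Result<(Vec<&str>, Option<&str>), ParseError> {
    // At most 9 pieces: 8 required fields and the unsplit remainder.
    let mut parts = line.splitn(REQUIRED_FIELDS + 1, '\t');
    let required: Vec<&str> = parts.by_ref().take(REQUIRED_FIELDS).collect();

    if required.len() < REQUIRED_FIELDS {
        // The line ended early: report an error rather than success.
        return Err(ParseError::MissingField { found: required.len() });
    }

    // The optional remainder is None when only the required fields exist.
    Ok((required, parts.next()))
}
```

With this shape, a record that has only the required columns is accepted with no remainder, while a line that stops short of column 8 is rejected instead of silently succeeding.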
Cool... do you have plans to work on the variant format soon / want me to hold off, or mind if I adapted your approach w/ SAM files to VCFs? It's causing those files to give incorrect results, so I'd like to get it resolved.
Allow me to fix the blocking reader and reuse that in a buffered async read. I'll let you know when that's done.
The alignment format rework is close to a checkpoint, but I don't have a precise timeline. I will use my findings from that to improve the variant crates.
Sounds good to me, thanks.
Thanks for using and reporting issues with the VCF lazy records!
Hi, using the lazy vcf reader, I think I found an issue w.r.t. how the fields are being parsed. In cases where not all fields are present, `read_field` will read into the next line, causing `record.buf` to contain the record in question as it was parsed and the "raw" next record (because of the last `read_line` in `read_lazy_record`).

For example, if you modify the `test_read_lazy_record` test to have two records, `&b"sq0\t1\t.\tA\t.\t.\tPASS\t.\nsq0\t1\t.\tA\t.\t.\tPASS\t."[..];`, the test fails because `record.buf` is actually `sq01.A..PASS.\nsq01\t.\tA\t.\t.\tPASS\t.` (the samples field has the next record).

To attempt a fix, I took perhaps the dumbest and least idiomatic approach :), but I thought I'd open it up to see if you agree there's an issue and if so, thoughts on the best way to fix it.
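To make the expected behavior concrete, here is a rough standalone sketch, independent of the noodles internals, of a field reader that stops at either a tab or a line feed and reports which one it hit, so a record reader can avoid running into the next line. The names `Delim`, `read_field`, and `read_record` below are hypothetical and do not mirror the actual noodles signatures.

```rust
/// Which delimiter ended a field (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Delim {
    Tab,
    LineFeed,
    Eof,
}

/// Reads one field from `src`, stopping at a tab OR a line feed.
/// Returns the field, the delimiter that ended it, and the remaining input.
fn read_field(src: &[u8]) -> (&[u8], Delim, &[u8]) {
    match src.iter().position(|&b| b == b'\t' || b == b'\n') {
        Some(i) if src[i] == b'\t' => (&src[..i], Delim::Tab, &src[i + 1..]),
        Some(i) => (&src[..i], Delim::LineFeed, &src[i + 1..]),
        None => (src, Delim::Eof, &src[src.len()..]),
    }
}

/// Reads the fields of a single record and never crosses a line feed.
/// Returns the fields and the unread input (the following records).
fn read_record(mut src: &[u8]) -> (Vec<&[u8]>, &[u8]) {
    let mut fields = Vec::new();
    loop {
        let (field, delim, rest) = read_field(src);
        fields.push(field);
        src = rest;
        // Stop as soon as the line (or the input) ends.
        if delim != Delim::Tab {
            break;
        }
    }
    (fields, src)
}
```

Run on the two-record input from the modified test, this sketch yields 8 fields for the first record and leaves the second record untouched in the remaining input.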
Thanks,