nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

Medaka variant error: OSError: .vcf is unsorted at index #8. #417

Closed: yul96 closed this issue 8 months ago

yul96 commented 1 year ago

I got this error when running Medaka variant calling on a sample, with a command like:

medaka_haploid_variant -i input.fastq.gz -r reference.fasta -o medaka-variant -m r1041_e82_260bps_sup_g632

Below is the VCF generated by medaka:

##fileformat=VCFv4.1
##medaka_version=1.7.2
##contig=
##FORMAT=
##FORMAT=
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
VEC-1 2005 . CAAAAAAA C 21.977 PASS . GT:GQ 1:22
VEC-1 2006 . AAAAAAAAAA C 25.521 PASS . GT:GQ 1:26
VEC-1 2005 . CA C 0.827 PASS . GT:GQ 1:1
VEC-1 2005 . CAAAAAA C 25.662 PASS . GT:GQ 1:26
VEC-1 2005 . CA C 5.859 PASS . GT:GQ 1:6
VEC-1 2005 . CAAA C 3.881 PASS . GT:GQ 1:4
VEC-1 2025 . AAAAA C 32.895 PASS . GT:GQ 1:33
VEC-1 2031 . A C 9.731 PASS . GT:GQ 1:10
VEC-1 2033 . A C 0.882 PASS . GT:GQ 1:1
VEC-1 2046 . A C 6.942 PASS . GT:GQ 1:7

I found that if I move the record at position 2006 to just before the record at position 2025, the error goes away.
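For anyone hitting the same error on an affected version, the manual reordering described above can be scripted. The following is a minimal sketch, not part of medaka: the helper names `first_unsorted_index` and `sort_by_position` are hypothetical, the input is assumed to be a list of VCF text lines, and contigs are assumed to be ordered by first appearance in the file. It reports the first out-of-order line and then sorts the records by position within each contig, keeping the original relative order of records that share a position.

```python
def first_unsorted_index(lines):
    """Return the 0-based index of the first record whose POS is lower than
    the previous record on the same contig, or None if the file is sorted."""
    prev_chrom, prev_pos = None, None
    for idx, line in enumerate(lines):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        # CHROM and POS are the first two whitespace-separated columns
        chrom, pos = line.split()[:2]
        pos = int(pos)
        if chrom == prev_chrom and pos < prev_pos:
            return idx
        prev_chrom, prev_pos = chrom, pos
    return None


def sort_by_position(lines):
    """Return header lines followed by records sorted by (contig, POS).

    Assumption: contigs are ranked by first appearance in the file.
    Python's sort is stable, so records with equal POS keep their order.
    """
    header = [l for l in lines if l.startswith("#")]
    records = [l for l in lines if l.strip() and not l.startswith("#")]
    contig_rank = {}
    for rec in records:
        contig_rank.setdefault(rec.split()[0], len(contig_rank))
    records.sort(key=lambda r: (contig_rank[r.split()[0]], int(r.split()[1])))
    return header + records


# A shortened excerpt of the records from this issue
vcf_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE",
    "VEC-1 2005 . CAAAAAAA C 21.977 PASS . GT:GQ 1:22",
    "VEC-1 2006 . AAAAAAAAAA C 25.521 PASS . GT:GQ 1:26",
    "VEC-1 2005 . CA C 0.827 PASS . GT:GQ 1:1",
    "VEC-1 2025 . AAAAA C 32.895 PASS . GT:GQ 1:33",
]

print(first_unsorted_index(vcf_lines))  # index of the first out-of-order line
fixed_lines = sort_by_position(vcf_lines)
print(first_unsorted_index(fixed_lines))  # None once sorted
```

Sorting only changes the order of the records; it does not merge or deduplicate the overlapping calls at position 2005.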

cjw85 commented 10 months ago

Hi @yul96 @edgraham

There is a corner case in how medaka outputs and "normalizes" variant records which can lead to the output being unsorted. We will fix this.
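To illustrate in general terms how normalization can break sort order, here is a minimal sketch of textbook left-alignment of an indel. This is not medaka's code and may not match the actual corner case; the `normalize` function, the toy reference, and the two records are all hypothetical. The point is only that normalizing a record can move its POS to the left, so records written in their pre-normalization order can end up unsorted.

```python
def normalize(reference, pos, ref, alt):
    """Left-align and trim one variant (pos is 1-based).

    Generic sketch: repeatedly drop a shared trailing base, and when an
    allele becomes empty, extend both alleles one base to the left.
    """
    changed = True
    while changed:
        changed = False
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend left of the reference start")
            pos -= 1
            base = reference[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Trim shared leading bases while both alleles keep at least one base
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference with a homopolymer run: G C A A A A A A T (positions 1..9)
reference = "GCAAAAAAT"

# Two records, sorted by POS before normalization
records = [
    (4, "A", "G"),   # SNP inside the homopolymer
    (6, "AA", "A"),  # one-base deletion anchored at position 6
]

normalized = [normalize(reference, pos, ref, alt) for pos, ref, alt in records]
positions = [pos for pos, _, _ in normalized]
print(normalized)
print(positions == sorted(positions))  # False: the deletion moved left of the SNP
```

In this toy case the deletion slides to the start of the homopolymer, so a writer that normalizes record by record would need to re-sort before output.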

cjw85 commented 8 months ago

This was fixed in v1.10.0.