nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

Duplicate entries in annotated VCF file #489

Closed tfwulff closed 1 month ago

tfwulff commented 5 months ago

Describe the bug

Hi! I performed bacterial variant calling using `medaka_haploid_variant`:

```
medaka_haploid_variant -t 10 -m r1041_e82_400bps_sup_v4.3.0 -i -r -o
```

The resulting medaka.annotated.vcf file contains duplicate variant entries which are not present in medaka.sorted.vcf.
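One way to confirm and list the duplicates is to count records by their identifying fields. The following is a minimal sketch, not part of medaka: the helper name `duplicate_records` is hypothetical, and it assumes that two records count as duplicates when their CHROM, POS, REF and ALT columns all match.

```python
from collections import Counter


def duplicate_records(vcf_lines):
    """Return {(chrom, pos, ref, alt): count} for records seen more than once.

    Assumption: a record is identified by CHROM, POS, REF and ALT
    (tab-separated VCF columns 1, 2, 4 and 5).
    """
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and header lines ("##..." metadata, "#CHROM" row).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        counts[(chrom, int(pos), ref, alt)] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

Passing an open file handle for medaka.annotated.vcf and then for medaka.sorted.vcf should show whether the duplicates appear only in the annotated file.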

Logging

According to the stderr (relevant parts added below), the Annotate function seems to run twice on one region of the genome. All duplicate variant entries in the medaka.annotated.vcf are from this region (in this case between 1,312,608 and 1,519,097):

```
[12:03:48 - Annotate] Getting chrom coordinates
[12:03:48 - Annotate] Processing chunk with coordinates: contig1:19097-519097
[12:03:48 - Annotate] Processing chunk with coordinates: contig1:519097-1019097
[12:03:48 - Annotate] Processing chunk with coordinates: contig1:1019097-1519097
[12:03:48 - Annotate] Processing chunk with coordinates: contig1:1312608-1812608
```
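The overlap can be checked directly from the log lines. Below is a minimal sketch, not medaka code: `parse_chunks` and `overlapping_chunks` are hypothetical helpers, and the sketch assumes that a chunk's end coordinate is exclusive, so chunks that merely touch (for example 19097-519097 and 519097-1019097) do not count as overlapping.

```python
import re

# Matches the "Processing chunk" lines shown in the log above.
CHUNK_RE = re.compile(r"Processing chunk with coordinates: (\S+):(\d+)-(\d+)")


def parse_chunks(log_text):
    """Extract (contig, start, end) tuples from Annotate log text."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in CHUNK_RE.finditer(log_text)
    ]


def overlapping_chunks(chunks):
    """Return (contig, start, end) for each region covered by two chunks.

    Assumption: chunk ends are exclusive. Only neighbouring chunks (after
    sorting by contig and start) are compared, which is enough to spot
    the pattern reported here.
    """
    ordered = sorted(chunks)
    overlaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        same_contig = prev[0] == cur[0]
        if same_contig and cur[1] < prev[2]:
            overlaps.append((cur[0], cur[1], min(prev[2], cur[2])))
    return overlaps
```

Applied to the four chunk lines above, this reports a single doubly covered region, contig1:1312608-1519097, which matches the range in which the duplicate entries were observed.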

Environment

cjw85 commented 3 months ago

Hi @tfwulff,

Would it be possible for you to share your inputs so that we can investigate further? This is not something we've ever observed.

cjw85 commented 1 month ago

Closing through lack of response.