nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com
Other
391 stars 73 forks source link

Medaka variant tools not producing DP and AD fields in VCF FORMAT output #470

Closed kevin-wamae closed 7 months ago

kevin-wamae commented 7 months ago

Describe the bug

I'm having trouble with medaka variant and medaka_haploid_variant not generating the DP and AD fields.

For example, when I use clair3, I get GT:GQ:DP:AD:AF in the FORMAT field. But when I use medaka, I only get GT:GQ in the FORMAT field.

Logging

Here are the commands I'm using:

medaka variant --verbose genome.fa consensus.hdf output.vcf

medaka_haploid_variant -m r1041_e82_400bps_sup_variant_v4.2.0 -r genome.fa -i test.fastq -o vcf_dir

Here's the header from medaka variant:

##fileformat=VCFv4.1
##medaka_version=1.11.1
##contig=<ID=Pf3D7_04_v3,length=1200490>
##contig=<ID=Pf3D7_05_v3,length=1343557>
##contig=<ID=Pf3D7_07_v3,length=1445207>
##contig=<ID=Pf3D7_08_v3,length=1472805>
##contig=<ID=Pf3D7_11_v3,length=2038340>
##contig=<ID=Pf3D7_13_v3,length=2925236>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Medaka genotype.">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Medaka genotype quality score">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SAMPLE
Pf3D7_04_v3 748262  .   T   C   25.05   PASS    .   GT:GQ   1:25
Pf3D7_04_v3 748410  .   G   A   27.82   PASS    .   GT:GQ   1:28

Here's the header from medaka_haploid_variant:

##fileformat=VCFv4.1
##medaka_version=1.11.1
##contig=<ID=Pf3D7_04_v3>
##contig=<ID=Pf3D7_05_v3>
##contig=<ID=Pf3D7_07_v3>
##contig=<ID=Pf3D7_08_v3>
##contig=<ID=Pf3D7_11_v3>
##FILTER=<ID=PASS,Description="All filters passed">
##medaka_version=1.11.1
##contig=<ID=Pf3D7_04_v3,length=1200490>
##contig=<ID=Pf3D7_05_v3,length=1343557>
##contig=<ID=Pf3D7_07_v3,length=1445207>
##contig=<ID=Pf3D7_08_v3,length=1472805>
##contig=<ID=Pf3D7_11_v3,length=2038340>
##contig=<ID=Pf3D7_13_v3,length=2925236>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Medaka genotype.">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Medaka genotype quality score">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth of reads at pos">
##INFO=<ID=DPS,Number=2,Type=Integer,Description="Depth of reads at pos by strand (fwd, rev)">
##INFO=<ID=DPSP,Number=1,Type=Integer,Description="Depth of reads spanning pos +-25">
##INFO=<ID=SR,Number=.,Type=Integer,Description="Depth of spanning reads by strand which best align to each allele (ref fwd, ref rev, alt1 fwd, alt1 rev, etc.)">
##INFO=<ID=AR,Number=2,Type=Integer,Description="Depth of ambiguous spanning reads by strand which align equally well to all alleles (fwd, rev)">
##INFO=<ID=SC,Number=.,Type=Integer,Description="Total alignment score to each allele of spanning reads by strand (ref fwd, ref rev, alt1 fwd, alt1 rev, etc.) aligned with parasail: match 5, mismatch -4, open 5, extend 3">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SAMPLE
Pf3D7_04_v3 748262  .   T   C   49.325  PASS    DP=8074;DPS=4123,3951   GT:GQ   1:49
Pf3D7_04_v3 748410  .   G   A   54.052  PASS    DP=8085;DPS=4124,3961   GT:GQ   1:54

Environment

cjw85 commented 7 months ago

This is the expected output. The medaka_haploid_variant helper script runs medaka tools annotate to add the extra fields.

The models used in medaka do not produce an estimate of allele frequency.