lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

Error in partial missing values for GL field #140

Closed arouette closed 4 years ago

arouette commented 4 years ago

Verify

openjdk version "13.0.1" 2019-10-15 OpenJDK Runtime Environment (build 13.0.1+9) OpenJDK 64-Bit Server VM (build 13.0.1+9, mixed mode, sharing)

Subject of the issue

I ran bcftools merge on a dozen VCFs whose variants were called with freebayes. The merged VCF was used as input for FixVcfMissingGenotypes. Unfortunately, the tool fails after a few seconds with the following error: "partial missing values for GL field". It fails on the 103rd entry of the VCF.

Your environment

Error message

$ java -jar ~/COMMUN/sotfware/jvarkit/dist/fixvcfmissinggenotypes.jar -d 10 -B NORMAL_BAM.list < /freebayes_normals_2012-12-23_SLICE.vcf > IRIC_freebayes_normal_transcriptomes_2012-12-23_FixedMissingGT.vcf
[INFO][FixVcfMissingGenotypes]Reading header for BAM1_Aligned.sortedByCoord.out.bam
[INFO][FixVcfMissingGenotypes]Reading header for BAM2_Aligned.sortedByCoord.out.bam
[INFO][FixVcfMissingGenotypes]Reading header for BAM3_Aligned.sortedByCoord.out.bam
[INFO][FixVcfMissingGenotypes]Reading header for BAM4_Aligned.sortedByCoord.out.bam
[INFO][FixVcfMissingGenotypes]Reading header for BAMX_Aligned.sortedByCoord.out.bam
[INFO][FixVcfMissingGenotypes]Count: 100 Elapsed: 11 seconds(0.00%) Remains: 2 days(100.00%) Last: chr1:134,876
[SEVERE][FixVcfMissingGenotypes]partial missing values for GL field
htsjdk.tribble.TribbleException: partial missing values for GL field
        at htsjdk.variant.variantcontext.GenotypeLikelihoods.parseDeprecatedGLString(GenotypeLikelihoods.java:283)
        at htsjdk.variant.variantcontext.GenotypeLikelihoods.fromGLField(GenotypeLikelihoods.java:92)
        at htsjdk.variant.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:817)
        at htsjdk.variant.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:121)
        at htsjdk.variant.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:158)
        at htsjdk.variant.variantcontext.LazyGenotypesContext.getGenotypes(LazyGenotypesContext.java:148)
        at htsjdk.variant.variantcontext.GenotypesContext.get(GenotypesContext.java:417)
        at htsjdk.variant.variantcontext.VariantContext.getGenotype(VariantContext.java:1102)
        at com.github.lindenb.jvarkit.tools.misc.FixVcfMissingGenotypes.doVcfToVcf(FixVcfMissingGenotypes.java:233)
        at com.github.lindenb.jvarkit.util.jcommander.Launcher.doVcfToVcf(Launcher.java:567)
        at com.github.lindenb.jvarkit.util.jcommander.Launcher.doVcfToVcf(Launcher.java:614)
        at com.github.lindenb.jvarkit.tools.misc.FixVcfMissingGenotypes.doWork(FixVcfMissingGenotypes.java:346)
        at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:777)
        at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:940)
        at com.github.lindenb.jvarkit.tools.misc.FixVcfMissingGenotypes.main(FixVcfMissingGenotypes.java:358)
[INFO][Launcher]fixvcfmissinggenotypes Exited with failure (-1)

lindenb commented 4 years ago

It's a problem with your VCF file.

As far as I understand, some FORMAT/GL fields in your file have partially missing values (see https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/variant/variantcontext/GenotypeLikelihoods.java#L266). I'm pretty sure you would get the same kind of error with any other htsjdk-based tool that needs to decode the genotypes, such as gatk or picard.
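
To locate the offending records before changing anything, a small scan of the VCF can help. The following is only a sketch, not what htsjdk does internally: it assumes the error is triggered by a comma-separated GL value that contains at least one `.` entry (for example `.,.,.` or `-5.0,.,-7.5`) rather than a single `.`. The function name `find_partial_missing_gl` and the `DEMO_VCF` records are hypothetical and made up for illustration.

```python
def find_partial_missing_gl(vcf_lines):
    """Yield (chrom, pos, sample, gl) for each GL value that contains
    '.' entries but is not the single missing value '.'.

    Assumption: this is the pattern that triggers the htsjdk error;
    the exact condition in htsjdk may differ between versions.
    """
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this record
        format_keys = fields[8].split(":")
        if "GL" not in format_keys:
            continue
        gl_index = format_keys.index("GL")
        for sample, column in zip(samples, fields[9:]):
            values = column.split(":")
            if gl_index >= len(values):
                continue  # trailing FORMAT values may be omitted
            gl = values[gl_index]
            if gl != "." and "." in gl.split(","):
                yield fields[0], int(fields[1]), sample, gl


# Hypothetical two-sample VCF body, invented for this example.
DEMO_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t1000\t.\tA\tG\t50\t.\t.\tGT:GL\t0/1:-5.0,0.0,-7.5\t./.:.,.,.",
    "chr1\t2000\t.\tC\tT\t50\t.\t.\tGT:GL\t0/0:0.0,-3.0,-9.0\t./.:.",
]

for chrom, pos, sample, gl in find_partial_missing_gl(DEMO_VCF):
    print(chrom, pos, sample, gl)
```

For a real file, the same function can be fed an open file handle, e.g. `find_partial_missing_gl(open("input.vcf"))` for an uncompressed VCF.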

As a test, try removing the GL field using bcftools annotate:

    bcftools annotate -x 'FORMAT/GL' input.vcf > output.vcf

lindenb commented 4 years ago

This may be related to https://github.com/samtools/htsjdk/pull/1372

Sorry, I don't have time to look into this right now.

arouette commented 4 years ago

It was a problem with my VCF file, you are right. It worked well after removing the GL field.

Thank you for your help! Alexandre