// If we have a missing field but multiple values are expected, we need to construct a new string with all fields.
// For example, if Number=2, the string has to be ".,."
The VCFCodec does not honor this contract and I get an error when trying to parse the above line:
java.lang.NumberFormatException: For input string: "."
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at htsjdk.variant.variantcontext.GenotypeLikelihoods.parseDeprecatedGLString(GenotypeLikelihoods.java:264)
at htsjdk.variant.variantcontext.GenotypeLikelihoods.fromGLField(GenotypeLikelihoods.java:90)
at htsjdk.variant.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:722)
at htsjdk.variant.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:132)
at htsjdk.variant.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:158)
at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:352)
at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:284)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:262)
The GenotypeLikelihoods code is expecting a single "." regardless of the ploidy and the number of alleles.
Expected behaviour
VCFCodec should handle the case of multiple missing GL fields, especially since its the output generated by VCFEncoder.
Actual behaviour
See above comments
Additional information
This is the code snippet used to generate the entry above
import htsjdk.variant.vcf.*;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
public class Test {
public static void main(String[] args) {
final VCFHeaderLine versionHeader = new VCFHeaderLine(VCFHeaderVersion.VCF4_2.getFormatString(), VCFHeaderVersion.VCF4_2.getVersionString());
final VCFFormatHeaderLine gtHeader = new VCFFormatHeaderLine("GT", 1, VCFHeaderLineType.String, "Genotype");
final VCFFormatHeaderLine glHeader = new VCFFormatHeaderLine("GL", VCFHeaderLineCount.G, VCFHeaderLineType.Float, "GL");
final VCFFormatHeaderLine minDpHeader = new VCFFormatHeaderLine("MIN_DP", 1, VCFHeaderLineType.Integer, "Min DP");
final Set<VCFHeaderLine> headerLines = new HashSet<>();
headerLines.add(versionHeader);
headerLines.add(gtHeader);
headerLines.add(glHeader);
headerLines.add(minDpHeader);
final Genotype gt1 = new GenotypeBuilder("s1",
Arrays.asList(Allele.create("G", true), Allele.create("A")))
.attribute("GL", "0,0.9,0.1")
.make();
final Genotype gt2 = new GenotypeBuilder("s2",
Arrays.asList(Allele.NO_CALL, Allele.NO_CALL))
.attribute("MIN_DP", "10")
.make();
final GenotypesContext gtx = GenotypesContext.create(2);
gtx.add(gt1);
gtx.add(gt2);
final VariantContext vc = new VariantContextBuilder(null, "chr1", 123, 123,
Arrays.asList(Allele.create("G", true), Allele.create("A")))
.genotypes(gtx)
.make();
final Set<String> samples = new HashSet<>();
samples.add("s1");
samples.add("s2");
final VCFEncoder encoder = new VCFEncoder(new VCFHeader(headerLines, samples), false, false);
System.out.println(encoder.encode(vc));
}
}
Should output
chr1 123 . G A . . . GT:GL:MIN_DP 0/1:0,0.9,0.1 ./.:.,.,.:10
Verify
Somewhat related issue but not the same.
Subject of the issue
Getting an error reading GL fields written using htsjdk for multi-sample VCF.
Your environment
Steps to reproduce
When creating multi sample VCF, the encoder puts in ".,.,." for missing GL fields, e.g.
Specifically there's this comment in the encoder code https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/variant/vcf/VCFEncoder.java#L346
The VCFCodec does not honor this contract and I get an error when trying to parse the above line:
The GenotypeLikelihoods code is expecting a single "." regardless of the ploidy and the number of alleles.
Expected behaviour
VCFCodec should handle the case of multiple missing GL fields, especially since its the output generated by
VCFEncoder
.Actual behaviour
See above comments
Additional information
This is the code snippet used to generate the entry above
Should output