samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/
283 stars 242 forks source link

parsing missing values in VCF info field doesn't work #1228

Open davidbenjamin opened 5 years ago

davidbenjamin commented 5 years ago

Here's the code for CommonInfo::getAttributeAsDoubleList

public List<Double> getAttributeAsDoubleList(String key, Double defaultValue) {
        return getAttributeAsList(key, x -> {
            if (x == null || x == VCFConstants.MISSING_VALUE_v4) {
                return defaultValue;
            } else if (x instanceof Number) {
                return ((Number) x).doubleValue();
            } else {
                return Double.valueOf((String)x); // throws an exception if this isn't a string
            }
        });

Note that when parsing the vcf the x in this lambda has type Object, and therefore testing x == VCFConstants.MISSING_VALUE_v4 doesn't work i.e. it returns false even when the vcf has the missing field indicator "." To return true int his case you would need ((String) x).equals(VCFConstants.MISSING_VALUE_v4).

@lbergelson This is causing a Mutect2 bug when users use their own gnomAD. I have a workaround but it adds extra vcf parsing to isActive.

pshapiro4broad commented 5 years ago

The code uses x == VCFConstants.MISSING_VALUE_v4 in a few places; all of these should instead use x.equals(VCFConstants.MISSING_VALUE_v4). (Casting to String is a runtime error if x is another type and equals() doesn't care if you compare different types.)

There's some code duplication here and cases where it doesn't compare against MISSING_VALUE_v4, it would be nice to clean up this code while doing this. E.g. getAttributeAsDouble() and getAttributeAsDoubleList() have the same code (almost) for converting to double.