samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/
283 stars 242 forks source link

Tolerate lower-case nans in QUAL #1364

Closed karenfeng closed 5 years ago

karenfeng commented 5 years ago

Description

We sometimes see VCFs with lower-case nan's in QUAL. This doesn't match the Java-style NaN, and therefore causes VCF parsing to break.

Checklist

codecov-io commented 5 years ago

Codecov Report

Merging #1364 into master will increase coverage by 0.036%. The diff coverage is 89.474%.

@@               Coverage Diff               @@
##              master     #1364       +/-   ##
===============================================
+ Coverage     68.007%   68.043%   +0.036%     
- Complexity      8352      8368       +16     
===============================================
  Files            570       571        +1     
  Lines          33848     33889       +41     
  Branches        5661      5665        +4     
===============================================
+ Hits           23019     23059       +40     
+ Misses          8645      8640        -5     
- Partials        2184      2190        +6
Impacted Files Coverage Δ Complexity Δ
...n/java/htsjdk/variant/variantcontext/Genotype.java 59.036% <0%> (ø) 80 <0> (ø) :arrow_down:
.../htsjdk/variant/variantcontext/VariantContext.java 77.714% <100%> (ø) 246 <0> (ø) :arrow_down:
...dk/variant/variantcontext/GenotypeLikelihoods.java 84.971% <100%> (ø) 74 <0> (ø) :arrow_down:
...main/java/htsjdk/variant/vcf/AbstractVCFCodec.java 76.218% <100%> (+0.573%) 100 <0> (+2) :arrow_up:
src/main/java/htsjdk/variant/vcf/VCFUtils.java 49.275% <100%> (+4.831%) 22 <5> (+5) :arrow_up:
...java/htsjdk/variant/variantcontext/CommonInfo.java 47.863% <50%> (ø) 47 <0> (ø) :arrow_down:
...htsjdk/samtools/util/nio/DeleteOnExitPathHook.java 78.947% <0%> (-10.526%) 3% <0%> (-1%)
...dk/samtools/cram/structure/SubstitutionMatrix.java 88.189% <0%> (-3.564%) 25% <0%> (+10%)
...samtools/cram/structure/CramCompressionRecord.java 92.424% <0%> (-0.545%) 114% <0%> (+1%)
.../samtools/cram/build/CompressionHeaderFactory.java 89.6% <0%> (-0.134%) 59% <0%> (-5%)
... and 4 more
yfarjoun commented 5 years ago

does the spec allow for Nan in the QUAL field?

On Wed, May 8, 2019 at 3:00 PM Karen Feng notifications@github.com wrote:

Description

We sometimes see VCFs with lower-case nan's in QUAL. This doesn't match the Java-style NaN, and therefore causes VCF parsing to break. Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
  • All tests passing
  • Extended the README / documentation, if necessary
  • Is not backward compatible (breaks binary or source compatibility)

You can view, comment on, or merge this pull request online at:

https://github.com/samtools/htsjdk/pull/1364 Commit Summary

  • Tolerate lower-case nans

File Changes

Patch Links:

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/samtools/htsjdk/pull/1364, or mute the thread https://github.com/notifications/unsubscribe-auth/AAU6JUU4IAD2YZS4UI7HP7DPUMPMZANCNFSM4HLULL3A .

karenfeng commented 5 years ago

VCF reading doesn't break on NaN in the QUAL or GQ fields (as in the associated unit test), but I'm not completely confident on how spec-compliant that is. The spec seems to suggest that because QUAL is a Float, it should be ok; the reason GQ doesn't break on NaN today is because it's parsed via Double rather than Integer.

jmarshall commented 5 years ago

The 4.3 spec is clear that NaNs are valid in QUAL (§1.6.1 states QUAL is a Float; §1.3 says VCF Floats include NaN). The earlier specs don't have §1.3 and appear to be silent on what text a Float might allow.

See also NaN vs. nan discussions in https://github.com/samtools/bcftools/issues/755#issuecomment-455240246 and https://github.com/samtools/hts-specs/issues/89#issuecomment-455503658 and nearby.

karenfeng commented 5 years ago

@lbergelson Thank you for the feedback; let me know if there any further changes you would like me to make.

alecw commented 5 years ago

Hi @lbergelson , is a release with this change coming any time soon? We would really appreciate it.

Thanks, Alec

lbergelson commented 5 years ago

Yes. I've wanted to do a release for a while but I've been busy with personal stuff (mostly baby related) and because of incompatibilities I've had trouble running the downstream tests successfully so it's taken longer than intended. I hope to have one out this week but next week at the latest. I can publish a release candidate that hasn't been fully tested if that would be helpful.