samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

preserve decimal point in float INFO fields #980

Open pontikos opened 5 years ago

pontikos commented 5 years ago

INFO fields of type Float should keep a decimal point even when the value is a whole number, i.e. 70.0 instead of 70. Writing the value without the decimal point breaks GATK.

jkbonfield commented 5 years ago

This has come up before, although I'm struggling to find the issue. Maybe it was over in htsjdk land.

Anyway, this is a parsing bug in GATK, not in bcftools output. Floating point numbers are a superset of integers. "70" is still a valid floating point number and C "atof" and "strtod" functions quite happily accept whole numbers.

While I guess we could change all floating point numbers to include .0 if they are whole numbers, it needlessly wastes space and isn't the correct solution.
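To illustrate the point that a whole number is still a valid float as far as the C library is concerned, here is a minimal sketch using strtod. The helper name parse_float_field is hypothetical and only for illustration; it is not taken from htslib or bcftools.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not htslib code): returns 1 if the whole
 * string parses as a double, 0 otherwise. The value goes to *out. */
static int parse_float_field(const char *s, double *out)
{
    char *end;
    assert(s != NULL && out != NULL);
    *out = strtod(s, &end);
    /* Accept only if something was consumed and nothing is left over. */
    return end != s && *end == '\0';
}
```

Under this sketch, "70" and "70.0" both parse successfully and give the same value, while a string such as "abc" is rejected.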

pontikos commented 5 years ago

Ok I've posted on GATK github:

https://github.com/broadinstitute/gatk/issues/5789

I agree that it seems silly that GATK falls over when a decimal point is missing for a float.

I hope htsjdk (assuming that's what GATK are using) and htslib can agree on this.

pd3 commented 5 years ago

Yes, this is a silly bug in GATK and we will not address this in bcftools / htslib. As a workaround, you can "fix" the numbers to GATK's liking using this script https://github.com/samtools/bcftools/blob/develop/misc/fix-broken-GATK-Double-vs-Integer

pontikos commented 5 years ago

Thanks! I also wrote a script to fix it. GATK don't want to fix it as GATK 3 is no longer maintained. If you can point me to the line of code in bcftools that does this, I can change it in my own version.


jkbonfield commented 5 years ago

It's probably kputd in kstring.c. This uses %g to print floats that are very large or very small, and otherwise emulates the printf %g format itself. The z[-1] = 0 line MAY be responsible, possibly together with the trailing-zero removal, which may need editing too, but you'll need to experiment. Note, though, that this just follows the normal printing mechanism. E.g. try printf on the command line:

jkb$ printf "%g\n" 0.170
0.17
jkb$ printf "%g\n" 1.70
1.7
jkb$ printf "%g\n" 17.0
17

"17", not "17.0"!
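For anyone who wants the ".0" in their own build or in a post-processing tool, one possible approach is to format with %g and append ".0" when the result has no decimal point or exponent. This is only a sketch: format_float_keep_point is a hypothetical helper, it is not how kputd works, and it assumes the C locale (where the decimal separator is '.').

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not htslib code): format v like printf("%g"),
 * but append ".0" when the result looks like a bare integer.
 * Returns the string length, or -1 if buf is too small. */
static int format_float_keep_point(double v, char *buf, size_t len)
{
    int n;
    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "%g", v);
    if (n < 0 || (size_t)n >= len)
        return -1;
    /* Leave the text alone if it already has a '.', an exponent
     * ('e'/'E'), or is "inf"/"nan" (both contain an 'n'). */
    if (buf[strcspn(buf, ".eEnN")] == '\0') {
        if ((size_t)n + 2 >= len)
            return -1;              /* no room for ".0" plus the NUL */
        memcpy(buf + n, ".0", 3);   /* copies '.', '0' and the NUL */
        n += 2;
    }
    return n;
}
```

With this sketch, 17.0 is written as "17.0" and 0.170 as "0.17". Values that %g prints in exponent form, such as "1e+20", are left unchanged; whether a given downstream parser accepts those is a separate question.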