Closed glguy closed 8 months ago
This is what various parsers do right now, grouped by how they handle it:
% ./run test <<<'a=1e1000000000'
c++-toml++ ERROR: Error while parsing floating-point: '1e1000000000' could not be interpreted as a value (error occurred at line 1, column 15 of 'stdin')
c++-toml11 ERROR: what(): [error] toml::parse_floating: --> unknown file | 1 | a=1e1000000000 | ^--- out of range
c-toml-c ERROR: unknown type{"a":
c-tomlc99 ERROR: unknown type{"a":
go-go-toml ERROR: Error decoding TOML: toml: unable to parse float: strconv.ParseFloat: parsing "1e1000000000": value out of range
go-toml ERROR: Error decoding TOML: toml: line 1 (last key "a"): 1e1000000000 is out of range for float64
lisp-clop ERROR: arithmetic error FLOATING-POINT-OVERFLOW signalled
racket-toml-racket ERROR: loading code: version mismatch expected: "8.12" found: "8.11.1" in: #<input-port:string> possible solution: running `racket -y`, `rac
rust-basic-toml ERROR: Error: Error { message: "invalid number at line 1 column 3" }
rust-toml ERROR: Error: Error { message: "TOML parse error at line 1, column 3\n |\n1 | a=1e1000000000\n | ^\ninvalid floating-point number\n" }
rust-toml_edit ERROR: Error: Error { message: "TOML parse error at line 1, column 3\n |\n1 | a=1e1000000000\n | ^\ninvalid floating-point number\n" }
cs-tomlyn {"a":{"type":"float","value":"+inf"}}
dart-toml.dart {"a":{"type":"float","value":"infinity"}}
fortran-toml-f {"a":{"type":"float","value":"+inf"}}
haskell-toml-parser {"a":{"type":"float","value":"inf"}}
haskell-toml-reader {"a":{"type":"float","value":"inf"}}
js-iarna-toml {"a":{"type":"float","value":"Infinity"}}
js-j-toml {"a":{"type":"float","value":"Infinity"}}
js-smol-toml {"a":{"type":"float","value":"inf"}}
js-toml-eslint-parser {"a":{"type":"float","value":"+inf"}}
python-toml {"a":{"type":"float","value":"inf"}}
python-tomli {"a":{"type":"float","value":"inf"}}
python-tomlkit {"a":{"type":"float","value":"inf"}}
python-tomllib {"a":{"type":"float","value":"inf"}}
ruby-perfect_toml {"a":{"type":"float","value":"+inf"}}
ruby-toml-rb {"a":{"type":"float","value":"+inf"}}
ruby-tomlrb {"a":{"type":"float","value":"+inf"}}
rust-taplo {"a":{"type":"float","value":"inf"}}
ocaml-otoml {"a":{"type":"float","value":"1e+1000000000"}}
guile-toml {"a":{"type":"float","value":"1e1000000000"}}
11 error out, 17 use inf, and 2 seem to handle this (although I didn't verify if they also handle it correct). I think both erroring and using Inf is reasonable.
As I understand that issue, Numeric.readFloat "1e1000000000"
taking 35 seconds seems very specific to how Haskell works(?)
This isn't really specified in the TOML spec at the moment; I'm not sure we need to either: practically speaking, it also seems to be a non-issue, and at least allowing support for floats >64bit isn't a bad thing?
Can add a note to: https://github.com/toml-lang/toml-test?tab=readme-ov-file#untested-and-implementation-defined-behaviour ?
Thanks for putting that summary together so quickly.
You're right that the bad behavior on overflow in toml-reader
was specifically to a GHC base library behavior, and isn't likely to be duplicated across languages. I only included in the ticket because it was somewhat connected.
I think allowing floats >64-bits is probably a bad thing. Uncertainty about what values are in range leads to incompatibility between TOML implementations.
Floats should be implemented as IEEE 754 binary64 values.
I think it's a good thing that the spec gives us this kind of clarity about floats in TOML and I'm disappointed in JSON, for example, for punting on this.
Arbitrary 64-bit signed integers (from −2^63 to 2^63−1) should be accepted and handled losslessly. If an integer cannot be represented losslessly, an error must be thrown.
TOML says that for integers a lossy parse should through an error and at least defines a minimum acceptable range. I think it's great that it defines both a minimum and an error behavior. The fact that lossy conversions are defined to error suggests to me that it's the right answer for TOML floats but that's just an interpretation.
Being explicit that this behavior isn't tested sounds like a fine resolution to me.
(I'm quite aware that the opinions I've expressed are opinions and that this repository isn't the place to define new behaviors of TOML. I've just been leaning on it to have good tests for as many defined behaviors as possible :-D )
"Should" is a bit of an ambiguous term, but the way I'm reading that doesn't exclude being able to handle higher precision floats, or even supporting only 32bit floats.
As you mentioned, this is more of a specification issue rather than a toml-test issue. Integers specified much more precisely:
32-bit signed integers (from −2^31 to 2^31−1) must be accepted and handled losslessly. 64-bit signed integers (from −2^63 to 2^63−1) should be accepted and handled losslessly. A larger range may also be accepted.
An error must be thrown if a given integer cannot be represented losslessly.
But of course, floats are never "lossless" in the same way ints are.
At the very least the specification should be changed to define a minimally supported range similar to ints, so you know what you can rely on with 100% certainty.
Since my request is impossible to satisfy (the behavior isn't defined enough to write a test) I'll go ahead and retract it. Thanks for the discussion.
What should
1e1000000000
decode to? Should it be infinity or an error? Having implementations be consistent would be nice. It's possible you can't write a test for this because it could be underspecified.Incidentally, for (at least) one TOML library (not mine) this kind of test might have even caught an important bug in the implementation. HSEC-2023-0007
My personal preference would be to require users to write
inf
to get an infinity. Silent overflow behavior seems hostile to users who haven't memorized the maximum value that can be stored in adouble