toml-lang / toml-test

A language agnostic test suite for TOML parsers.
MIT License
214 stars 71 forks source link

Add a test for oversized floating-point literals #151

Closed glguy closed 8 months ago

glguy commented 8 months ago

What should 1e1000000000 decode to? Should it be infinity or an error? Having implementations be consistent would be nice. It's possible you can't write a test for this because it could be underspecified.

Incidentally, for (at least) one TOML library (not mine) this kind of test might have even caught an important bug in the implementation. HSEC-2023-0007

My personal preference would be to require users to write inf to get an infinity. Silent overflow behavior seems hostile to users who haven't memorized the maximum value that can be stored in a double

arp242 commented 8 months ago

This is what various parsers do right now, grouped by how they handle it:

% ./run test <<<'a=1e1000000000'
c++-toml++                    ERROR: Error while parsing floating-point: '1e1000000000' could not be interpreted as a value     (error occurred at line 1, column 15 of 'stdin')
c++-toml11                    ERROR: what(): [error] toml::parse_floating: --> unknown file   | 1 | a=1e1000000000   |   ^--- out of range
c-toml-c                      ERROR: unknown type{"a":
c-tomlc99                     ERROR: unknown type{"a":
go-go-toml                    ERROR: Error decoding TOML: toml: unable to parse float: strconv.ParseFloat: parsing "1e1000000000": value out of range
go-toml                       ERROR: Error decoding TOML: toml: line 1 (last key "a"): 1e1000000000 is out of range for float64
lisp-clop                     ERROR: arithmetic error FLOATING-POINT-OVERFLOW signalled
racket-toml-racket            ERROR: loading code: version mismatch  expected: "8.12"  found: "8.11.1"  in: #<input-port:string>  possible solution: running `racket -y`, `rac
rust-basic-toml               ERROR: Error: Error { message: "invalid number at line 1 column 3" }
rust-toml                     ERROR: Error: Error { message: "TOML parse error at line 1, column 3\n  |\n1 | a=1e1000000000\n  |   ^\ninvalid floating-point number\n" }
rust-toml_edit                ERROR: Error: Error { message: "TOML parse error at line 1, column 3\n  |\n1 | a=1e1000000000\n  |   ^\ninvalid floating-point number\n" }

cs-tomlyn                     {"a":{"type":"float","value":"+inf"}}
dart-toml.dart                {"a":{"type":"float","value":"infinity"}}
fortran-toml-f                {"a":{"type":"float","value":"+inf"}}
haskell-toml-parser           {"a":{"type":"float","value":"inf"}}
haskell-toml-reader           {"a":{"type":"float","value":"inf"}}
js-iarna-toml                 {"a":{"type":"float","value":"Infinity"}}
js-j-toml                     {"a":{"type":"float","value":"Infinity"}}
js-smol-toml                  {"a":{"type":"float","value":"inf"}}
js-toml-eslint-parser         {"a":{"type":"float","value":"+inf"}}
python-toml                   {"a":{"type":"float","value":"inf"}}
python-tomli                  {"a":{"type":"float","value":"inf"}}
python-tomlkit                {"a":{"type":"float","value":"inf"}}
python-tomllib                {"a":{"type":"float","value":"inf"}}
ruby-perfect_toml             {"a":{"type":"float","value":"+inf"}}
ruby-toml-rb                  {"a":{"type":"float","value":"+inf"}}
ruby-tomlrb                   {"a":{"type":"float","value":"+inf"}}
rust-taplo                    {"a":{"type":"float","value":"inf"}}

ocaml-otoml                   {"a":{"type":"float","value":"1e+1000000000"}}
guile-toml                    {"a":{"type":"float","value":"1e1000000000"}}

11 error out, 17 use inf, and 2 seem to handle this (although I didn't verify if they also handle it correct). I think both erroring and using Inf is reasonable.

As I understand that issue, Numeric.readFloat "1e1000000000" taking 35 seconds seems very specific to how Haskell works(?)

This isn't really specified in the TOML spec at the moment; I'm not sure we need to either: practically speaking, it also seems to be a non-issue, and at least allowing support for floats >64bit isn't a bad thing?

Can add a note to: https://github.com/toml-lang/toml-test?tab=readme-ov-file#untested-and-implementation-defined-behaviour ?

glguy commented 8 months ago

Thanks for putting that summary together so quickly.

You're right that the bad behavior on overflow in toml-reader was specifically to a GHC base library behavior, and isn't likely to be duplicated across languages. I only included in the ticket because it was somewhat connected.

I think allowing floats >64-bits is probably a bad thing. Uncertainty about what values are in range leads to incompatibility between TOML implementations.

Floats should be implemented as IEEE 754 binary64 values.

I think it's a good thing that the spec gives us this kind of clarity about floats in TOML and I'm disappointed in JSON, for example, for punting on this.

Arbitrary 64-bit signed integers (from −2^63 to 2^63−1) should be accepted and handled losslessly. If an integer cannot be represented losslessly, an error must be thrown.

TOML says that for integers a lossy parse should through an error and at least defines a minimum acceptable range. I think it's great that it defines both a minimum and an error behavior. The fact that lossy conversions are defined to error suggests to me that it's the right answer for TOML floats but that's just an interpretation.

Being explicit that this behavior isn't tested sounds like a fine resolution to me.

(I'm quite aware that the opinions I've expressed are opinions and that this repository isn't the place to define new behaviors of TOML. I've just been leaning on it to have good tests for as many defined behaviors as possible :-D )

arp242 commented 8 months ago

"Should" is a bit of an ambiguous term, but the way I'm reading that doesn't exclude being able to handle higher precision floats, or even supporting only 32bit floats.

As you mentioned, this is more of a specification issue rather than a toml-test issue. Integers specified much more precisely:

32-bit signed integers (from −2^31 to 2^31−1) must be accepted and handled losslessly. 64-bit signed integers (from −2^63 to 2^63−1) should be accepted and handled losslessly. A larger range may also be accepted.

An error must be thrown if a given integer cannot be represented losslessly.

But of course, floats are never "lossless" in the same way ints are.

At the very least the specification should be changed to define a minimally supported range similar to ints, so you know what you can rely on with 100% certainty.

glguy commented 8 months ago

Since my request is impossible to satisfy (the behavior isn't defined enough to write a test) I'll go ahead and retract it. Thanks for the discussion.