separate unsigned integer decoding to support full range of uint64

pelletier / go-toml

Go library for the TOML file format

https://github.com/pelletier/go-toml

Other

1.7k stars 209 forks source link

separate unsigned integer decoding to support full range of uint64 #782

Closed jmank88 closed 2 years ago

jmank88 commented 2 years ago

Drafted a quick fix for #781

Happy to refactor, add tests, etc.

pelletier commented 2 years ago

Thank you for submitting the PR! I think it's directionally right, but I'd love to see if we can refactor it.

Maybe having a parseInteger function return the cleaned up bytes + base (like (data []byte, int base), and have unmarshalInteger pick which strconv.Parse* function to call.

An other option is to pass a boolean to parseInteger to tell it whether to use signed or unsigned.

For performance I think the latter is better (because it allows to ultimately remove the underscores cleaning that allocates), but the first one is probably easier to reason about. If you're willing to update the PR I'm happy with either; we can always revisit it later. As for testing, as long as the coverage / report action is happy, I'm happy :)

jmank88 commented 2 years ago

I thought the latter sounded better initially too, but it seems to require returning the two different types back up the stack still. I went with a version of the former, which actually ended up enabling further simplification, but now I've simplified away errors that were tested, so might need to dial it back... :thinking:

pelletier commented 2 years ago

Thinking more about it, I remembered that the TOML spec only handles int64 numbers:

https://github.com/toml-lang/toml/blob/cbf3b13128fb717517afbd22f8fcb665a0b0b035/toml.md?plain=1#L459-L461

I think that's why I initially just went with strconv.ParseInt, even for unsigned numbers. Did you encounter this issue in the wild or were you testing specifically for overflows?

jmank88 commented 2 years ago

Oh, curious.

Did you encounter this issue in the wild or were you testing specifically for overflows?

We are converting from a legacy config which contains uint64 fields (as well as big.Int - which I guess is not actually supported natively either :thinking: ). Regardless of whether or not this is a fundamental problem, it was the asymmetry of being able to Encode/Marshal large values that cannot be Decoded/Unmarshaled that brought this to our attention. Should the encoder be limited to producing valid toml? Or is that on the users?

moorereason commented 2 years ago

Much discussion about this in https://github.com/toml-lang/toml/issues/538.

As I understand it: implementations MAY support integers larger than int64 but can't guarantee that other implementations will support anything beyond int64. It would be valid but irregular TOML.

pelletier commented 2 years ago

Sorry for the slow response. I'm trying to think this through. I am a bit hesitant to support integers outside int64, as that's what the spec describes. The reason for it is that by deviating too much from the specification, people using go-toml just fine for some files may be locked into using it without realizing it – meaning the data is no longer portable, because other libraries may have different behavior for numbers greater than max int64. For folks who expect larger numbers, a type implementing encoding.TextMarshaler may be better suited – and actually portable. If we go down that route, I think @jmank88 is right: the library should prevent the user from serializing a uint64 greater than max int64, so that there is symmetry between encoding and decoding.

jmank88 commented 2 years ago

No problem. We came to similar a conclusion, and have accepted that we'll have to use quoted strings for large ints and high precision floats. Closing this since a pivot to limiting encoded values instead would be a fundamentally different PR.