Closed jmank88 closed 2 years ago
Thank you for submitting the PR! I think it's directionally right, but I'd love to see if we can refactor it.
Maybe have a parseInteger function return the cleaned-up bytes + base (like (data []byte, base int)), and have unmarshalInteger pick which strconv.Parse* function to call.
Another option is to pass a boolean to parseInteger to tell it whether to parse as signed or unsigned.
For performance I think the latter is better (because it ultimately allows removing the underscore cleaning, which allocates), but the first one is probably easier to reason about. If you're willing to update the PR I'm happy with either; we can always revisit it later. As for testing, as long as the coverage / report action is happy, I'm happy :)
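For reference, the first option could look roughly like the sketch below. This is not go-toml's actual code: the bodies of parseInteger, unmarshalInt64, and unmarshalUint64 are assumptions made for illustration, and validation of underscore placement and signs is left out.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// parseInteger is a hypothetical helper. It strips a 0x/0o/0b prefix and
// any underscores, and reports the base the remaining digits should be
// parsed in. It does not decide between signed and unsigned.
func parseInteger(b []byte) (cleaned []byte, base int) {
	base = 10
	if len(b) > 2 && b[0] == '0' {
		switch b[1] {
		case 'x':
			base, b = 16, b[2:]
		case 'o':
			base, b = 8, b[2:]
		case 'b':
			base, b = 2, b[2:]
		}
	}
	// bytes.ReplaceAll returns a copy: this is the allocation that the
	// boolean variant would eventually let us avoid.
	return bytes.ReplaceAll(b, []byte("_"), nil), base
}

// unmarshalInt64 is what a caller with a signed target would use.
func unmarshalInt64(b []byte) (int64, error) {
	cleaned, base := parseInteger(b)
	return strconv.ParseInt(string(cleaned), base, 64)
}

// unmarshalUint64 is what a caller with an unsigned target would use.
func unmarshalUint64(b []byte) (uint64, error) {
	cleaned, base := parseInteger(b)
	return strconv.ParseUint(string(cleaned), base, 64)
}

func main() {
	fmt.Println(unmarshalInt64([]byte("1_000")))
	fmt.Println(unmarshalUint64([]byte("18_446_744_073_709_551_615")))
	fmt.Println(unmarshalInt64([]byte("18_446_744_073_709_551_615")))
}
```

The caller already knows the target type, so each path keeps a single return type and the shared helper stays small.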
I thought the latter sounded better initially too, but it seems to require returning the two different types back up the stack still. I went with a version of the former, which actually ended up enabling further simplification, but now I've simplified away errors that were tested, so might need to dial it back... :thinking:
Thinking more about it, I remembered that the TOML spec only handles int64 numbers:
I think that's why I initially just went with strconv.ParseInt, even for unsigned numbers. Did you encounter this issue in the wild or were you testing specifically for overflows?
Oh, curious.
Did you encounter this issue in the wild or were you testing specifically for overflows?
We are converting from a legacy config which contains uint64 fields (as well as big.Int, which I guess is not actually supported natively either :thinking: ). Regardless of whether or not this is a fundamental problem, it was the asymmetry of being able to Encode/Marshal large values that cannot be Decoded/Unmarshaled that brought this to our attention. Should the encoder be limited to producing valid TOML? Or is that on the users?
Much discussion about this in https://github.com/toml-lang/toml/issues/538.
As I understand it: implementations MAY support integers larger than int64 but can't guarantee that other implementations will support anything beyond int64. It would be valid but irregular TOML.
Sorry for the slow response. I'm trying to think this through. I am a bit hesitant to support integers outside int64, as that's what the spec describes. The reason is that if we deviate too much from the specification, people using go-toml just fine for some files may be locked into using it without realizing it, meaning the data is no longer portable, because other libraries may have different behavior for numbers greater than max int64. For folks who expect larger numbers, a type implementing encoding.TextMarshaler may be better suited, and actually portable. If we go down that route, I think @jmank88 is right: the library should prevent the user from serializing a uint64 greater than max int64, so that there is symmetry between encoding and decoding.
No problem. We came to a similar conclusion, and have accepted that we'll have to use quoted strings for large ints and high-precision floats. Closing this since a pivot to limiting encoded values instead would be a fundamentally different PR.
Drafted a quick fix for #781
Happy to refactor, add tests, etc.