json5 / json5-spec

The JSON5 Data Interchange Format
https://spec.json5.org
MIT License

Proposal: Define 32-bit, 64-bit and larger floats/numbers well #19

Closed: DonaldTsang closed this issue 4 years ago

DonaldTsang commented 4 years ago

A friend of mine raised an issue: is it possible to set rules on floats and numbers so that they are compatible with either JS/Go/Python float/double or BigInt? Or, if we are allowing semi-arbitrary floating-point precision, could the spec define how many digits are required to ensure there is no loss of information? Maybe decimal vs. octal/hexadecimal notation should also be considered when dealing with precision.

RFC-7159

   Since software that implements
   IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is
   generally available and widely used, good interoperability can be
   achieved by implementations that expect no more precision or range
   than these provide, in the sense that implementations will
   approximate JSON numbers within the expected precision.  A JSON
   number such as 1E400 or 3.141592653589793238462643383279 may indicate
   potential interoperability problems, since it suggests that the
   software that created it expects receiving software to have greater
   capabilities for numeric magnitude and precision than is widely
   available.
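To make the RFC's two examples concrete, here is a minimal sketch in Python, whose `float` is a binary64 value. It is only an illustration of how a binary64-based consumer would approximate those literals, not a statement about any particular JSON5 parser. The variable names are made up for this example.

```python
from decimal import Decimal

# The two literals RFC 7159 calls out as potential interoperability problems.
huge = "1E400"
long_pi = "3.141592653589793238462643383279"

# binary64 cannot represent 1E400, so the conversion overflows to infinity.
huge_as_double = float(huge)
print(huge_as_double)        # inf

# The long literal is rounded to the nearest binary64 value.
pi_as_double = float(long_pi)
print(pi_as_double)          # 3.141592653589793

# An arbitrary-precision decimal type keeps every digit as written.
pi_as_decimal = Decimal(long_pi)
print(pi_as_decimal)         # 3.141592653589793238462643383279

# On the "how many digits" question: 17 significant decimal digits are
# enough to round-trip any finite binary64 value through text.
x = 0.1
assert float(format(x, ".17g")) == x
```

A producer that emits more than 17 significant digits is therefore assuming the consumer keeps more than binary64 precision.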

Maybe define that all floats must have a decimal point and all integers must have NO decimal point, so that we don't need YAML-like !!float vs. !!int tags.
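For comparison, some parsers already infer the type from the literal's form. The sketch below uses Python's standard `json` module (plain JSON, not JSON5) purely as an illustration. The helper `kind_of` is a hypothetical name introduced here.

```python
import json

def kind_of(literal: str) -> str:
    """Return the Python type name that json.loads picks for a numeric literal."""
    return type(json.loads(literal)).__name__

# No fraction and no exponent: parsed as an arbitrary-precision int.
print(kind_of("1"))      # int

# A fraction or an exponent makes it a binary64 float.
print(kind_of("1.0"))    # float
print(kind_of("1e3"))    # float
```

JavaScript, by contrast, maps every one of these literals to the same Number type, so the distinction is a property of the parser rather than of the syntax.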

jordanbtucker commented 4 years ago

The JSON and JSON5 specs intentionally define numbers in a generic way. There is no requirement that numbers be IEEE-754 single- or double-precision floating-point numbers. There is also no requirement that integers be 32 or 64 bit. If two parties need to agree on certain restrictions for numbers, that can be done via a schema or documentation. Adding the ability to define these restrictions in the JSON5 syntax would make the syntax more complex without a benefit that sufficiently offsets the cost.

DonaldTsang commented 4 years ago

@jordanbtucker perhaps it is possible to add a header noting whether numbers are Single/Double (IEEE-754), BigInt (assume that all numbers are BigInt), or arbitrary-precision? That would make things easier to parse. If someone sends an abnormally long float, it would be good to make sure that the parser can accept the number without data loss.

jordanbtucker commented 4 years ago

This is not the job of the syntax. This is handled by the schema or contract between applications.

DonaldTsang commented 4 years ago

But isn't the schema what the syntax is based on? For languages to interact with JSON5 "safely", shouldn't it have some baseline rule for when a number breaks the standard?

jordanbtucker commented 4 years ago

No, the schema further restricts the syntax and structure of a JSON5 document. Parsers can impose restrictions on how they handle numbers. This makes the syntax loose enough to handle many different types of numbers, at the expense of losing automatic interoperability.
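As an illustration of a parser-level restriction, Python's standard `json` module (plain JSON, used here only as a stand-in for a JSON5 parser) lets the caller choose how numeric literals are converted via its `parse_float` and `parse_int` hooks. The document `doc` and the helper `int64_only` are hypothetical, introduced for this sketch.

```python
import json
from decimal import Decimal

doc = '{"price": 0.1000000000000000055511151231257827, "count": 12345678901234567890}'

# Default behaviour: fractions become binary64 floats, integers become big ints.
default = json.loads(doc)
print(default["price"])   # 0.1

# Same syntax, different contract: keep every digit of fractional literals.
exact = json.loads(doc, parse_float=Decimal)
print(exact["price"])     # 0.1000000000000000055511151231257827

def int64_only(literal: str) -> int:
    """Hypothetical restriction: reject integers outside the signed 64-bit range."""
    value = int(literal)
    if not -2**63 <= value < 2**63:
        raise ValueError(f"integer out of int64 range: {literal}")
    return value

try:
    json.loads(doc, parse_int=int64_only)
    rejected = False
except ValueError:
    rejected = True
print(rejected)           # True
```

The same text is accepted or rejected depending on the hooks, which is the sense in which the restriction belongs to the parser or the contract between applications rather than to the syntax.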

DonaldTsang commented 4 years ago

So is it possible to have parser rule enforcement, at least for JSON5's main repo, so that we can keep it "in line"?

jordanbtucker commented 4 years ago

JavaScript only handles IEEE-754 numbers, so all numbers parsed by the reference JSON5 implementation conform to that specification. Numbers that are too large or too small lose accuracy. What rule needs to be enforced?
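A short sketch of what "lose accuracy" means in practice. It uses Python's `float`, which is binary64 like a JavaScript Number, as a stand-in. It does not call the JSON5 reference parser, and the variable names are illustrative only.

```python
# Integers above 2**53 are not all representable in binary64.
big = 2**53 + 1                   # 9007199254740993
big_as_double = float(big)        # rounds to the nearest representable value
print(int(big_as_double))         # 9007199254740992

# Magnitudes below the smallest subnormal (about 4.9e-324) underflow to zero.
tiny_as_double = float("1e-400")
print(tiny_as_double)             # 0.0
```

In both cases the literal is syntactically valid and parsing succeeds. The loss is silent unless the application checks for it.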

DonaldTsang commented 4 years ago

@jordanbtucker JSON5 is meant for all languages (theoretically), so can IEEE-754 number rules be applied to the major JSON5 parsers? By rules regarding IEEE-754, I mean guarding against people who try to break the system by giving a float 100 decimal digits or something silly like that.

jordanbtucker commented 4 years ago

If parsers want to restrict numbers to IEEE-754 representations, then they may do so. The official reference parser does this because that's what JavaScript supports. However, the spec does not enforce this restriction. This allows for JSON5 processing applications that can agree upon representations beyond IEEE-754, like arbitrarily large numbers, to communicate.
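An application that does want to flag literals exceeding binary64 could do so on its own side. The following is a minimal sketch under that assumption. `round_trips_binary64` is a hypothetical helper, not part of any JSON5 library, and it treats a literal as acceptable when the shortest text form of its binary64 conversion denotes the same decimal value.

```python
from decimal import Decimal

def round_trips_binary64(literal: str) -> bool:
    """Return True if converting the literal to binary64 and back to its
    shortest decimal text yields the same numeric value as the literal."""
    as_double = float(literal)
    # repr() gives the shortest decimal string that maps back to the same
    # double. Decimal comparison is exact, so any lost digit shows up here.
    return Decimal(repr(as_double)) == Decimal(literal)

print(round_trips_binary64("0.1"))                    # True
print(round_trips_binary64("0." + "1" * 100))         # False: 100 digits
print(round_trips_binary64("1e400"))                  # False: overflows to inf
print(round_trips_binary64("12345678901234567890"))   # False: above 2**53
```

Two applications that exchange arbitrarily large numbers would simply skip such a check, which is the flexibility the spec leaves open.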