zgrossbart / jdd

A semantic JSON compare tool
http://www.jsondiff.com
Apache License 2.0

Does not detect differences for very large integers #45

Closed: RadixSeven closed this issue 2 years ago

RadixSeven commented 2 years ago

Comparing

[10176718226607830673]

with

[10176718226607831000]

reports that they are the same.

zgrossbart commented 2 years ago

Thank you for using JSONDiff.

So... this might seem like a really unsatisfying answer, but in JSON [10176718226607830673] is the same as [10176718226607831000].

The maximum value of an int in the JSON spec is that of a 32-bit signed integer, which is 2147483647. Many JSON parsers will handle integers up to the largest one JavaScript can represent exactly, which is 9007199254740991 (Number.MAX_SAFE_INTEGER). Both of these are lower than your numbers.
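To make these limits concrete, here is a small JavaScript sketch (for a browser console or Node.js) comparing them with the numbers from the report. The constant names are illustrative; `Number.MAX_SAFE_INTEGER` is the built-in name for 9007199254740991, and the reported values are written as BigInt literals so nothing is rounded before the comparison.

```javascript
// The two limits mentioned above, expressed with JavaScript built-ins.
const INT32_MAX = 2 ** 31 - 1;            // 2147483647, largest 32-bit signed int
const SAFE_MAX = Number.MAX_SAFE_INTEGER; // 9007199254740991, i.e. 2**53 - 1

// The numbers from the issue as BigInt literals (the trailing n keeps them exact).
const reported = [10176718226607830673n, 10176718226607831000n];

for (const big of reported) {
  // Exact BigInt comparisons: each value exceeds both limits.
  console.log(big > BigInt(INT32_MAX), big > BigInt(SAFE_MAX)); // true true
}
```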

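In this case your numbers are too large to be stored exactly, so each one is rounded to the nearest value a JavaScript number (an IEEE 754 double) can hold, and both round to the same value. That means parsing them produces the same number.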
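A quick way to confirm this, sketched in JavaScript on the assumption that the values go through the standard `JSON.parse`: converting the parsed value with `BigInt` reveals the exact integer that was actually stored.

```javascript
// Parse both arrays with the standard JSON parser.
const a = JSON.parse("[10176718226607830673]")[0];
const b = JSON.parse("[10176718226607831000]")[0];

// Both literals round to the same IEEE 754 double.
console.log(a === b);                 // true
console.log(Number.isSafeInteger(a)); // false

// BigInt(number) shows the exact integer the double stores.
console.log(BigInt(a));               // 10176718226607831040n
```

Between 2^63 and 2^64, adjacent doubles are 2048 apart, so two literals that differ by only 327 can easily land on the same stored value.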

Values like the ones in your example are referred to as big integers. Some languages have a specific Big Integer construct, but JSON does not. The best way to represent big integers in JSON is to make them into strings.

If you compare ["10176718226607830673"] and ["10176718226607831000"] it will give you the behavior that you expect.
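A short JavaScript sketch of the string approach: the values survive `JSON.parse` unchanged, and the built-in `BigInt` type (available in modern browsers and Node.js) can compare them exactly when a numeric comparison is needed.

```javascript
// The big integers travel as JSON strings, so parsing keeps every digit.
const left = JSON.parse('["10176718226607830673"]')[0];
const right = JSON.parse('["10176718226607831000"]')[0];

console.log(left === right); // false: the strings differ

// For arithmetic or ordering, convert to BigInt, which is exact.
const diff = BigInt(right) - BigInt(left);
console.log(diff); // 327n
```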

zgrossbart commented 2 years ago

You can also see the same behavior in JavaScript by opening your browser console and typing this:

var a = 10176718226607830673;
var b = 10176718226607831000;
a === b // true: both literals round to the same double

RadixSeven commented 2 years ago

Before I start picking nits below, deciding you won't fix this is fine; my use case is not within the bounds of expected interoperability. I just didn't know if this was something you wanted to fix.

JSON does not have the same limitations as JavaScript, so treating my two examples as different would not go against the standard. ECMA-404 says that any number matching the grammar is acceptable (with no limit on length). A more recent standard, RFC 7159, says that keeping integer values within the range exactly expressible by a double is good for interoperability, but it does not require it.

My use case

I came across this because a co-worker was reporting a bug and I couldn't see the difference between two long stretches of JSON, so I pulled up your site, pasted one into each window to check, and it said they were identical. I almost texted him that the two snippets were the same before I thought to check for this issue. (We usually use a JSON parser that handles big integers, but in this case we happened to be passing the file through jq in a shell script. When we do this, we turn the integer fields that can exceed 2^53 into strings. One of the values got mangled because a sed command missed one of the fields after the upstream code started adding a space between the colon and the integer.)
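The sed failure above came from a pattern that assumed no whitespace after the colon. As a hedged illustration (in JavaScript rather than sed, with a hypothetical field name `id` and a hypothetical helper `quoteIntegerField`), a pattern that tolerates optional whitespace might look like this. A regex over raw text can still match inside string values, so a JSON-aware tool remains the safer option.

```javascript
// Wrap the bare integer value of a named field in quotes, allowing
// whitespace around the colon. Hypothetical helper for illustration.
function quoteIntegerField(jsonText, field) {
  // Escape regex metacharacters in the field name.
  const name = field.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // The lookahead requires the digits to end the value, so floats are skipped.
  const pattern = new RegExp(`("${name}"\\s*:\\s*)(-?\\d+)(?=\\s*[,}\\]])`, "g");
  return jsonText.replace(pattern, '$1"$2"');
}

const before = '{"id": 10176718226607830673, "name": "x"}';
console.log(quoteIntegerField(before, "id"));
// {"id": "10176718226607830673", "name": "x"}
```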

Quoting the standards

(ECMA-404)

A number is a sequence of decimal digits with no superfluous leading zero. It may have a preceding minus sign (U+002D). It may have a fractional part prefixed by a decimal point (U+002E). It may have an exponent, prefixed by e (U+0065) or E (U+0045) and optionally + (U+002B) or – (U+002D). The digits are the code points U+0030 through U+0039.

(RFC-7159)

6. Numbers

The representation of numbers is similar to that used in most programming languages. A number is represented in base 10 using decimal digits. It contains an integer component that may be prefixed with an optional minus sign, which may be followed by a fraction part and/or an exponent part. Leading zeros are not allowed.

A fraction part is a decimal point followed by one or more digits. An exponent part begins with the letter E in upper or lower case, which may be followed by a plus or minus sign. The E and optional sign are followed by one or more digits.

Numeric values that cannot be represented in the grammar below (such as Infinity and NaN) are not permitted.

  number = [ minus ] int [ frac ] [ exp ]

  decimal-point = %x2E       ; .

  digit1-9 = %x31-39         ; 1-9

  e = %x65 / %x45            ; e E

  exp = e [ minus / plus ] 1*DIGIT

  frac = decimal-point 1*DIGIT

  int = zero / ( digit1-9 *DIGIT )

  minus = %x2D               ; -

  plus = %x2B                ; +

  zero = %x30                ; 0
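As an illustration only, the number grammar quoted above can be transcribed into a JavaScript regular expression. Nothing in it limits the number of digits, which is the point being made about the standards; `JSON_NUMBER` is just an illustrative name.

```javascript
// number = [ minus ] int [ frac ] [ exp ], anchored to the whole string.
const JSON_NUMBER = /^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?$/;

// Valid: any length of digits, negative zero, fraction plus exponent.
for (const s of ["10176718226607830673", "-0", "3.25e-7"]) {
  console.log(s, JSON_NUMBER.test(s)); // true
}

// Invalid: leading zero, empty fraction, missing int, plus sign, Infinity.
for (const s of ["01", "1.", ".5", "+1", "Infinity"]) {
  console.log(s, JSON_NUMBER.test(s)); // false
}
```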

This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.
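Both of the RFC's example numbers show the predicted behavior in JavaScript, whose numbers are binary64. A minimal sketch:

```javascript
// Numbers beyond binary64 range or precision are approximated on parse.
const huge = JSON.parse("1E400");
const pi = JSON.parse("3.141592653589793238462643383279");

console.log(huge);                 // Infinity: out of double range
console.log(JSON.stringify(huge)); // null: Infinity has no JSON form
console.log(pi === Math.PI);       // true: the extra digits are rounded away
```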

Note that when such software is used, numbers that are integers and are in the range [-(2**53)+1, (2**53)-1] are interoperable in the sense that implementations will agree exactly on their numeric values.
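The edge of that interoperable range can be seen directly in JavaScript; this is a small sketch using only built-ins.

```javascript
// Integers in [-(2**53)+1, (2**53)-1] are exact; beyond that they may not be.
const limit = 2 ** 53; // 9007199254740992

console.log(Number.isSafeInteger(limit - 1)); // true
console.log(Number.isSafeInteger(limit));     // false

// Two distinct JSON integer literals just past the limit parse to the same number.
const x = JSON.parse("9007199254740992");
const y = JSON.parse("9007199254740993");
console.log(x === y); // true
```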

zgrossbart commented 2 years ago

There are numerous references as to the max size of numbers in JSON. This is a good example:

https://developers.google.com/discovery/v1/type-format#:~:text=Defined%20by%20the%20JSON%20Schema%20spec.,-integer&text=A%2032%2Dbit%20signed%20integer,value%20of%202%2C147%2C483%2C647%20(inclusive).

Using numbers larger than this in JSON will produce unpredictable results.

If I did want to support this case, I would need to implement a custom big integer object in JavaScript, since JSONDiff runs in a browser and that means using JavaScript. This case is outside the scope of the tool.
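As a hedged sketch of a lighter-weight alternative (not something JSONDiff does), a browser tool could scan the raw text for integer literals that would change when parsed into a JavaScript number, and warn instead of silently reporting a match. The function `findLossyNumbers` is hypothetical; it checks integer literals only and relies on the built-in `BigInt` for exact comparison.

```javascript
// Scan raw JSON text, skipping string literals, for integer literals whose
// value changes when parsed into a JavaScript number. Hypothetical helper.
function findLossyNumbers(text) {
  // Sticky (y) regex for the JSON number grammar, matched at lastIndex.
  const NUMBER = /-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?/y;
  const lossy = [];
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (ch === '"') {
      // Skip a string literal, honoring backslash escapes.
      i++;
      while (i < text.length && text[i] !== '"') {
        i += text[i] === "\\" ? 2 : 1;
      }
      i++; // step past the closing quote
    } else if (ch === "-" || (ch >= "0" && ch <= "9")) {
      NUMBER.lastIndex = i;
      const m = NUMBER.exec(text);
      if (!m) { i++; continue; }
      const literal = m[0];
      // Only plain integer literals are checked; fractions and exponents are skipped.
      if (/^-?\d+$/.test(literal)) {
        const n = Number(literal);
        // Infinity cannot become a BigInt, so treat it as lossy too.
        if (!Number.isFinite(n) || BigInt(n) !== BigInt(literal)) lossy.push(literal);
      }
      i += literal.length;
    } else {
      i++;
    }
  }
  return lossy;
}

// Logs an array containing only "10176718226607830673"; the quoted value is skipped.
console.log(findLossyNumbers('[10176718226607830673, 42, "9007199254740993"]'));
```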

I would also strongly recommend that you represent large integers like this as strings in your JSON data since you're likely to run into situations like this again otherwise.