mozilla-services / canonicaljson-rs

Rust implementation of a Canonical JSON serializer
MIT License

This library does not follow the spec for numbers or Unicode #9

Open john-shaffer opened 2 years ago

john-shaffer commented 2 years ago

According to the readme,

This library follows gibson's Canonical JSON spec.

However, the implementation does not follow the spec. Numbers are serialized quite differently: for instance, canonicaljson-rs emits 1E-2 where the spec requires 1.0E-2. Unicode handling is inverted: canonicaljson-rs escapes most non-ASCII characters, whereas the spec requires "avoiding escape sequences for characters except those otherwise inexpressible in JSON".
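The escaping rule quoted above can be sketched as follows. This is a minimal illustration, not canonicaljson-rs code: the function name `canonical_string` is hypothetical. It escapes only `"`, `\` and U+0000 through U+001F (using the two-character forms where JSON defines one) and writes every other character, including non-ASCII and combining characters, as raw UTF-8. Using uppercase hex in `\u00XX` escapes is an assumption about the spec's preferred form. Lone surrogates, which the suite also tests, cannot occur in a Rust `&str` and are not handled.

```rust
use std::fmt::Write;

/// Hypothetical helper: writes `s` as a JSON string literal, escaping only
/// what JSON cannot express raw (`"`, `\`, and U+0000..=U+001F).
fn canonical_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            // Two-character escapes for the controls that have one.
            '\u{08}' => out.push_str("\\b"),
            '\u{0C}' => out.push_str("\\f"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining C0 controls: six-character escape (uppercase hex is assumed).
            c if (c as u32) < 0x20 => {
                write!(out, "\\u{:04X}", c as u32).unwrap();
            }
            // Everything else, including DEL, combining marks and characters
            // outside the BMP, is emitted as raw UTF-8.
            c => out.push(c),
        }
    }
    out.push('"');
    out
}
```

Note the contrast with the behavior described above: `"é"` stays `"é"` rather than becoming a `\u` escape.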

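Output of running the canonicaljson-spec test suite against canonicaljson-rs (each failing case is shown as a diff of the actual output, "---", against the expected output, "+++"):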

malformed/empty OK
malformed/hex_number OK
malformed/invalid_string_character OK
malformed/invalid_string_escape OK
malformed/invalid_string_unicode_escape OK
malformed/leading_plus_number OK
malformed/leading_zero_number OK
malformed/missing_array_element OK
malformed/missing_integer_number OK
malformed/missing_object_colon OK
malformed/missing_object_element OK
malformed/partial_fraction_number OK
malformed/unclosed_array OK
malformed/unclosed_object OK
malformed/unclosed_string OK
malformed/unopened_array OK
malformed/unopened_object OK
malformed/unopened_string OK
Files - and ./test/tokens/3.object-ordering/expected.json differ
--- rDKPEBnY.test-tokens.sh/output.pretty.json  2022-07-26 13:16:28.341285142 -0500
+++ rDKPEBnY.test-tokens.sh/expected.pretty.json    2022-07-26 13:16:28.341285142 -0500
@@ -1 +1,2 @@
-
+{
+  "": "empty",
Files - and ./test/tokens/4.integer/1.no-negative-zero/expected.json differ
--- rDKPEBnY.test-tokens.sh/output.pretty.json  2022-07-26 13:16:28.405285996 -0500
+++ rDKPEBnY.test-tokens.sh/expected.pretty.json    2022-07-26 13:16:28.405285996 -0500
@@ -1,78 +1,78 @@
 [
   "for sig in 0 0.0 0.00; do for e in '' e E; do [ x$e = x ] && echo $sig, && continue; for e_sign in '' '-' '+'; do for exp in 0 00 1 01; do echo $sig$e$e_sign$exp,; done; done; done; done",
   0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0,
-  0E0
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0,
+  0
 ]
Files - and ./test/tokens/4.integer/2.no-decimal-point/expected.json differ
--- rDKPEBnY.test-tokens.sh/output.pretty.json  2022-07-26 13:16:28.437286421 -0500
+++ rDKPEBnY.test-tokens.sh/expected.pretty.json    2022-07-26 13:16:28.437286421 -0500
@@ -1 +1,24 @@
-
+[
+  0,
+  0,
+  4,
+  4,
+  42,
+  42,
+  42,
+  42,
+  8,
+  8,
+  179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137217,
+  0,
+  0,
+  -4,
+  -4,
+  -42,
+  -42,
+  -42,
+  -42,
+  -8,
+  -8,
+  -179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137217
+]
Files - and ./test/tokens/4.integer/3.no-exponent/expected.json differ
--- rDKPEBnY.test-tokens.sh/output.pretty.json  2022-07-26 13:16:28.441286475 -0500
+++ rDKPEBnY.test-tokens.sh/expected.pretty.json    2022-07-26 13:16:28.441286475 -0500
@@ -1 +1,74 @@
-
+[
+  "2^8 +/- 1",
+  255,
+  256,
+  257,
+  "2^16 +/- 1",
+  65535,
+  65536,
+  65537,
+  "2^32 +/- 1",
+  4294967295,
+  4294967296,
+  4294967297,
+  "2^53 +/- 1",
+  9007199254740991,
+  9007199254740992,
+  9007199254740993,
+  "2^64 +/- 1",
+  18446744073709551615,
+  18446744073709551616,
+  18446744073709551617,
+  "2^128 +/- 1",
+  340282366920938463463374607431768211455,
+  340282366920938463463374607431768211456,
+  340282366920938463463374607431768211457,
+  "2^256 +/- 1",
+  115792089237316195423570985008687907853269984665640564039457584007913129639935,
+  115792089237316195423570985008687907853269984665640564039457584007913129639936,
+  115792089237316195423570985008687907853269984665640564039457584007913129639937,
+  "10^100 +/- 1",
+  9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999,
+  10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001,
+  "2^1024 +/- 1",
+  179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137215,
+  179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137216,
+  179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137217,
+  "-2^8 +/- 1",
+  -255,
+  -256,
+  -257,
+  "-2^16 +/- 1",
+  -65535,
+  -65536,
+  -65537,
+  "-2^32 +/- 1",
+  -4294967295,
+  -4294967296,
+  -4294967297,
+  "-2^53 +/- 1",
+  -9007199254740991,
+  -9007199254740992,
+  -9007199254740993,
+  "-2^64 +/- 1",
+  -18446744073709551615,
+  -18446744073709551616,
+  -18446744073709551617,
+  "-2^128 +/- 1",
+  -340282366920938463463374607431768211455,
+  -340282366920938463463374607431768211456,
+  -340282366920938463463374607431768211457,
+  "-2^256 +/- 1",
+  -115792089237316195423570985008687907853269984665640564039457584007913129639935,
+  -115792089237316195423570985008687907853269984665640564039457584007913129639936,
+  -115792089237316195423570985008687907853269984665640564039457584007913129639937,
+  "-10^100 +/- 1",
+  -9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999,
+  -10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001,
+  "-2^1024 +/- 1",
+  -179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137215,
+  -179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137216,
+  -179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137217,
+  "3/4",
+  7.5E-1
+]
Files - and ./test/tokens/4.integer/5.large-exponent/expected.json differ
--- rDKPEBnY.test-tokens.sh/output.pretty.json  2022-07-26 13:16:28.441286475 -0500
+++ rDKPEBnY.test-tokens.sh/expected.pretty.json    2022-07-26 13:16:28.441286475 -0500
@@ -1 +1,15 @@
-
+[
+  12345678900000000000000000000000000,
+  100000000000000000000000000000,
+  1.0E30,
+  -100000000000000000000000000000,
+  -1.0E30,
+  2900000000000000000000000000000,
+  2.9E31,
+  -2900000000000000000000000000000,
+  -2.9E31,
+  1.0E100,
+  -1.0E100,
+  1.0E10000000,
+  -1.0E10000000
+]
Files - and ./test/tokens/5.non-integer/1.single-digit-nonzero-significand-integer/expected.json differ
--- rDKPEBnY.test-tokens.sh/output.pretty.json  2022-07-26 13:16:28.457286687 -0500
+++ rDKPEBnY.test-tokens.sh/expected.pretty.json    2022-07-26 13:16:28.457286687 -0500
@@ -4,13 +4,13 @@
   3.14E0,
   3.14E0,
   1.1E-2,
-  1.797693134862316E307,
-  1.797693134862316E307,
+  1.79769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137217E307,
+  1.79769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137217E307,
   -3.14E0,
   -3.14E0,
   -3.14E0,
   -3.14E0,
   -1.1E-2,
-  -1.797693134862316E307,
-  -1.797693134862316E307
+  -1.79769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137217E307,
+  -1.79769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137217E307
 ]
Files - and ./test/tokens/5.non-integer/2.nonempty-significand-fraction/expected.json differ
--- rDKPEBnY.test-tokens.sh/output.pretty.json  2022-07-26 13:16:28.461286741 -0500
+++ rDKPEBnY.test-tokens.sh/expected.pretty.json    2022-07-26 13:16:28.461286741 -0500
@@ -1,6 +1,6 @@
 [
-  0E0,
-  9.9E-100,
-  -0E0,
-  -9.9E-100
+  1.0E-1000,
+  9.900000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000099E-100,
+  -1.0E-1000,
+  -9.900000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000099E-100
 ]
Files - and ./test/tokens/5.non-integer/3.no-significand-fraction-trailing-zeroes/expected.json differ
--- rDKPEBnY.test-tokens.sh/output.pretty.json  2022-07-26 13:16:28.477286952 -0500
+++ rDKPEBnY.test-tokens.sh/expected.pretty.json    2022-07-26 13:16:28.477286952 -0500
@@ -3,10 +3,10 @@
   1.4E0,
   1.4E0,
   1.4E0,
-  5E-4,
+  5.0E-4,
   -1.4E0,
   -1.4E0,
   -1.4E0,
   -1.4E0,
-  -5E-4
+  -5.0E-4
 ]
Files - and ./test/tokens/5.non-integer/4.capital-E/expected.json differ
--- rDKPEBnY.test-tokens.sh/output.pretty.json  2022-07-26 13:16:28.489287112 -0500
+++ rDKPEBnY.test-tokens.sh/expected.pretty.json    2022-07-26 13:16:28.489287112 -0500
@@ -1,12 +1,12 @@
 [
-  1E-2,
-  1E-1,
-  1E-100,
-  1.0000000000000001E-98,
-  1E-100,
-  -1E-2,
-  -1E-1,
-  -1E-100,
-  -1.0000000000000001E-98,
-  -1E-100
+  1.0E-2,
+  1.0E-1,
+  1.0E-100,
+  1.0E-98,
+  1.0E-100,
+  -1.0E-2,
+  -1.0E-1,
+  -1.0E-100,
+  -1.0E-98,
+  -1.0E-100
 ]
tokens/5.non-integer/5.no-exponent-plus OK
Files - and ./test/tokens/5.non-integer/6.no-exponent-leading-zeroes/expected.json differ
--- rDKPEBnY.test-tokens.sh/output.pretty.json  2022-07-26 13:16:28.501287272 -0500
+++ rDKPEBnY.test-tokens.sh/expected.pretty.json    2022-07-26 13:16:28.501287272 -0500
@@ -1,8 +1,8 @@
 [
   5.6E0,
   5.60000006E6,
-  0E0,
+  5.6E-1000,
   -5.6E0,
   -5.60000006E6,
-  -0E0
+  -5.6E-1000
 ]
Files - and ./test/tokens/6.string/1.no-unnecessary-escapes/expected.json differ
--- rDKPEBnY.test-tokens.sh/output.pretty.json  2022-07-26 13:16:28.505287326 -0500
+++ rDKPEBnY.test-tokens.sh/expected.pretty.json    2022-07-26 13:16:28.505287326 -0500
@@ -2,3 +2,61 @@
   " !#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~ printable ASCII",
   "  U+0020 SPACE",
   "A U+0041 LATIN CAPITAL LETTER A",
+  " U+007F DELETE",
+  "€ U+0080 PADDING CHARACTER",
+  "Å U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE",
+  "Å composition—U+0041 LATIN CAPITAL LETTER A + U+030A COMBINING RING ABOVE",
+  "Å U+212B ANGSTROM SIGN",
+  "ế U+1EBF LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE",
+  "ế composition—U+00EA LATIN SMALL LETTER E WITH CIRCUMFLEX + U+0301 COMBINING ACUTE ACCENT",
+  "ế composition—U+0065 LATIN SMALL LETTER E + U+0302 COMBINING CIRCUMFLEX ACCENT + U+0301 COMBINING ACUTE ACCENT",
+  "é̂ composition—U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT + U+0302 COMBINING CIRCUMFLEX ACCENT",
+  "← U+2190 LEFTWARDS ARROW",
+  "fi U+FB01 LATIN SMALL LIGATURE FI",
+  "𝌆 surrogate pair—U+1D306 TETRAGRAM FOR CENTRE",
+  {
+    " !#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~": "printable ASCII"
+  },
+  {
+    " ": "U+0020 SPACE"
+  },
+  {
+    "A": "U+0041 LATIN CAPITAL LETTER A"
+  },
+  {
+    "": "U+007F DELETE"
+  },
+  {
+    "€": "U+0080 PADDING CHARACTER"
+  },
+  {
+    "Å": "U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE"
+  },
+  {
+    "Å": "composition—U+0041 LATIN CAPITAL LETTER A + U+030A COMBINING RING ABOVE"
+  },
+  {
+    "Å": "U+212B ANGSTROM SIGN"
+  },
+  {
+    "ế": "U+1EBF LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE"
+  },
+  {
+    "ế": "composition—U+00EA LATIN SMALL LETTER E WITH CIRCUMFLEX + U+0301 COMBINING ACUTE ACCENT"
+  },
+  {
+    "ế": "composition—U+0065 LATIN SMALL LETTER E + U+0302 COMBINING CIRCUMFLEX ACCENT + U+0301 COMBINING ACUTE ACCENT"
+  },
+  {
+    "é̂": "composition—U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT + U+0302 COMBINING CIRCUMFLEX ACCENT"
+  },
+  {
+    "←": "U+2190 LEFTWARDS ARROW"
+  },
+  {
+    "fi": "U+FB01 LATIN SMALL LIGATURE FI"
+  },
+  {
+    "𝌆": "surrogate pair—U+1D306 TETRAGRAM FOR CENTRE"
+  }
+]
Files - and ./test/tokens/6.string/2.no-combining-escapes/expected.json differ
--- rDKPEBnY.test-tokens.sh/output.pretty.json  2022-07-26 13:16:28.509287380 -0500
+++ rDKPEBnY.test-tokens.sh/expected.pretty.json    2022-07-26 13:16:28.509287380 -0500
@@ -1 +1,34 @@
 [
+  "̇ U+0307 COMBINING DOT ABOVE",
+  "᠋ U+180B MONGOLIAN FREE VARIATION SELECTOR ONE",
+  "᠌ U+180C MONGOLIAN FREE VARIATION SELECTOR TWO",
+  "᠍ U+180D MONGOLIAN FREE VARIATION SELECTOR THREE",
+  "︀ U+FE00 VARIATION SELECTOR-1",
+  "️ U+FE0F VARIATION SELECTOR-16",
+  "󠄀 U+E0100 VARIATION SELECTOR-17",
+  "󠇯 U+E01EF VARIATION SELECTOR-256",
+  {
+    "̇": "U+0307 COMBINING DOT ABOVE"
+  },
+  {
+    "᠋": "U+180B MONGOLIAN FREE VARIATION SELECTOR ONE"
+  },
+  {
+    "᠌": "U+180C MONGOLIAN FREE VARIATION SELECTOR TWO"
+  },
+  {
+    "᠍": "U+180D MONGOLIAN FREE VARIATION SELECTOR THREE"
+  },
+  {
+    "︀": "U+FE00 VARIATION SELECTOR-1"
+  },
+  {
+    "️": "U+FE0F VARIATION SELECTOR-16"
+  },
+  {
+    "󠄀": "U+E0100 VARIATION SELECTOR-17"
+  },
+  {
+    "󠇯": "U+E01EF VARIATION SELECTOR-256"
+  }
+]
tokens/6.string/3.short-escapes OK
Files - and ./test/tokens/6.string/4.other-control-escapes/expected.json differ
Files - and ./test/tokens/6.string/5.lone-surrogate-escapes/expected.json differ
--- rDKPEBnY.test-tokens.sh/output.pretty.json  2022-07-26 13:16:28.517287486 -0500
+++ rDKPEBnY.test-tokens.sh/expected.pretty.json    2022-07-26 13:16:28.517287486 -0500
@@ -1 +1 @@
-
+[
whitespace/array OK
whitespace/false OK
whitespace/null OK
whitespace/number OK
whitespace/object OK
whitespace/string OK
whitespace/true OK
FAIL: ../canonicaljson-rs/demo/target/debug/demo
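The expected outputs in the non-integer diffs above follow one pattern: a single nonzero digit before the decimal point, a nonempty fraction without trailing zeros (a lone 0 when nothing else follows), a capital E, and an exponent with no plus sign or leading zeros, as in 1.0E-2, 5.0E-4, 3.14E0 and 1.0E-1000. A minimal sketch of that rule, assuming the number has already been parsed into an exact decimal: a digit string `coeff` and a power-of-ten exponent `exp`, so the value is `coeff × 10^exp`. The function and parameter names are hypothetical. It returns `None` for zero and for integers, whose expected forms (see the integer diffs) it does not attempt to reproduce.

```rust
/// Hypothetical helper: formats the non-integer value
/// `(-1)^negative * coeff * 10^exp` in the exponential form seen in the
/// expected outputs. Returns None for zero, integers, or non-digit input.
fn canonical_non_integer(negative: bool, coeff: &str, exp: i64) -> Option<String> {
    if coeff.is_empty() || !coeff.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Leading zeros of the coefficient do not change the value.
    let trimmed_lead = coeff.trim_start_matches('0');
    // Trailing zeros move into the exponent: 140 x 10^-2 == 14 x 10^-1.
    let digits = trimmed_lead.trim_end_matches('0');
    if digits.is_empty() {
        return None; // the value is zero
    }
    let exp = exp + (trimmed_lead.len() - digits.len()) as i64;
    if exp >= 0 {
        return None; // the value is an integer
    }
    // Shift so exactly one digit precedes the decimal point.
    let sci_exp = exp + (digits.len() as i64 - 1);
    let (first, rest) = digits.split_at(1);
    // The fraction must be nonempty, so use "0" when nothing follows.
    let frac = if rest.is_empty() { "0" } else { rest };
    let sign = if negative { "-" } else { "" };
    Some(format!("{sign}{first}.{frac}E{sci_exp}"))
}
```

For example, `canonical_non_integer(false, "1", -2)` yields `Some("1.0E-2")` where canonicaljson-rs printed `1E-2`. Keeping the digits as a string means values such as 1.0E-1000 need no floating-point representation at all.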
alexforster commented 1 year ago

This issue description sounded concerning, so I dug in a bit and I don't actually think the situation is too bad.

Integer/float handling can be fixed by enabling the arbitrary_precision and float_roundtrip features in the serde_json crate. Thanks to feature unification, users can enable these themselves by declaring serde_json as a dependency, like so:

serde_json = { version = "1", features = ["arbitrary_precision", "float_roundtrip"] }
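As background for this suggestion: without arbitrary_precision, serde_json keeps a number as u64/i64 when it fits and otherwise as f64, so values outside f64's range or precision are altered during parsing. That is consistent with the `0E0` lines in the diff where `1.0E-1000` and `5.6E-1000` were expected. The hypothetical helper below uses Rust's standard `f64` parser as a stand-in for serde_json's, to show the loss in isolation.

```rust
/// Hypothetical helper: parses JSON number text into an f64, the storage
/// serde_json falls back to when arbitrary_precision is disabled.
/// Rust's standard parser rounds correctly, so any change shown here is
/// inherent to f64 rather than to a particular parser.
fn via_f64(text: &str) -> f64 {
    text.parse::<f64>().expect("valid number text")
}
```

For instance, `via_f64("1E-1000")` underflows to `0.0`, and `via_f64("18446744073709551617")` (2^64 + 1, too large for u64) rounds to 2^64.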

The Unicode handling test failures would need to be fixed in serde_json, which takes the somewhat opinionated stance that codepoints which decode to semantically ambiguous/undefined Unicode should be rejected. See serde-rs/json#495 & related issues.

leplatrem commented 1 year ago

Integer/float handling can be fixed by enabling the arbitrary_precision and float_roundtrip features in the serde_json crate.

I'd vote to enable it by default then 👌