jakartaee / jsonb-api

Jakarta JSON Binding
https://eclipse-ee4j.github.io/jsonb-api/

Consider adopting the Ryu algorithm #130

Open cyberphone opened 5 years ago

cyberphone commented 5 years ago

The Ryu algorithm (https://github.com/ulfjack/ryu) for IEEE-754 serialization offers a number of interesting features: it is simple, fast and (with a minor tweak) 100% compatible with ES6. Using the Ryu algorithm also makes https://tools.ietf.org/html/draft-rundgren-json-canonicalization-scheme-06 more realistic for JSONB, particularly with the proposed upgrade in https://github.com/eclipse-ee4j/jsonp/issues/160. Here is a comparison between String.valueOf(double) and my Ryu adaptation for Java, where each value has been serialized 1M times.

IEEE-754           JDK   Ryu  JDK Serialization         Ryu Serialization
Selected values:
0000000000000000    62    34  0.0                       0
8000000000000000    31    16  -0.0                      0
0000000000000001   511    80  4.9E-324                  5e-324
8000000000000001   522    64  -4.9E-324                 -5e-324
7fefffffffffffff  1764   142  1.7976931348623157E308    1.7976931348623157e+308
ffefffffffffffff  1745   110  -1.7976931348623157E308   -1.7976931348623157e+308
4340000000000000    96   157  9.007199254740992E15      9007199254740992
c340000000000000    80    79  -9.007199254740992E15     -9007199254740992
4430000000000000   397   158  2.9514790517935283E20     295147905179352830000
44b52d02c7e14af5   349   176  9.999999999999997E22      9.999999999999997e+22
44b52d02c7e14af6   349   207  9.999999999999999E22      1e+23
44b52d02c7e14af7   350    96  1.0000000000000001E23     1.0000000000000001e+23
444b1ae4d6e2ef4e   302   159  9.999999999999997E20      999999999999999700000
444b1ae4d6e2ef4f   304    95  9.999999999999999E20      999999999999999900000
444b1ae4d6e2ef50   126   287  1.0E21                    1e+21
3eb0c6f7a0b5ed8c  1236   175  9.999999999999997E-7      9.999999999999997e-7
3eb0c6f7a0b5ed8d   333   206  1.0E-6                    0.000001
41b3de4355555553   349   191  3.333333333333332E8       333333333.3333332
41b3de4355555554   326   111  3.3333333333333325E8      333333333.33333325
41b3de4355555555   304   111  3.333333333333333E8       333333333.3333333
41b3de4355555556   300   111  3.333333333333334E8       333333333.3333334
41b3de4355555557   318   113  3.3333333333333343E8      333333333.33333343
becbf647612f3696   980   128  -3.3333333333333333E-6    -0.0000033333333333333333
IEEE-754           JDK   Ryu  JDK Serialization         Ryu Serialization
Random values:
34ff465fb5a29fd8  1077    95  2.0407823107657903E-53    2.0407823107657903e-53
0180aa7cc33fd23b  1846   112  1.9442174934288537E-301   1.9442174934288537e-301
2ceb54308f24234d  1140    95  2.620311665382176E-92     2.620311665382176e-92
b287eee46fdff09f  1035   115  -2.840738340340803E-65    -2.840738340340803e-65
2229562c860af2e6  1318   110  4.0580809869721453E-144   4.0580809869721453e-144
f80f61239d20259a  1710    96  -2.0721989565386525E270   -2.0721989565386525e+270
add9ab29d8081dba  1095    95  -8.06461389117999E-88     -8.06461389117999e-88
d14fba0dc766156e  1178    97  -4.8152042555373337E83    -4.8152042555373337e+83
6ab1e7ab55b21b95  1237   111  8.98194436781924E205      8.98194436781924e+205
5e25856f84be4190  1111    95  3.35919387840897E145      3.35919387840897e+145
66eeda83d5ec1ec3  1243    96  6.712322230840099E187     6.712322230840099e+187
1d037151a2eee09d  1284    97  6.439734202044909E-169    6.439734202044909e-169
8d4fb7c3d3902f2e  1681    95  -1.4516336465501977E-244  -1.4516336465501977e-244
2f23cd7caecbc18f  1207    95  1.3047738249326575E-81    1.3047738249326575e-81
2dde8dcdf93762ab  1144    95  9.599492442524916E-88     9.599492442524916e-88
84d3ff3c4b08ae2b  1735    96  -2.1012090586052427E-285  -2.1012090586052427e-285
54e868c970dcc690  1171    95  1.0677862209985186E101    1.0677862209985186e+101
d01ced65deab00bd   968    95  -8.37389158027637E77      -8.37389158027637e+77
37c9c83893ccce89  1000    95  5.919282917670671E-40     5.919282917670671e-40
8eb7a89ab8126107  1381    95  -9.08307027853398E-238    -9.08307027853398e-238
c25a4aee0c1c5cab   332   111  -4.5170505739344794E11    -451705057393.44794
056dd3cb849bdc76  1747   111  1.6046797594959664E-282   1.6046797594959664e-282
770cb51be8ffb6c9  1686    95  2.8926835254693333E265    2.8926835254693333e+265
0af991011d61cd97  1540   111  8.513608375142944E-256    8.513608375142944e-256
fd86b4cb740afd06  1651    96  -4.6405640207979055E296   -4.6405640207979055e+296
6770f6560ace46c8  1284   111  1.889386235130226E190     1.889386235130226e+190
29c6087173eaf5ab  1144    96  1.876310978051856E-107    1.876310978051856e-107
03854f4599dcf432  1765    96  1.0677034467937273E-291   1.0677034467937273e-291
1f1eebdbc2be423a  1302   111  8.797521803925928E-159    8.797521803925928e-159
589eae2afa44a8f6  1125    95  7.736749288152502E118     7.736749288152502e+118
c0b9ef54d02451c3   350   111  -6639.331300992929        -6639.331300992929
7c6092051b900ba6  1715    96  1.2918692647856208E291    1.2918692647856208e+291
92f59d018298a475  1535    95  -2.4490875310340735E-217  -2.4490875310340735e-217
b7aca47913b015f6  1001   111  -1.643997295019911E-40    -1.643997295019911e-40
d872908c76063477  1191   111  -1.1703747023547008E118   -1.1703747023547008e+118
1704d41ceb0a5bb3  1424   111  8.707474946109467E-198    8.707474946109467e-198
48d6e50167e817f8   950    95  7.977587285062442E42      7.977587285062442e+42
887e4ef213b04cf3  1538    95  -9.179237569720412E-268   -9.179237569720412e-268
98c01cf9a909b084  1379   111  -1.808231857146977E-189   -1.808231857146977e-189
6e8b2131628be78d  1537    95  3.1381314757793522E224    3.1381314757793522e+224
e828e9002b49fb99  1505    95  -5.6825560219219796E193   -5.6825560219219796e+193
96f87c499d218bcb  1351    96  -5.11813583331773E-198    -5.11813583331773e-198
70fa702a8bead87a  1566   111  1.6812320757231358E236    1.6812320757231358e+236
0edbea290c4de85e  1617    95  4.2868295597931197E-237   4.2868295597931197e-237
c0d6ff0aba2e5742   302   110  -23548.167613587582       -23548.167613587582
a709f0b9af623191  1283   111  -1.2557040640113272E-120  -1.2557040640113272e-120
d1a640cc4b91653b  1141    95  -2.1615219364130858E85    -2.1615219364130858e+85
3d54e9aec27017c1   937    95  2.9718908747180134E-13    2.9718908747180134e-13
4249669384146a70   319   111  2.1819025207283154E11     218190252072.83154
1d7b128021ba310c  1413   111  1.1477493239359762E-166   1.1477493239359762e-166
JDK Total=74325 Ryu Total=8075
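
For readers who want to reproduce the JDK column, here is a minimal sketch of such a measurement. It is not the harness used for the table above (that code is not shown in this thread); the class and method names are made up for illustration, the timing is naive (no warm-up, no JMH), and the unit of the table's timing columns is not stated, so absolute numbers will not be comparable.

```java
// Hypothetical sketch: decode an IEEE-754 bit pattern from the table and time
// String.valueOf(double) over many iterations. Not the original benchmark code.
final class ToStringTimingSketch {

    // Written to so the JIT is less likely to discard the loop body.
    static volatile long sink;

    // Turns a 16-digit hex string such as "4340000000000000" into the double it encodes.
    static double fromHex(String hex) {
        return Double.longBitsToDouble(Long.parseUnsignedLong(hex, 16));
    }

    // Serializes the same value repeatedly and returns the elapsed wall-clock milliseconds.
    static long timeJdkMillis(double value, int iterations) {
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            total += String.valueOf(value).length();
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        sink = total;
        return elapsedMillis;
    }

    public static void main(String[] args) {
        String[] samples = {"0000000000000001", "7fefffffffffffff", "4340000000000000"};
        for (String hex : samples) {
            double value = fromHex(hex);
            System.out.println(hex + "  " + timeJdkMillis(value, 1_000_000) + " ms  " + String.valueOf(value));
        }
    }
}
```

The serialized text printed by the JDK may differ between Java releases for some inputs, so only the bit-pattern decoding should be treated as stable here.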

rmannibucau commented 5 years ago

Hi @cyberphone, isn't it an implementation optimization? Then it belongs in the Yasson bug tracker.

Side note: the current serialization must stay since we have a 1.0 release, so if this is not an implementation detail you can write Ryu adapters to achieve it.

cyberphone commented 5 years ago

@rmannibucau I don't know where this "belongs" but both Go and C# are in the process of replacing their current number serializers with Ryu.

My personal interest is more on the ES6 compatibility side than on performance.

rmannibucau commented 5 years ago

Hmm, can you point out the ES6 (or even ES5?) incompatibilities? I have already used it quite a lot with primitives and no issues have popped up yet in either direction.

cyberphone commented 5 years ago

ES6 compatibility is only needed for canonicalization. I just hoped to get this as a "bonus" 😀 since JCS is not a target for JSONB. The speed improvement was pretty impressive.

You'll find all links in the Internet-Draft.

bravehorsie commented 5 years ago

https://github.com/ulfjack/ryu states that the Java implementation's output may differ from that of the Double#toString methods. Can that break section 3.3.2 of the jsonb-spec?

rmannibucau commented 5 years ago

It looks compatible, but it is also a draft, which is quite bad for a Jakarta spec; keep in mind that JSON Schema, which is still a draft, has already broken features. Also, nothing requires an implementation to use valueOf. I guess they all do, but it is not required AFAIK since it is not part of the user-facing API, which is why it is an implementation detail for me.

So the only question for me is the JSON number representation, and as long as any round trip (Java/JSON) works, which means JS can consume it, I guess we are covered at spec level.

If a new final JSON spec pops up, it could become a toggle (config property and annotation), IMHO.

Does it make sense?

cyberphone commented 5 years ago

@rmannibucau JS (as well as any correctly implemented JSON parser) can consume both your existing format and the proposed one; it is only canonicalization that requires absolute ES6 compliance.

Regarding JDK roundtripping, the ES6/Ryu adaptation succeeds using my 100M test file: https://github.com/cyberphone/json-canonicalization/tree/master/testdata#es6-numbers

https://github.com/cyberphone/json-canonicalization/blob/master/java/miscellaneous/src/ES6NumberTest.java#L52
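
As a small-scale illustration of that round-trip property (a sketch only, not the linked test program), the following checks that a handful of the ES6-style strings from the comparison table parse back to exactly the same IEEE-754 bits with the JDK parser. The class and method names are hypothetical. Negative zero is deliberately left out: the table shows it serialized as `0`, which parses back as positive zero.

```java
// Hypothetical sketch: verify that ES6-style number text parses back to the
// original IEEE-754 bit pattern using Double.parseDouble.
final class Es6RoundTripSketch {

    // {bits as hex, ES6-style text} pairs copied from the comparison table in this thread.
    static final String[][] SAMPLES = {
        {"0000000000000001", "5e-324"},
        {"7fefffffffffffff", "1.7976931348623157e+308"},
        {"4340000000000000", "9007199254740992"},
        {"44b52d02c7e14af6", "1e+23"},
        {"3eb0c6f7a0b5ed8d", "0.000001"},
        {"c0b9ef54d02451c3", "-6639.331300992929"},
    };

    // True when parsing the text yields exactly the given bit pattern.
    static boolean roundTrips(String hexBits, String text) {
        long expected = Long.parseUnsignedLong(hexBits, 16);
        long actual = Double.doubleToLongBits(Double.parseDouble(text));
        return expected == actual;
    }

    public static void main(String[] args) {
        for (String[] sample : SAMPLES) {
            System.out.println(sample[0] + "  " + sample[1] + "  " + roundTrips(sample[0], sample[1]));
        }
    }
}
```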

I would not bother with a toggle since the "problem" rather is in the spec.

The test program failed on C# but it turned out to be due to a bug in the .NET number parser. After reporting it, Microsoft fixed it as well!

JSON Canonicalization now works on 5 platforms.

rmannibucau commented 5 years ago

Hmm, my point is I fail to see the problem in the spec. To be clear I can see it in some implementations but not the spec.

bravehorsie commented 5 years ago

@cyberphone What version of https://github.com/ulfjack/ryu is used in your comparison table? I've grabbed the Java sources from their master and get different output. The scientific-notation exponent marker is an uppercase "E" rather than a lowercase one and is not followed by a '+' sign. I ran the test for "Selected values" from your table, and the output differs from Double#toString on only one line:

IEEE-754            Double#toString         RyuDouble#doubleToString
44b52d02c7e14af6    9.999999999999999E22    1.0E23

cyberphone commented 5 years ago

@bravehorsie I took the Java version as it was 6 months ago and modified the formatting slightly to make it comply with ES6 rules.

It is just one file/class: https://github.com/cyberphone/openkeystore/blob/master/library/src/org/webpki/json/NumberToJSON.java

bravehorsie commented 5 years ago

@cyberphone I have tried your Ryu extraction and it produces the values shown in the table. That is different from what the current Java Ryu master code produces. The reason I mention this is that the output of your NumberToJSON breaks the contract specified by the Double#toString(double d) javadoc, which in turn breaks section 3.3.2 of the jsonb-spec.

cyberphone commented 5 years ago

@bravehorsie That's correct, my variation follows the ES6 specification which is implemented in all browsers and Node.js. They are fortunately both 100% JSON compatible so from an interoperability point of view it doesn't matter what you select unless somebody is using a dedicated non-compliant parser. I would consider a spec upgrade but that is of course for the specification committee to decide.
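
To make the layout difference concrete, here is a rough sketch of the ES6 `Number::toString` placement rules (plain digits up to 21 integer digits, plain decimals down to 1e-6, otherwise a lowercase `e` with an explicit sign) applied on top of `Double.toString`. This is not Ryu and not the NumberToJSON class linked above; the class name is hypothetical. It only rearranges whatever digits the JDK produces, so it matches ES6 only where the JDK already emits the shortest digit string. It will not, for example, turn `4.9E-324` into `5e-324`, and on the JDK used for the table it would not turn `9.999999999999999E22` into `1e+23`.

```java
import java.math.BigDecimal;

// Hypothetical sketch: re-layout Double.toString digits using the ES6 formatting rules.
// Digit generation is left to the JDK, so this is not a full ES6-compliant serializer.
final class Es6LayoutSketch {

    static String toEs6(double value) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            throw new IllegalArgumentException("NaN and Infinity are not valid JSON numbers");
        }
        if (value == 0.0) {
            return "0"; // also covers -0.0, as in the comparison table
        }
        if (value < 0) {
            return "-" + toEs6(-value);
        }
        // s = significant digits, k = digit count, n = position of the decimal point,
        // so that value = 0.s * 10^n (the n, k, s of the ES6 specification).
        BigDecimal decimal = new BigDecimal(Double.toString(value)).stripTrailingZeros();
        String digits = decimal.unscaledValue().toString();
        int k = digits.length();
        int n = k - decimal.scale();

        StringBuilder out = new StringBuilder();
        if (k <= n && n <= 21) {
            // Integer: digits followed by n - k zeros.
            out.append(digits);
            for (int i = 0; i < n - k; i++) out.append('0');
        } else if (0 < n && n <= 21) {
            // Decimal point inside the digit string.
            out.append(digits, 0, n).append('.').append(digits, n, k);
        } else if (-6 < n && n <= 0) {
            // Small fraction: "0." then -n zeros then the digits.
            out.append("0.");
            for (int i = 0; i < -n; i++) out.append('0');
            out.append(digits);
        } else {
            // Exponent form with lowercase 'e' and an explicit sign.
            int exponent = n - 1;
            out.append(digits.charAt(0));
            if (k > 1) out.append('.').append(digits, 1, k);
            out.append('e').append(exponent < 0 ? '-' : '+').append(Math.abs(exponent));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        double[] samples = {1e21, 9007199254740992.0, 1e-6, 1e-7, -0.0};
        for (double sample : samples) {
            System.out.println(Double.toString(sample) + " -> " + toEs6(sample));
        }
    }
}
```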

cyberphone commented 4 years ago

This recently published RFC, https://tools.ietf.org/html/rfc8785, also builds on the Ryu/ECMAScript algorithm.

rmannibucau commented 4 years ago

@cyberphone I'm not sure we can break the spec that much (actually, I hope we don't break it like that), but did you evaluate the option of implementing a ryu-jsonb-de/serializer? It would kind of make everyone happy, I think, and avoid endless discussion about backward compatibility, a potential new flag that is hard to justify today, and things like that. If ryu-jsonb-integration is used a lot, then the spec could check back on whether a flag is worth it, IMHO. Wdyt?

cyberphone commented 4 years ago

@rmannibucau The spec has gained traction so I must concentrate on the next step which is RFC-ing JSF: https://mobilepki.org/jsf-lab/home