luizperes / simdjson_nodejs

Node.js bindings for the simdjson project: "Parsing gigabytes of JSON per second"
https://arxiv.org/abs/1902.08318
Apache License 2.0

Test: Add a test for a very big stringified JSON #12

Closed hrdwdmrbl closed 5 years ago

hrdwdmrbl commented 5 years ago

Change: Use parsing for the benchmark

I'm working on a project that needs to parse very large stringified JSON (up to 2 GB). I wrote some tests to try out simdjson, but found that it is actually slower than the native JSON.parse. Can you take a look at my benchmark to see if it makes sense?
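For readers who want to reproduce this kind of measurement, here is a minimal timing sketch. It is not the benchmark from this PR (the results posted in this thread appear to be in Benchmark.js output format); it uses a hand-rolled loop around `performance.now()`, and the helper names `makeLargeJson` and `meanSecondsPerParse`, the item count, and the run count are all assumptions made for illustration.

```javascript
// Build a large JSON string in memory. The shape and size of the data are
// arbitrary choices for this sketch, not taken from the PR's test files.
function makeLargeJson(itemCount) {
  const items = Array.from({ length: itemCount }, (_, i) => ({
    id: i,
    name: `item-${i}`,
    tags: ['a', 'b'],
  }));
  return JSON.stringify({ items });
}

// Time parse(text) over `runs` iterations and return mean seconds per operation.
function meanSecondsPerParse(parse, text, runs = 5) {
  parse(text); // warm-up run, excluded from the timing
  const start = performance.now();
  for (let i = 0; i < runs; i++) {
    parse(text);
  }
  return (performance.now() - start) / runs / 1000;
}

const text = makeLargeJson(100000);
const seconds = meanSecondsPerParse(JSON.parse, text);
console.log(`JSON.parse: ${seconds.toFixed(4)} s/op for ${text.length} characters`);
```

To compare another parser, pass its parse function in place of `JSON.parse`. A dedicated benchmarking library will generally give more stable numbers than a loop like this.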

hrdwdmrbl commented 5 years ago
apache_builds.json#simdjson x 484 ops/sec ±0.49% (89 runs sampled) => 0.0020676151989801215
apache_builds.json#JSON x 1,559 ops/sec ±1.72% (86 runs sampled) => 0.0006414860871607466
canada.json#simdjson x 17.50 ops/sec ±1.53% (51 runs sampled) => 0.057130428921568624
canada.json#JSON x 43.48 ops/sec ±2.53% (57 runs sampled) => 0.0230009568245614
citm_catalog.json#simdjson x 43.50 ops/sec ±1.72% (57 runs sampled) => 0.022988972415204683
citm_catalog.json#JSON x 92.08 ops/sec ±2.10% (68 runs sampled) => 0.010860175213235297
github_events.json#simdjson x 752 ops/sec ±1.14% (88 runs sampled) => 0.001330371967777624
github_events.json#JSON x 1,827 ops/sec ±0.88% (91 runs sampled) => 0.0005473416009393209
gsoc-2018.json#simdjson x 19.88 ops/sec ±1.38% (37 runs sampled) => 0.05029835266216216
gsoc-2018.json#JSON x 55.62 ops/sec ±4.43% (58 runs sampled) => 0.017979938466954025
instruments.json#simdjson x 270 ops/sec ±1.67% (81 runs sampled) => 0.003707396584310699
instruments.json#JSON x 1,032 ops/sec ±2.81% (86 runs sampled) => 0.0009692163768379276
marine-ik.json#simdjson x 14.23 ops/sec ±1.59% (39 runs sampled) => 0.07028351315384616
marine-ik.json#JSON x 47.90 ops/sec ±2.52% (63 runs sampled) => 0.020875456063492063
mesh.json#simdjson x 76.72 ops/sec ±1.14% (65 runs sampled) => 0.013034520894615384
mesh.json#JSON x 201 ops/sec ±5.58% (72 runs sampled) => 0.004967135384615383
mesh.pretty.json#simdjson x 67.15 ops/sec ±1.94% (69 runs sampled) => 0.014892885637681158
mesh.pretty.json#JSON x 148 ops/sec ±2.76% (75 runs sampled) => 0.006736579354074073
numbers.json#simdjson x 658 ops/sec ±0.67% (88 runs sampled) => 0.0015190600334645517
numbers.json#JSON x 1,042 ops/sec ±1.79% (86 runs sampled) => 0.0009600678843511433
random.json#simdjson x 65.56 ops/sec ±0.86% (68 runs sampled) => 0.015253578511029415
random.json#JSON x 133 ops/sec ±3.43% (68 runs sampled) => 0.0075018205802696095
twitter.json#simdjson x 74.08 ops/sec ±1.09% (73 runs sampled) => 0.013498978531506847
twitter.json#JSON x 167 ops/sec ±3.31% (76 runs sampled) => 0.005999256688742691
twitterescaped.json#simdjson x 85.41 ops/sec ±2.02% (73 runs sampled) => 0.011708792219178076
twitterescaped.json#JSON x 330 ops/sec ±3.22% (76 runs sampled) => 0.003028334761034989
update-center.json#simdjson x 60.87 ops/sec ±2.34% (63 runs sampled) => 0.016429358626984126
update-center.json#JSON x 140 ops/sec ±2.59% (70 runs sampled) => 0.007127920557936508
big_nonsense.json#simdjson x 0.20 ops/sec ±2.63% (5 runs sampled) => 4.9135809164
big_nonsense.json#JSON x 0.79 ops/sec ±4.04% (6 runs sampled) => 1.2660380240000002
| filename | JSON.parse (seconds per op) | simdjson (seconds per op) |
| :---------------: | :------------: | :-------------: |
| apache_builds.json | 0.0006414860871607466 | 0.0020676151989801215 |
| canada.json | 0.0230009568245614 | 0.057130428921568624 |
| citm_catalog.json | 0.010860175213235297 | 0.022988972415204683 |
| github_events.json | 0.0005473416009393209 | 0.001330371967777624 |
| gsoc-2018.json | 0.017979938466954025 | 0.05029835266216216 |
| instruments.json | 0.0009692163768379276 | 0.003707396584310699 |
| marine-ik.json | 0.020875456063492063 | 0.07028351315384616 |
| mesh.json | 0.004967135384615383 | 0.013034520894615384 |
| mesh.pretty.json | 0.006736579354074073 | 0.014892885637681158 |
| numbers.json | 0.0009600678843511433 | 0.0015190600334645517 |
| random.json | 0.0075018205802696095 | 0.015253578511029415 |
| twitter.json | 0.005999256688742691 | 0.013498978531506847 |
| twitterescaped.json | 0.003028334761034989 | 0.011708792219178076 |
| update-center.json | 0.007127920557936508 | 0.016429358626984126 |
| big_nonsense.json | 1.2660380240000002 | 4.9135809164 |
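The values in the table are mean seconds per operation (the reciprocal of the ops/sec figures), so the gap is easiest to read as a ratio. The sketch below computes that ratio for three rows copied from the table; the names `meanSeconds` and `slowdown` are introduced here for illustration only.

```javascript
// Mean seconds per operation, copied from the table: [JSON.parse, simdjson].
const meanSeconds = {
  'numbers.json': [0.0009600678843511433, 0.0015190600334645517],
  'twitter.json': [0.005999256688742691, 0.013498978531506847],
  'big_nonsense.json': [1.2660380240000002, 4.9135809164],
};

// A slowdown above 1 means simdjson took longer per parse than JSON.parse.
function slowdown([jsonSeconds, simdjsonSeconds]) {
  return simdjsonSeconds / jsonSeconds;
}

const slowdowns = Object.fromEntries(
  Object.entries(meanSeconds).map(([file, pair]) => [file, slowdown(pair)])
);

for (const [file, ratio] of Object.entries(slowdowns)) {
  console.log(`${file}: simdjson takes ${ratio.toFixed(2)}x as long as JSON.parse`);
}
```

On these rows the bindings take roughly 1.6 to 3.9 times as long as the native parser, with the largest file showing the widest gap.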
luizperes commented 5 years ago

Hi @hrdwdmrbl, I have been busy this past month and a half and wasn't able to fix it properly. We are aware of the problem; please take a look at the thread https://github.com/luizperes/simdjson_nodejs/issues/5

What exactly do you need right now? Could you please open an issue so that I can address it and make it work for your case? Cheers!

hrdwdmrbl commented 5 years ago

@luizperes Created over here https://github.com/luizperes/simdjson_nodejs/issues/13

hrdwdmrbl commented 5 years ago

@luizperes I'm also willing to contribute if you need help. Just point me in the right direction. This is important for my work, so I can spend some time on it.

luizperes commented 5 years ago

Hi @hrdwdmrbl, I started working on this new feature today. Please watch the branch https://github.com/luizperes/simdjson_nodejs/tree/fix-5. I am creating a new parse method, a wrapper called parseFast, as described in #5.

That sounds great. I will probably need someone to test the code in production. I am back on it and will contact you soon, cheers!

hrdwdmrbl commented 5 years ago

Why did you abandon work on the improve-performance branch? Or did you incorporate that work back into fix-5?

luizperes commented 5 years ago

Hi @hrdwdmrbl, improve-performance was actually the fix-5 work. I had been working on it last month and it took me a long time to find the right branch, so I created a new one. Once I found it, I merged them together, but I think it makes more sense to keep working on fix-5. By the way, I have most of it done, but I can only finish the rest tomorrow night (around this same time, Pacific time). Here is the code that is still marked TODO: https://github.com/luizperes/simdjson_nodejs/blob/fix-5/simdjson/bindings.cpp#L120

If you feel confident, you're welcome to work on it as well.

My idea is to do something like:

let json = simdjson.lazyParse("{\"luiz\": 2}");
console.log(json.valueForKeyPath("luiz")); // outputs 2

valueForKeyPath is a well-known method in Objective-C and Swift (examples here).

The key path could also take a form such as object.items[2].name, and so on.
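To make the proposed key-path syntax concrete, here is a small sketch of how such a path could be resolved against an ordinary, already-parsed JavaScript value. This is a hypothetical plain-JavaScript helper written for illustration; it is not the code in bindings.cpp and does not show how lazyParse works internally.

```javascript
// Hypothetical helper (not part of simdjson_nodejs): resolve a key path such as
// "object.items[2].name" against an already-parsed plain JavaScript value.
function valueForKeyPath(root, keyPath) {
  // Split "object.items[2].name" into ["object", "items", "2", "name"].
  const segments = keyPath.match(/[^.[\]]+/g) ?? [];
  let current = root;
  for (const segment of segments) {
    if (current === null || typeof current !== 'object') {
      return undefined; // the path runs past a primitive or a missing value
    }
    current = current[segment];
  }
  return current;
}

const doc = JSON.parse('{"luiz": 2, "object": {"items": [{}, {}, {"name": "third"}]}}');
console.log(valueForKeyPath(doc, 'luiz'));
console.log(valueForKeyPath(doc, 'object.items[2].name'));
```

The appeal of exposing a key-path accessor on a lazily parsed document is that the caller only asks for the values it needs, instead of receiving the whole document as JavaScript objects up front.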

luizperes commented 5 years ago

Hi @hrdwdmrbl, I am rejecting (closing) your PR because I have fixed #5 in PR #14. lazyParse works now! Cheers