luizperes / simdjson_nodejs

Node.js bindings for the simdjson project: "Parsing gigabytes of JSON per second"
https://arxiv.org/abs/1902.08318
Apache License 2.0

Benchmark with GB/s columns #33

Closed · nojvek closed this issue 4 years ago

nojvek commented 4 years ago

Fixes #21

@lemire / @luizperes I am not seeing the > 1 GB/s results, though. I'm not sure what the Node.js overhead is, or whether my 2018 MacBook Pro is just too slow.

lemire commented 4 years ago

I think that @croteaucarine is getting > 1 GB/s speeds but she is using a different approach.

https://github.com/croteaucarine/simdjson_node_objectwrap

luizperes commented 4 years ago

Hi @nojvek, it is possible that there is still a lot of overhead. lazyParse and isValid should take around the same time, because they basically do the same work. There is already an issue open for that: https://github.com/luizperes/simdjson_nodejs/issues/28. I will take a careful look at it. I am also watching @croteaucarine as she works on https://github.com/croteaucarine/simdjson_node_objectwrap for her master's thesis.

luizperes commented 4 years ago

@lemire I see, I will take a closer look at her repo next week (I will probably have a little more time).

CC @nojvek

luizperes commented 4 years ago

Thanks for your work, @nojvek!

nojvek commented 4 years ago

It seems std::move is having a significant perf cost. I wonder if there is a way to work around it.

  // Experiment: point the External directly at the parser's internal document,
  // with no copy or std::move, and an empty finalizer so nothing gets freed.
  Napi::External<dom::document> buffer = Napi::External<dom::document>::New(env, &parser.doc,
    [](Napi::Env /*env*/, dom::document * doc) {
      // intentionally a no-op: the parser still owns this document
      // delete doc;
    });

I tried this and got > 1 GB/s (almost double what I got before). I know ^ is not really valid code, since it will cause SIGSEGV errors, but I just wanted to try something surgical.

| filename | filesize (MB) | JSON.parse (GB/s) | simdjson.lazyParse (GB/s) | X faster |
| --- | --- | --- | --- | --- |
| apache_builds.json | 0.13 | 0.38 | 1.53 | 4.00 |
| canada.json | 2.25 | 0.16 | 0.57 | 3.44 |
| citm_catalog.json | 1.73 | 0.51 | 0.32 | 0.62 |
| github_events.json | 0.07 | 0.44 | 1.21 | 2.79 |
| gsoc_2018.json | 3.33 | 0.72 | 1.20 | 1.66 |
| instruments.json | 0.22 | 0.48 | 1.40 | 2.95 |
| marine_ik.json | 2.98 | 0.24 | 0.54 | 2.25 |
| mesh_pretty.json | 1.58 | 0.40 | 0.92 | 2.31 |
| mesh.json | 0.72 | 0.25 | 0.75 | 3.01 |
| numbers.json | 0.15 | 0.25 | 0.83 | 3.25 |
| random.json | 0.51 | 0.28 | 0.30 | 1.09 |
| twitter.json | 0.63 | 0.41 | 0.34 | 0.85 |
| twitterescaped.json | 0.56 | 0.31 | 1.06 | 3.40 |
| update_center.json | 0.53 | 0.28 | 0.32 | 1.14 |
luizperes commented 4 years ago

I see, that makes a lot of sense! When you do that, do you get the same performance as isValid?

nojvek commented 4 years ago

| filename | filesize (MB) | lazyParse (GB/s) | isValid (GB/s) |
| --- | --- | --- | --- |
| apache_builds.json | 0.13 | 0.69 | 1.36 |
| canada.json | 2.25 | 0.39 | 0.39 |
| citm_catalog.json | 1.73 | 0.27 | 0.28 |
| github_events.json | 0.07 | 0.73 | 1.14 |
| gsoc_2018.json | 3.33 | 0.89 | 1.03 |
| instruments.json | 0.22 | 0.66 | 1.27 |
| marine_ik.json | 2.98 | 0.41 | 0.45 |
| mesh_pretty.json | 1.58 | 0.76 | 0.92 |
| mesh.json | 0.72 | 0.51 | 0.66 |
| numbers.json | 0.15 | 0.56 | 0.79 |
| random.json | 0.51 | 0.26 | 0.30 |
| twitter.json | 0.63 | 0.31 | 0.34 |
| twitterescaped.json | 0.56 | 0.73 | 0.83 |
| update_center.json | 0.53 | 0.28 | 0.32 |

Seems like the overhead adds up more for small files than for bigger ones. I assume that's because the benchmark runs more ops for smaller files than for bigger files.

luizperes commented 4 years ago

Not only that, it seems to be the cost of doing new dom::document(std::move(parser.doc)) here. You can see that the values you get with isValid are very similar to the values you get with lazyParse without the std::move. Very interesting indeed.
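
For reference, here is a minimal sketch of the kind of binding code being discussed, reconstructed from this thread rather than copied from the repo (the function name is made up for illustration; it assumes node-addon-api and simdjson's dom API):

```cpp
#include <utility>

#include <napi.h>
#include "simdjson.h"

using namespace simdjson;

// Reconstruction for discussion only (not the repo's exact code): heap-allocate
// a document by moving it out of the parser, then tie its lifetime to a JS object.
Napi::External<dom::document> wrapMovedDocument(Napi::Env env, dom::parser &parser) {
  dom::document *doc = new dom::document(std::move(parser.doc));  // the new + std::move being discussed
  return Napi::External<dom::document>::New(
      env, doc,
      [](Napi::Env /*env*/, dom::document *d) { delete d; });  // freed when the JS wrapper is GC'd
}
```

nojvek's experiment above removes both the new and the std::move by pointing the External directly at parser.doc, which is why its finalizer has to be a no-op.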

lemire commented 4 years ago

Ping @jkeiser : you may want to check these numbers and the discussion on the cost of std::move.

jkeiser commented 4 years ago

I'm wondering if it's the new rather than the move? There should only be 2 pointers to move here; I would have expected it to be literally 1-2 loads/stores.

We could try a few things:

  1. Is there a way to create a 2-pointer-size value? I don't think I saw one.
  2. Can you try to do this without the new? Like, make sure the benchmark doesn't try to keep two documents around, make a global static document doc, assign the result of the parse directly into that, and just take &doc? (There is a sketch of this right after the list.)
  3. Provide an interface that lets the user manage a parser object, and use memory from that: don't make any other JS objects besides the parser.
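
A rough sketch of option 2, assuming node-addon-api, the current simdjson dom API, and a benchmark that keeps only one document alive at a time (the function and variable names are made up for illustration):

```cpp
#include <string>
#include <utility>

#include <napi.h>
#include "simdjson.h"

using namespace simdjson;

// Benchmark-only sketch of suggestion 2: keep one global parser and one global
// document, move-assign the parsed document into the global (no new), and hand
// out a raw pointer with a no-op finalizer.
static dom::parser global_parser;
static dom::document global_doc;

Napi::External<dom::document> parseIntoGlobal(Napi::Env env, const std::string &json) {
  auto error = global_parser.parse(json).error();
  if (error) {
    Napi::Error::New(env, error_message(error)).ThrowAsJavaScriptException();
    return Napi::External<dom::document>::New(env, nullptr);  // caller must check for a pending exception
  }
  // Caveat: this leaves the parser's internal document in a moved-from state,
  // so it is only safe as a one-shot measurement, not as a reusable binding.
  global_doc = std::move(global_parser.doc);
  return Napi::External<dom::document>::New(
      env, &global_doc,
      [](Napi::Env /*env*/, dom::document *) { /* global, nothing to free */ });
}
```

This removes the new but keeps the move-assign, so comparing it against the current code would isolate the allocation cost from the move cost.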

@lemire all of this makes a reasonable case for the document to be rooted in a single pointer. Bindings are far more likely to naively support storing one raw pointer than two.

luizperes commented 4 years ago

By creating an empty document

// Experiment: allocate an empty dom::document (the new is there, but there is
// no std::move of the parsed data) and let the finalizer delete it.
Napi::External<dom::document> buffer = Napi::External<dom::document>::New(env, new dom::document(),
    [](Napi::Env /*env*/, dom::document * doc) {
      delete doc;
    });
I get:

| filename | filesize (MB) | JSON.parse (ms) | simdjson.lazyParse (ms) | JSON.parse (GB/s) | simdjson.lazyParse (GB/s) | X faster |
| --- | --- | --- | --- | --- | --- | --- |
| apache_builds.json | 0.13 | 0.698 | 0.116 | 0.18 | 1.09 | 6.00 |
| canada.json | 2.25 | 27.311 | 8.774 | 0.08 | 0.26 | 3.11 |
| citm_catalog.json | 1.73 | 20.584 | 8.044 | 0.08 | 0.21 | 2.56 |
| github_events.json | 0.07 | 0.724 | 0.276 | 0.09 | 0.24 | 2.62 |
| gsoc_2018.json | 3.33 | 18.987 | 3.416 | 0.18 | 0.97 | 5.56 |
| instruments.json | 0.22 | 0.963 | 0.212 | 0.23 | 1.04 | 4.54 |
| marine_ik.json | 2.98 | 24.261 | 7.476 | 0.12 | 0.40 | 3.25 |
| mesh_pretty.json | 1.58 | 7.439 | 2.517 | 0.21 | 0.63 | 2.96 |
| mesh.json | 0.72 | 5.941 | 1.286 | 0.12 | 0.56 | 4.62 |
| numbers.json | 0.15 | 1.124 | 0.257 | 0.13 | 0.58 | 4.37 |
| random.json | 0.51 | 9.979 | 2.308 | 0.05 | 0.22 | 4.32 |
| sf_citylots.json | 189.78 | 2319.485 | 903.317 | 0.08 | 0.21 | 2.57 |
| twitter.json | 0.63 | 9.038 | 2.687 | 0.07 | 0.24 | 3.36 |
| twitterescaped.json | 0.56 | 3.256 | 0.740 | 0.17 | 0.76 | 4.40 |
| update_center.json | 0.53 | 7.899 | 2.254 | 0.07 | 0.24 | 3.50 |

So, unless I got something wrong, it is not the new operator?

luizperes commented 4 years ago

I took a look at this problem again today and it really is the std::move (in my case). I also checked @croteaucarine's repo and she is using it as well. I am not so sure how she's achieving such high performance; the only difference between our repos is that she uses a C++ class to deal with the problem, while I don't. She is also using a different model for her benchmark (I will take another look at it later).

croteaucarine commented 4 years ago

@luizperes I did a lot of research and testing to find out that the conversion between C++ objects and Napi objects really takes a lot of time and must be avoided as much as possible. Even passing the JSON document as a string and converting it to Napi::String before calling the simdjson parser has a considerable impact on performance. In my repo, I only parse the document with the simdjson parser in the constructor and keep it for later use (no lazyParse). This is probably why my benchmarks perform better. The constructor can also take a path or a document, depending on the situation. I have also overridden the most-used JS methods (toString, length, keys, iterator, etc.) on the C++ side and try to make as few Napi conversions as possible. Once a method is called, the result is stored for later use, so subsequent calls should be faster. What my benchmarks don't show is that some of these methods are a little slower than the native ones. These results will be presented in my thesis in a couple of weeks.
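
For readers less familiar with the pattern, here is a minimal, illustrative ObjectWrap sketch in the spirit of what is described above; it is not code from simdjson_node_objectwrap, the class and method names are invented, and it assumes the current simdjson dom API:

```cpp
#include <string>

#include <napi.h>
#include "simdjson.h"

// Illustrative sketch only: parse once in the constructor, keep the parsed
// document on the C++ side, and cross the Napi boundary only on demand.
class SimdJsonDoc : public Napi::ObjectWrap<SimdJsonDoc> {
 public:
  static Napi::Function Init(Napi::Env env) {
    return DefineClass(env, "SimdJsonDoc",
                       {InstanceMethod("isValid", &SimdJsonDoc::IsValid)});
  }

  explicit SimdJsonDoc(const Napi::CallbackInfo &info)
      : Napi::ObjectWrap<SimdJsonDoc>(info) {
    // One Napi conversion on the way in; everything else stays in C++.
    std::string json = info[0].As<Napi::String>();
    error_ = parser_.parse(json).get(root_);
  }

 private:
  Napi::Value IsValid(const Napi::CallbackInfo &info) {
    // Cheap to expose: no conversion of the parsed data, just a boolean.
    return Napi::Boolean::New(info.Env(), error_ == simdjson::SUCCESS);
  }

  simdjson::dom::parser parser_;  // owns the parsed document's memory
  simdjson::dom::element root_;
  simdjson::error_code error_{simdjson::UNINITIALIZED};
};
```

The caching she describes (storing a method's result after the first call) and the overridden toString, length, keys, iterator, etc. are omitted here to keep the sketch short.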

jkeiser commented 4 years ago

@luizperes maybe the right thing to do (at least for now) is to return to the old way of using new parser to construct the parser, and then just attach a pointer to the parser instead of the document ... and we can try to make this scenario more performant with changes in 0.4.
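
For concreteness, a rough sketch of that shape, reconstructed from the description here rather than taken from the PR (names are illustrative; assumes node-addon-api and the current simdjson dom API):

```cpp
#include <string>

#include <napi.h>
#include "simdjson.h"

using namespace simdjson;

// Reconstruction of the idea above (not the code from the PR): heap-allocate
// the parser itself, parse with it, and hand the parser pointer to JS. The
// parsed document lives inside the parser, so no document move or copy is needed.
Napi::External<dom::parser> parseAndWrapParser(Napi::Env env, const std::string &json) {
  dom::parser *parser = new dom::parser();
  auto error = parser->parse(json).error();
  if (error) {
    delete parser;
    Napi::Error::New(env, error_message(error)).ThrowAsJavaScriptException();
    return Napi::External<dom::parser>::New(env, nullptr);  // caller must check for a pending exception
  }
  return Napi::External<dom::parser>::New(
      env, parser,
      [](Napi::Env /*env*/, dom::parser *p) { delete p; });  // freed when the JS wrapper is GC'd
}
```

The per-parse cost then comes from allocating a whole parser rather than from moving the document out of it.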

jkeiser commented 4 years ago

I made a PR for that: #36. I'm not sure it really helps: some numbers are up and some are down, which makes me suspect we just have a wide error margin in the measurements.

Also filed #35 for a way to amortize a lot of the allocation cost and use simdjson a little bit more idiomatically.

luizperes commented 4 years ago

Thanks for your help, @jkeiser! Yeah, as I mentioned on #36, it seems this approach did the opposite (it seems a little more costly). As for the error margin, I don't think I understand what you mean by "some numbers are up and some are down"; could you explain it a little better?

#35 seems interesting :)

jkeiser commented 4 years ago

> As for the error margin, I don't think I understand what you mean by "some numbers are up and some are down"; could you explain it a little better?

In my results there, the PR actually makes some files faster and others slower. I was blaming error margin and jitter in the results because I couldn't explain it (everything in the change should be fixed per-document overhead), but after running it a few more times, it might be real.