I think that @croteaucarine is getting > 1 GB/s speeds but she is using a different approach.
Hi @nojvek, it is possible that there is still a lot of overhead. `lazyParse` and `isValid` should take around the same time, because they basically do the same work. There is already an issue open for that: https://github.com/luizperes/simdjson_nodejs/issues/28. I will take a look at it carefully. I am also watching @croteaucarine as she works on https://github.com/croteaucarine/simdjson_node_objectwrap for her master's thesis.
@lemire I see, I will take a closer look at her repo next week (I will probably have a little more time).
CC @nojvek
Thanks for your work, @nojvek!
It seems `std::move` is having a significant perf cost. I wonder if there is a way to work around it.
```cpp
Napi::External<dom::document> buffer = Napi::External<dom::document>::New(env, &parser.doc,
    [](Napi::Env /*env*/, dom::document * doc) {
      // delete doc;
    });
```
I tried this and got > 1 GB/s (almost double what I got before). I know ^ is not really valid code, as it will give SIGSEGV errors (presumably because JS ends up holding a pointer into a document the parser still owns), but I just wanted to try something surgical.
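For reference, here is roughly what the path with the move looks like (a sketch reconstructed from the `new dom::document(std::move(parser.doc))` call discussed below; the exact surrounding code is an assumption):

```cpp
// The costly variant: move the parsed document out of the parser onto the
// heap, and let the External's finalizer free it when the JS side is done.
dom::document *heap_doc = new dom::document(std::move(parser.doc));
Napi::External<dom::document> buffer = Napi::External<dom::document>::New(env, heap_doc,
    [](Napi::Env /*env*/, dom::document * doc) {
      delete doc;
    });
```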
filename | filesize (MB) | JSON.parse (GB/s) | simdjson.lazyParse (GB/s) | X faster |
---|---|---|---|---|
apache_builds.json | 0.13 | 0.38 | 1.53 | 4.00 |
canada.json | 2.25 | 0.16 | 0.57 | 3.44 |
citm_catalog.json | 1.73 | 0.51 | 0.32 | 0.62 |
github_events.json | 0.07 | 0.44 | 1.21 | 2.79 |
gsoc_2018.json | 3.33 | 0.72 | 1.20 | 1.66 |
instruments.json | 0.22 | 0.48 | 1.40 | 2.95 |
marine_ik.json | 2.98 | 0.24 | 0.54 | 2.25 |
mesh_pretty.json | 1.58 | 0.40 | 0.92 | 2.31 |
mesh.json | 0.72 | 0.25 | 0.75 | 3.01 |
numbers.json | 0.15 | 0.25 | 0.83 | 3.25 |
random.json | 0.51 | 0.28 | 0.30 | 1.09 |
twitter.json | 0.63 | 0.41 | 0.34 | 0.85 |
twitterescaped.json | 0.56 | 0.31 | 1.06 | 3.40 |
update_center.json | 0.53 | 0.28 | 0.32 | 1.14 |
I see, that makes a lot of sense! When you do it, is it the same performance as `isValid`?
filename | filesize (MB) | lazyParse (GB/s) | isValid (GB/s) |
---|---|---|---|
apache_builds.json | 0.13 | 0.69 | 1.36 |
canada.json | 2.25 | 0.39 | 0.39 |
citm_catalog.json | 1.73 | 0.27 | 0.28 |
github_events.json | 0.07 | 0.73 | 1.14 |
gsoc_2018.json | 3.33 | 0.89 | 1.03 |
instruments.json | 0.22 | 0.66 | 1.27 |
marine_ik.json | 2.98 | 0.41 | 0.45 |
mesh_pretty.json | 1.58 | 0.76 | 0.92 |
mesh.json | 0.72 | 0.51 | 0.66 |
numbers.json | 0.15 | 0.56 | 0.79 |
random.json | 0.51 | 0.26 | 0.30 |
twitter.json | 0.63 | 0.31 | 0.34 |
twitterescaped.json | 0.56 | 0.73 | 0.83 |
update_center.json | 0.53 | 0.28 | 0.32 |
Seems like the overhead adds up more for small files than for bigger files. I assume that's because the benchmark runs more ops for smaller files than for bigger ones.
Not only that, it seems to be the cost of doing `new dom::document(std::move(parser.doc))` here. You see that the values you get with `isValid` are very similar to the values that you get with `lazyParse` without the `std::move`. Very interesting indeed.
Ping @jkeiser: you may want to check these numbers and the discussion on the cost of `std::move`.
I'm wondering if it's the `new` rather than the move? There should only be 2 pointers to move here; I would have expected it to be literally 1-2 loads/stores.
We could try a few things:

- `static document doc`, assign the result of the parse directly into that, and just take `&doc`? (See the sketch after this list.)

@lemire all of this makes a reasonable case for the document to be rooted in a single pointer. Bindings are far more likely to naively support storing one raw pointer than two.
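For concreteness, a hypothetical sketch of that first idea (untested; `lazyParseSketch` is a made-up name, and it assumes the 0.3-era `dom` API where the parsed document lives in `parser.doc`; it is neither thread-safe nor safe across overlapping parses):

```cpp
#include <napi.h>
#include "simdjson.h"

using namespace simdjson;

// One process-wide parser; each parse lands in parser.doc, so we can hand out
// its address with no heap allocation, no std::move, and no finalizer.
static dom::parser parser;

Napi::Value lazyParseSketch(const Napi::CallbackInfo &info) {
  Napi::Env env = info.Env();
  std::string json = info[0].As<Napi::String>().Utf8Value();
  auto error = parser.parse(json).error();
  if (error) {
    Napi::Error::New(env, error_message(error)).ThrowAsJavaScriptException();
    return env.Null();
  }
  // NOTE: the next parse overwrites parser.doc, invalidating this pointer.
  return Napi::External<dom::document>::New(env, &parser.doc);
}
```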
By creating an empty `document`:

```cpp
Napi::External<dom::document> buffer = Napi::External<dom::document>::New(env, new dom::document(),
    [](Napi::Env /*env*/, dom::document * doc) {
      delete doc;
    });
```
I get:

filename | filesize (MB) | JSON.parse (ms) | simdjson.lazyParse (ms) | JSON.parse (GB/s) | simdjson.lazyParse (GB/s) | X faster
---|---|---|---|---|---|---
apache_builds.json | 0.13 | 0.698 | 0.116 | 0.18 | 1.09 | 6.00
canada.json | 2.25 | 27.311 | 8.774 | 0.08 | 0.26 | 3.11
citm_catalog.json | 1.73 | 20.584 | 8.044 | 0.08 | 0.21 | 2.56
github_events.json | 0.07 | 0.724 | 0.276 | 0.09 | 0.24 | 2.62
gsoc_2018.json | 3.33 | 18.987 | 3.416 | 0.18 | 0.97 | 5.56
instruments.json | 0.22 | 0.963 | 0.212 | 0.23 | 1.04 | 4.54
marine_ik.json | 2.98 | 24.261 | 7.476 | 0.12 | 0.40 | 3.25
mesh_pretty.json | 1.58 | 7.439 | 2.517 | 0.21 | 0.63 | 2.96
mesh.json | 0.72 | 5.941 | 1.286 | 0.12 | 0.56 | 4.62
numbers.json | 0.15 | 1.124 | 0.257 | 0.13 | 0.58 | 4.37
random.json | 0.51 | 9.979 | 2.308 | 0.05 | 0.22 | 4.32
sf_citylots.json | 189.78 | 2319.485 | 903.317 | 0.08 | 0.21 | 2.57
twitter.json | 0.63 | 9.038 | 2.687 | 0.07 | 0.24 | 3.36
twitterescaped.json | 0.56 | 3.256 | 0.740 | 0.17 | 0.76 | 4.40
update_center.json | 0.53 | 7.899 | 2.254 | 0.07 | 0.24 | 3.50
So, unless I got something wrong, it is not the `new` operator?
I took a look at this problem again today and it really is the `std::move` call (in my case). I also checked @croteaucarine's repo and she is using it as well. I am not so sure how she's achieving such high performance; the only difference between our repos is that she uses a C++ class to deal with the problem, while I don't. She is also using a different model for her benchmark (I will take another look at it later).
@luizperes I did a lot of research and tests to find out that the conversion between C++ objects and Napi objects really takes a lot of time and must be avoided as much as possible. Even passing the JSON doc as a string and converting it to a Napi::String before calling the simdjson parser has a considerable impact on performance. In my repo, I only parse the document with the simdjson parser in the constructor and keep it for later use (no lazyParse). This is probably why my benchmarks perform better. The constructor can also take a path or a document depending on the situation. I have also overridden the most-used JS methods (toString, length, keys, iterator, etc.) on the C++ side and try to make as few Napi conversions as possible. Once a method is called, the result is stored for other uses, so subsequent calls should be faster. What my benchmarks don't show is that some of these methods are a little bit slower than the native ones. These results will be presented in my thesis in a couple of weeks.
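A rough sketch of that shape (hypothetical names and structure, not @croteaucarine's actual code; assumes node-addon-api and the simdjson `dom` API): parse once in the constructor, keep the result on the C++ side, and only cross the Napi boundary inside the overridden methods.

```cpp
#include <napi.h>
#include <sstream>
#include "simdjson.h"

class JsonDoc : public Napi::ObjectWrap<JsonDoc> {
 public:
  JsonDoc(const Napi::CallbackInfo &info) : Napi::ObjectWrap<JsonDoc>(info) {
    // Parse once, up front; no lazyParse and no repeated C++ <-> Napi conversion.
    std::string json = info[0].As<Napi::String>().Utf8Value();
    simdjson::error_code error;
    parser_.parse(json).tie(root_, error);
    if (error) {
      Napi::Error::New(info.Env(), simdjson::error_message(error))
          .ThrowAsJavaScriptException();
    }
  }

  // Overridden on the C++ side, so JS never walks the document itself.
  Napi::Value ToString(const Napi::CallbackInfo &info) {
    std::ostringstream out;
    out << root_;  // dom::element streams as minified JSON
    return Napi::String::New(info.Env(), out.str());
  }

 private:
  simdjson::dom::parser parser_;  // owns the parsed document
  simdjson::dom::element root_;   // lightweight handle into parser_'s document
};
```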
@luizperes maybe the right thing to do (at least for now) is to return to the old way of using `new parser` to construct the parser, and then just attaching a pointer to the parser instead of the document ... and we can try to make this scenario more performant with changes in 0.4.
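A minimal sketch of that suggestion, assuming node-addon-api (the surrounding `env` and `json` are assumed from context):

```cpp
// Heap-allocate the parser and attach *it* to the External: the parser owns
// the document, so the new dom::document(std::move(...)) disappears entirely,
// and the finalizer frees document and parser together.
auto *p = new dom::parser();
auto error = p->parse(json).error();
if (error) { delete p; /* throw a JS error here instead */ }
Napi::External<dom::parser> external = Napi::External<dom::parser>::New(env, p,
    [](Napi::Env /*env*/, dom::parser * parser) {
      delete parser;  // also releases the document it owns
    });
```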
I made a PR for that: #36. I'm not sure it really helps; some numbers are up and some are down, which makes me suspect we just have a wide error margin in the measurements.
Also filed #35 for a way to amortize a lot of the allocation cost and use simdjson a little bit more idiomatically.
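The amortization idea, as I understand it, in sketch form (it relies on the documented simdjson behavior that a reused `dom::parser` recycles its internal buffers across `parse()` calls; `parse_all` is a made-up name):

```cpp
#include <string>
#include <vector>
#include "simdjson.h"

using namespace simdjson;

// Allocate one parser and reuse it: repeated parse() calls recycle its
// internal buffers, so the allocation cost is amortized across documents.
void parse_all(const std::vector<std::string> &documents) {
  dom::parser reusable_parser;  // allocated once, grows to the largest input
  for (const auto &json : documents) {
    dom::element root;
    error_code error;
    reusable_parser.parse(json).tie(root, error);
    if (error) { continue; }  // report the error as needed
    // ... use root here; the next parse() invalidates it ...
  }
}
```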
Thanks for your help, @jkeiser! Yeah, as I mentioned on #36, it seems that this approach did the opposite (it seems a little more costly). As for the error margin, I don't think I understand what you mean by "some numbers are up and some are down"; could you explain it a little better?
> As for the error margin, I don't think I understand what you mean by "some numbers are up and some are down"; could you explain it a little better?
In my results there, the PR actually makes some files faster and others slower. I was blaming error margin and jitter in the results because I couldn't explain it (everything in the change should be fixed per-document overhead), but after running it a few more times, it might be real.
Fixes #21
@lemire / @luizperes I am not seeing the > 1 GB/s results though. Not sure what the Node.js overhead is, or whether my 2018 MacBook Pro is just too slow.