staltz / bipf-native


Performance studies #8

staltz opened this issue 2 years ago

staltz commented 2 years ago

As of commit https://github.com/staltz/bipf-native/commit/bb5b98f9ecc163a4edcad52599201048a7cb705a built with ReleaseSmall:

| operation | ops/ms |
| --- | --- |
| bipf.encode | 1.8511662347278786 |
| bipf-native.encode | 0.13290096220296635 |
| bipf.decode | 7.326007326007326 |
| bipf-native.decode | 2.0977554017201596 |
| JSON.stringify | 3.6724201248622843 |
| JSON.parse | 5.1894135962636225 |
| JSON.parse(buffer) | 5.574136008918618 |
| JSON.stringify(JSON.parse()) | 3.134796238244514 |
| bipf.seek(string) | 1250 |
| bipf-native.seek(string) | 98.03921568627452 |

cc @jerive

staltz commented 2 years ago

I ran Valgrind on bipf-native running perf.js on the encode part only, and then I ran Valgrind on bipf running the same perf.js. Here's what I found (screenshots from KCachegrind, sorted by Called):

bipf-native

Screenshot from 2021-11-26 18-14-57@2x

bipf

Screenshot from 2021-11-26 18-16-05@2x

Notes

What I think is going on is that N-API probably does a V8 TryCatch around each call, and try-catches are known to be bad for performance even when you write them by hand in JS. The alternative is to implement bipf-native with V8 APIs directly; maybe that way we can take shortcuts.
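
To illustrate, here's a hypothetical sketch (not N-API's actual source) of the kind of per-call exception guard I suspect gets inserted around every boundary crossing:

```cpp
#include <v8.h>

// Hypothetical illustration only: the shape of a per-call guard, where a
// v8::TryCatch is set up for every single crossing into JS. The guard's
// setup/teardown is pure overhead when no exception is ever thrown.
v8::MaybeLocal<v8::Value> GuardedCall(v8::Isolate* isolate,
                                      v8::Local<v8::Context> ctx,
                                      v8::Local<v8::Function> fn) {
  v8::TryCatch try_catch(isolate);  // installed per call
  v8::MaybeLocal<v8::Value> result = fn->Call(ctx, ctx->Global(), 0, nullptr);
  if (try_catch.HasCaught()) {
    // An N-API-style layer would translate this into a status code here.
    return v8::MaybeLocal<v8::Value>();
  }
  return result;
}
```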

staltz commented 2 years ago

@jerive I think we should try implementing encodingLength and encode using V8 APIs directly and see how the performance looks. We have no use for bipf-native if it's slow.
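
For reference, a minimal sketch of what a raw-V8 addon entry point could look like; the surrounding structure is standard Node.js addon boilerplate, but the function body and names are illustrative, not our actual module:

```cpp
#include <node.h>
#include <v8.h>

namespace {

// Illustrative stub: the real encodingLength would walk the JS value in
// args[0] and compute its BIPF byte length.
void EncodingLength(const v8::FunctionCallbackInfo<v8::Value>& args) {
  v8::Isolate* isolate = args.GetIsolate();
  int32_t length = 0;  // placeholder for the real computation
  args.GetReturnValue().Set(v8::Integer::New(isolate, length));
}

void Init(v8::Local<v8::Object> exports) {
  NODE_SET_METHOD(exports, "encodingLength", EncodingLength);
}

}  // namespace

NODE_MODULE(NODE_GYP_MODULE_NAME, Init)
```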

jerive commented 2 years ago

Sure, it's what I was afraid of. I'm currently trying to include the V8 headers...

jerive commented 2 years ago

I will certainly need support here: Zig does not find some things that are part of the C++ stdlib.

```cpp
// v8.h
#include <atomic>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>
```

It should be supported (https://github.com/ziglang/zig/issues/4786), but I have no clue how. I will leave it here for tonight.

staltz commented 2 years ago

I will give it a try too, tomorrow

jerive commented 2 years ago

On Ubuntu, the header files are in the package libc++-11-dev. Now, how to link them?

staltz commented 2 years ago

Can you try `-lc++`?

staltz commented 2 years ago

See this issue in Node.js https://github.com/nodejs/node/issues/14379#issuecomment-317124096

It seems like the performance drawbacks of N-API were acknowledged, but "compared the results to the original module that used V8 APIs ... overhead of N-API is fairly minimal already". I'm concerned about what that "minimal overhead" means. If converting bipf-native to V8 APIs only speeds it up by 2x compared to the N-API version, that's not good enough: per the table above, bipf-native.encode is roughly 14x slower than bipf.encode, so we'd need something like a 20-30x speedup to make the V8-API bipf-native at least 2x faster than bipf.

dominictarr commented 2 years ago

I'll bet you it's the overhead of the JS-native API, especially since you are doing many tiny calls. The thing that originally led me to create bipf was realizing that grep could process the ssb log in less than a second, while JSON parsing took several minutes.

I would be interested to see how fast it went if you ran the benchmark inside zig; you wouldn't be able to do encode, but it would do seek, etc. I'd also be curious how it went with wasm, although you'd have to copy each record into wasm memory to be able to run bipf on it. What would be better is to copy the whole block into wasm memory and then scan many bipf records without leaving wasm; see the sketch below. In practical terms, though, that means reimplementing flume/jitdb in wasm.
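
As a rough sketch of that last idea (assuming BIPF's tag layout of a varint whose low 3 bits are the type and whose remaining bits are the byte length, and simplifying to single-byte varints), scanning a whole block stays in one address space with zero boundary crossings per record:

```cpp
#include <cstddef>
#include <cstdint>

// Count top-level BIPF values in [buf, buf + len) without ever calling
// back into JS. Simplified sketch: real code must decode multi-byte
// varint tags; here we assume every tag fits in one byte.
size_t countRecords(const uint8_t* buf, size_t len) {
  size_t count = 0, pos = 0;
  while (pos < len) {
    uint32_t tag = buf[pos++];     // tag = (length << 3) | type
    uint32_t valueLen = tag >> 3;  // byte length sits above the 3 type bits
    pos += valueLen;               // skip the value body entirely
    ++count;
  }
  return count;
}
```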

staltz commented 2 years ago

Yes, @dominictarr, it does smell like bridging overhead. What I'm wondering is why there is a copy in the first place; there's no FFI involved. How can JSON.stringify/parse operate without copying, and if we have access to V8 APIs, couldn't we get close to where JSON is?

staltz commented 2 years ago

Yeah, so we tried to implement BIPF in C++, first using N-API and then V8 APIs, and it's a bit hopeless: https://github.com/jerive/bipf-napi/issues/1

The TL;DR is that the problem isn't that the bridge between C++ and JS is slow; the problem is that pure JS code usually gets JIT-compiled to machine code, which is highly efficient. In the benchmarks in the linked issue, the bottleneck was literally just v8::Number::New(). The bottom line is that JS is fast, and the way to be competitive is to replace large parts of the program (or all of it) with C++ or another low-level language. Replacing small parts doesn't seem to work.
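
For concreteness, the hot loop we measured had roughly this shape (a simplified sketch, not the actual benchmark code). Every JS number handed back to the VM has to be materialized as a v8::Value, so the per-element allocation cost is unavoidable no matter how fast the surrounding C++ is:

```cpp
#include <v8.h>

// Decode n doubles from a native buffer into a JS array. The dominant
// cost in our measurements was the v8::Number::New allocation per
// element, not the C++ <-> JS bridge itself.
void DecodeDoubles(v8::Isolate* isolate, v8::Local<v8::Context> ctx,
                   const double* values, uint32_t n,
                   v8::Local<v8::Array> out) {
  for (uint32_t i = 0; i < n; i++) {
    v8::Local<v8::Number> num = v8::Number::New(isolate, values[i]);
    out->Set(ctx, i, num).Check();  // one heap-allocated JS number per element
  }
}
```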