meteor / meteor-feature-requests

A tracker for Meteor issues that are requests for new functionality, not bugs.

Accelerate JSON parsing with simdjson to reduce CPU usage. #393

Open vlasky opened 4 years ago

vlasky commented 4 years ago

simdjson is currently the "fastest JSON parser in the world". The recently released version 0.3 claims to achieve a parsing speed of over 3GB/s.

It is written in C++ and achieves its impressive speed by automatically leveraging the CPU's SIMD instructions and using microparallel algorithms.

There is a nodejs binding available.

I expect that incorporating this library and binding into Meteor would significantly reduce CPU usage inside the event loop and improve Meteor webapp performance.

I expect that most of the changes would need to be made in Meteor's EJSON package.

vlasky commented 4 years ago

Having taken a quick look at Meteor's source code, I see that JSON.parse() is called all over the place, so I expect the performance gains could be even greater.

Given that there is C++/V8 object marshalling overhead with simdjson, I expect that V8's native JSON.parse() code could outperform it when the JSON string is less than a certain length.

This length threshold could be determined through benchmarking. Meteor's code could then check each JSON string's length to decide which parser to use.
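As a rough illustration of that length check, here is a minimal sketch. The helper name `makeAdaptiveParser`, the `fastParse` parameter, and the cutoff of 1024 characters are all hypothetical; `fastParse` stands in for whatever parse function a native binding exposes, and the real cutoff would have to come from benchmarking.

```javascript
// Hypothetical helper: choose a parser based on the length of the input.
// `fastParse` is injected, so the sketch assumes nothing about a binding's API.
function makeAdaptiveParser({ fastParse, threshold }) {
  return function parse(text) {
    // Short strings: the C++/V8 marshalling overhead is assumed to dominate,
    // so stay with V8's built-in parser.
    if (text.length < threshold) {
      return JSON.parse(text);
    }
    // Long strings: hand off to the injected parser.
    return fastParse(text);
  };
}

// Usage with a stand-in for the native parser (JSON.parse plus a call counter).
let fastCalls = 0;
const parse = makeAdaptiveParser({
  fastParse: (text) => {
    fastCalls += 1;
    return JSON.parse(text);
  },
  threshold: 1024, // placeholder; not a measured value
});

const small = parse('{"a":1}');
const large = parse(JSON.stringify({ items: new Array(500).fill("x") }));
```

Injecting the parser keeps the choice of binding out of the calling code, so the fallback to `JSON.parse` is easy to test without a native module installed.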

mitar commented 4 years ago

Yes, native JSON.parse is pretty fast. I think this might be premature optimization.

vlasky commented 4 years ago

@mitar I don't think it's premature at all. In all these years Meteor has been around, it hasn't been previously considered. The event loop is the most important place to save CPU cycles and RAM.

When you have an app servicing lots of method/API calls and performing lots of database I/O, this all adds up very quickly.

Benchmarks conducted by third parties suggest that V8's JSON.parse() is vastly inferior to the best C++ implementations.

Here are some that I have found:

  1. https://github.com/GoogleChromeLabs/json-parse-benchmark

In this test, parsing an 8.2MB string literal took V8 approximately 14 seconds.

  2. https://github.com/miloyip/nativejson-benchmark

RapidJSON (the fastest JSON parser at the time, which is slower than simdjson) took 8ms to parse 4.5MB of sample JSON data, compared to 53ms for V8's.

Also, V8 consumed about 3.3x the RAM that RapidJSON did during the parse - 15.9MB vs 4.8MB.

mitar commented 4 years ago

Sure, but were those C++ implementations accessible from a Node process? The question is how fast they are once you embed them inside Node and call them from there.

In my experience the bottleneck is not JSON but EJSON, especially EJSON.clone, which is called a lot throughout Meteor's code.
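The in-process question can be checked directly with a small timing harness run inside Node. The following is only a sketch: `timeParser`, `makePayload`, and the payload sizes are made up for illustration, and only `JSON.parse` is listed as a candidate. A binding's parse function, or a clone routine, could be added to `candidates` under the same one-argument interface.

```javascript
// Time `fn(payload)` and return the mean milliseconds per call.
function timeParser(fn, payload, iterations = 50) {
  // Warm-up runs so the JIT can optimize before measurement starts.
  for (let i = 0; i < 10; i++) fn(payload);
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn(payload);
  const elapsedNs = process.hrtime.bigint() - start;
  return Number(elapsedNs) / 1e6 / iterations;
}

// Build a synthetic JSON document containing `count` records.
function makePayload(count) {
  const records = [];
  for (let i = 0; i < count; i++) {
    records.push({ id: i, name: `user${i}`, tags: ["a", "b"], score: i * 0.5 });
  }
  return JSON.stringify(records);
}

// Parsers to compare; each takes a string and returns the parsed value.
const candidates = { "JSON.parse": JSON.parse };

const results = [];
for (const count of [10, 1000, 10000]) {
  const payload = makePayload(count);
  for (const [name, fn] of Object.entries(candidates)) {
    results.push({ name, chars: payload.length, msPerCall: timeParser(fn, payload) });
  }
}
console.table(results);
```

Running several payload sizes matters here, because any fixed per-call overhead in a native binding would show up most clearly on the small inputs.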

vlasky commented 4 years ago

Here are the published benchmarks for using simdjson within the node process via the binding simdjson_nodejs. To me, these results are conclusive.

https://github.com/luizperes/simdjson_nodejs#benchmarks

hexsprite commented 4 years ago

Seems like there are still some potential performance issues depending on the use case. But it also looks like the maintainers are looking into addressing them.

https://github.com/luizperes/simdjson_nodejs/issues/28

linegel commented 4 years ago

We ran into really big issues with EJSON.clone on the client side: we were parsing JSON data in real time, and the JSON carried some data (like Uint8Array) that plain JSON doesn't allow by default. Including simdjson (https://github.com/simdjson/simdjson) in Meteor could be a good addition, with the bonus of being able to use it from the native process in Electron applications.
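To illustrate why binary data needs an extra step on top of any plain JSON parser, here is a minimal sketch of round-tripping a Uint8Array through JSON. It borrows the `$binary` base64 wrapper that EJSON uses for binary values, but the `encode` and `decode` helpers are hypothetical and are not Meteor's implementation. It assumes Node's `Buffer` is available.

```javascript
// Serialize, wrapping any Uint8Array as { $binary: "<base64>" }.
function encode(value) {
  return JSON.stringify(value, (key, v) =>
    v instanceof Uint8Array
      ? { $binary: Buffer.from(v).toString("base64") }
      : v
  );
}

// Parse, turning { $binary: "<base64>" } wrappers back into Uint8Array.
function decode(text) {
  return JSON.parse(text, (key, v) => {
    const isWrapper =
      v !== null &&
      typeof v === "object" &&
      Object.keys(v).length === 1 &&
      typeof v.$binary === "string";
    return isWrapper ? new Uint8Array(Buffer.from(v.$binary, "base64")) : v;
  });
}

const original = { name: "frame", data: new Uint8Array([1, 2, 255]) };
const wire = encode(original);
const restored = decode(wire);
```

The revival step runs in JavaScript after parsing, so a faster parser would speed up only the first half of this work.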