luizperes / simdjson_nodejs

Node.js bindings for the simdjson project: "Parsing gigabytes of JSON per second"
https://arxiv.org/abs/1902.08318
Apache License 2.0

Keep the parser alive between parses #35

Open jkeiser opened 4 years ago

jkeiser commented 4 years ago

The simdjson parser has some allocation to do when it is initialized, and currently simdjson_nodejs recreates the parser on every single parse call.

We could use N-API's napi_set_instance_data/napi_get_instance_data to store one instance of the parser for each JS worker thread, keeping that internal memory around. We would have to extract the document from the parser at the end of the parse with std::move, but it would get rid of the internal buffer allocations and even keep the buffers hot, which seems likely to more than counterbalance any performance degradation we'd get from said extraction.
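To make the idea concrete, here is a minimal, self-contained sketch of the pattern in plain C++. Everything in it is hypothetical: `Document`, `Parser`, and `thread_parser()` are stand-ins rather than simdjson or simdjson_nodejs types, and a `thread_local` object plays the role that per-environment instance data would play in a real binding. It only illustrates that the scratch buffer survives between parses while each document is moved out to the caller.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed document; not the simdjson type.
struct Document {
    std::vector<char> tape;
};

// Hypothetical stand-in for a parser that owns reusable scratch memory.
class Parser {
public:
    // Parses `json` and hands ownership of the result to the caller.
    Document parse(const std::string& json) {
        if (scratch_.capacity() < json.size()) {
            // Grow only when the input is larger than anything seen before.
            scratch_.reserve(json.size());
            ++scratch_allocations_;
        }
        // Placeholder for the real parsing work: copy input through scratch.
        scratch_.assign(json.begin(), json.end());
        doc_.tape.assign(scratch_.begin(), scratch_.end());
        // Move the document out; doc_ is refilled by assign() on the next call.
        return std::move(doc_);
    }

    // Number of times the scratch buffer had to be grown.
    std::size_t scratch_allocations() const { return scratch_allocations_; }

private:
    std::vector<char> scratch_;   // kept (and kept warm) between parses
    Document doc_;                // rebuilt on every parse, then moved out
    std::size_t scratch_allocations_ = 0;
};

// One long-lived parser per thread. In a real binding this slot would be
// the per-environment instance data rather than a thread_local (assumption).
Parser& thread_parser() {
    thread_local Parser parser;
    return parser;
}
```

Calling `thread_parser().parse(...)` twice on the same thread, with the second input no larger than the first, grows the scratch buffer only once in this sketch. The document storage itself is still reallocated on each call because it has been moved away, which is the extraction cost mentioned above.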

If we could tell whether the user kept any instances of the document around when they call parse() again, we could even use a copy-on-write scheme and only move the document away if the user calls parse() again while there are still live document instances in JS. Many people just read a document, extract what they want, and throw away the JSON before doing anything else, so it would be a win for a lot of cases.
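A rough sketch of that copy-on-write decision, again in plain C++ with hypothetical names (`CowDocument`, `CowParser`). Here `std::shared_ptr::use_count()` stands in for "are there still live document instances in JS?". In a real binding that signal would have to come from somewhere else, presumably finalizers on the wrapped JS objects, and garbage collection timing would make it less precise than this single-threaded model suggests.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for a parsed document; not the simdjson type.
struct CowDocument {
    std::string text;
};

// Hypothetical parser that reuses its document unless someone still holds it.
class CowParser {
public:
    std::shared_ptr<const CowDocument> parse(const std::string& json) {
        if (!doc_ || doc_.use_count() > 1) {
            // A caller still references the previous document (or there is
            // none yet): leave the old one to them and start a fresh one.
            doc_ = std::make_shared<CowDocument>();
            ++fresh_documents_;
        }
        // Otherwise the previous document is unreferenced: overwrite in place.
        doc_->text = json;  // placeholder for the real parsing work
        return doc_;
    }

    // Number of times a new document had to be allocated.
    std::size_t fresh_documents() const { return fresh_documents_; }

private:
    std::shared_ptr<CowDocument> doc_;
    std::size_t fresh_documents_ = 0;
};
```

In this model, a caller that drops its handle before the next parse() pays for no new document, while a caller that keeps the handle keeps a valid, unchanged document and the parser allocates a replacement.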

@lemire ^^ relevant to simdjson's ability to work well with bindings.

lemire commented 4 years ago

@jkeiser I think we need to relate this to the following upstream issue... https://github.com/simdjson/simdjson/issues/94