luizperes / simdjson_nodejs

Node.js bindings for the simdjson project: "Parsing gigabytes of JSON per second"
https://arxiv.org/abs/1902.08318
Apache License 2.0

Slower than JSON.parse #28

Open · dalisoft opened this issue 4 years ago

dalisoft commented 4 years ago

Hi @luizperes

I know this library was made to handle large JSON files, but I ran into some strange performance results when I parsed my JSON and benchmarked it: this library is slower, up to ~6x slower.

Here is the result:

json.parse - large: 985.011ms
simdjson - large: 4.756s

Also, lazyParse does not return the expected result for me (or I'm doing something wrong), and even with lazyParse the performance is still slow. How can we improve this?

Code to test

const simdJson = require("simdjson");

const bench = (name, fn) => {
  console.time(name);
  for (let i = 0; i < 200000; i++) {
    fn();
  }
  console.timeEnd(name);
};

// Create large JSON file
let JSON_BUFF_LARGE = {};
for (let i = 0; i < 20; i++) { // 20 keys is roughly 0.5Kb, which is very small, but you can increase this value
  JSON_BUFF_LARGE["key_" + i] = Math.round(Math.random() * 1e16).toString(16);
}
JSON_BUFF_LARGE = JSON.stringify(JSON_BUFF_LARGE);

console.log(
  "JSON buffer LARGE size is ",
  (JSON_BUFF_LARGE.length / 1024).toFixed(2),
  "Kb"
);

bench("json.parse - large", () => JSON.parse(JSON_BUFF_LARGE));
bench("simdjson - large", () => simdJson.parse(JSON_BUFF_LARGE));
dalisoft commented 4 years ago

Tested on an AVX-capable device too

Benchmark

luizperes commented 4 years ago

Hi @dalisoft, simdjson.parse is always slower as expected. Please take a look at issue #5 and the Documentation.md file.

Thank you so much for making your tests available to me so I could save some time. As you mentioned, simdjson is not doing better than standard JSON.parse for your case. It could be that this is related to parsing numbers (which is a little slow in simdjson, though it should still be faster). You are generating random numbers, so your JSON string JSON_BUFF_LARGE could be relatively hard to parse (for simdjson), but that shouldn't be the case. I am speculating that it could be a problem with the wrapper (only if there is something very wrong) or some sort of bug in the upstream (explanation below).

I changed the parameters of the code you asked me to test. Instead of a 0.5Kb file, I am using a 25+ Kb file: just replace i < 20 with i < 200000.

For all three functions of simdjson, here is the output I get (my machine supports AVX2):

simdjson.parse

JSON buffer LARGE size is  25.78 Kb
json.parse - large: 33672.126ms
simdjson - large: 159626.570ms

simdjson.lazyParse

For this case, lazyParse is faster (by around 30%) than standard JSON.parse.

JSON buffer LARGE size is  25.81 Kb
json.parse - large: 33321.596ms
simdjson - large: 21988.679ms

simdjson.isValid

isValid is nearly the same thing as lazyParse, since lazyParse only validates the JSON but does not construct the JS object, so both should run at around the same speed. I will check whether this is a problem in the wrapper (likely) or the upstream (by running it without the wrapper and collecting its perf stats).

JSON buffer LARGE size is  25.80 Kb
json.parse - large: 33484.594ms
simdjson - large: 5665.534ms
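
For reference, a minimal sketch of how those three runs could be driven with the bench() helper and JSON_BUFF_LARGE from the snippet earlier in this thread; the exact harness used for the numbers above is not shown, so this assumes isValid takes the JSON string just like parse and lazyParse do.

// Reuses bench() and JSON_BUFF_LARGE from dalisoft's benchmark above;
// one simdjson call per run, matching the three result blocks reported.
bench("simdjson.parse - large", () => simdJson.parse(JSON_BUFF_LARGE));
bench("simdjson.lazyParse - large", () => simdJson.lazyParse(JSON_BUFF_LARGE));
bench("simdjson.isValid - large", () => simdJson.isValid(JSON_BUFF_LARGE));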

One interesting thing that you will see with simdjson is that it scales well and becomes much faster than regular state-machine-parsing algorithms. But as stated above, there is something wrong going on. I will only have time to check around the third week of April.

Thanks again for the contribution!

cc @lemire

luizperes commented 4 years ago

Oh, here is the usage of lazyParse:

const simdjson = require('simdjson');

const jsonString = "{ \"foo\": { \"bar\": [ 0, 42 ] } }";
const JSONbuffer = simdjson.lazyParse(jsonString); // external (C++) parsed JSON object
console.log(JSONbuffer.valueForKeyPath("foo.bar[1]")); // 42

Note that it does not construct the actual JS object; it keeps an external pointer to the C++ buffer, and for that reason you can only access keys through the valueForKeyPath function on the returned object.
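
For example, extending the snippet above (a small sketch; the point is only that the buffer is parsed once and then navigated):

const simdjson = require('simdjson');

const jsonString = "{ \"foo\": { \"bar\": [ 0, 42 ] } }";
const JSONbuffer = simdjson.lazyParse(jsonString); // parsed once, kept in C++

// Each lookup walks the already-parsed buffer instead of re-parsing the string.
console.log(JSONbuffer.valueForKeyPath("foo.bar[0]")); // 0
console.log(JSONbuffer.valueForKeyPath("foo.bar[1]")); // 42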

lemire commented 4 years ago

It could be that there is something related to parsing numbers (which is a little slow in simdjson, but it should still be faster)

Number parsing is generally a challenging task, but simdjson should still be faster than the competition.

lemire commented 4 years ago

I know this library was made to handle large JSON files

The simdjson library itself is faster even on small files.

lemire commented 4 years ago

cc @croteaucarine @jkeiser

dalisoft commented 4 years ago

Am I doing something wrong, or is the cost of the bindings the reason the performance isn't what I want?

@luizperes

const simdjson = require('simdjson');

const jsonString = "{ \"foo\": { \"bar\": [ 0, 42 ] } }";
const JSONbuffer = simdjson.lazyParse(jsonString); // external (C++) parsed JSON object
console.log(JSONbuffer.valueForKeyPath("foo.bar[1]")); // 42

I see that it's good, but I'm using it for a different case:

const simdjson = require('simdjson');

// some code
// all of the code below is repeated many times
const JSONbuffer = simdjson.lazyParse(req.body); // req.body - JSON string
console.log(JSONbuffer.valueForKeyPath("")); // to get the whole object

I want to use this library within my backend framework for Node.js as a JSON.parse alternative for higher performance, but it only slows things down.

luizperes commented 4 years ago

Hi @dalisoft, I see your point now. There are only a few cases where you actually need the whole JSON, but for the root-object case this wrapper should at least match JSON.parse's performance.
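
As an illustration of that selective-access pattern, here is a hypothetical handler sketch; the key paths and request shape are made up, and only lazyParse and valueForKeyPath come from the wrapper's API:

const simdjson = require('simdjson');

// Hypothetical route handler: pull only the fields this route needs
// instead of materializing the whole body as a JS object.
function handleRequest(req, res) {
  const doc = simdjson.lazyParse(req.body);            // req.body is a JSON string
  const userId = doc.valueForKeyPath("user.id");       // hypothetical key path
  const amount = doc.valueForKeyPath("order.amount");  // hypothetical key path
  res.end(JSON.stringify({ userId, amount }));
}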

I will think of new approaches to improve the library but will only be able to work on it in the future. I will also take a close look at the repo https://github.com/croteaucarine/simdjson_node_objectwrap — @croteaucarine is working on improvements for the wrapper. I will leave this issue open until we fix it. Hopefully it won’t take (that) long. Cheers!

dalisoft commented 4 years ago

@luizperes Thanks, I'll wait :)

luizperes commented 4 years ago

Note to self: there are a few leads on PR #33

xamgore commented 3 years ago

@luizperes did you consider the approach node-expat has chosen?

luizperes commented 3 years ago

@xamgore can you elaborate on your question?

dalisoft commented 3 years ago

Hi @luizperes, for better debugging you can try https://github.com/nanoexpress/json-parse-expirement

xamgore commented 3 years ago

@luizperes with node-expat you can add JS callbacks for events like "opening tag", "new attribute with name x", etc., so only the required properties are picked, copied, and passed back to the JavaScript thread.

It's the opposite of the smart-proxy-object approach, and it still doesn't require a large amount of data to be passed between the addon and V8.
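
Roughly, the event-callback style looks like this (a sketch based on node-expat's documented events; the filtering logic is hypothetical, and a JSON equivalent would need analogous events):

const expat = require('node-expat');

const parser = new expat.Parser('UTF-8');
const picked = {};

// Only the attributes the application cares about are copied into JS.
parser.on('startElement', (name, attrs) => {
  if (name === 'user' && attrs.id) picked.userId = attrs.id;
});

parser.parse('<root><user id="42">Ada</user></root>');
console.log(picked); // { userId: '42' }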

RichardWright commented 2 years ago

So just to confirm: if you want to get the entire object from a string (e.g. lazy usage isn't possible), this probably isn't the library to use in its current state?

lemire commented 2 years ago

@RichardWright I cannot speak specifically for this library, but one general concern is that constructing the full JSON representation in JavaScript, with all the objects, strings, arrays, and so on, is remarkably expensive. In some respects, that's independent of JSON.

cc @jkeiser

RichardWright commented 2 years ago

Passing around a buffer and using key access is the preferred method then?

luizperes commented 2 years ago

That is correct @RichardWright. My idea, as mentioned in other threads, would be to have simdjson implemented as the native JSON parser directly in the engine, e.g. V8. That would possibly speed up the process. At this moment I am finishing my master's thesis and don't have time to try it, so we will have to wait a little bit on that. :)

CC @jkeiser @lemire

Uzlopak commented 2 years ago

CC @mcollina

RichardWright commented 2 years ago

@luizperes cool, thanks for the response!