luizperes / simdjson_nodejs

Node.js bindings for the simdjson project: "Parsing gigabytes of JSON per second"
https://arxiv.org/abs/1902.08318
Apache License 2.0

Parsing time is dominated by makeJSONObject (in the wrapper) #5

Closed lemire closed 5 years ago

lemire commented 5 years ago

At the C++ level, isValid and parse do exactly the same work. This is obvious from the code in this wrapper:

bool simdjson::isValid(std::string_view p) {
  ParsedJson pj = build_parsed_json(p);
  return pj.isValid();
}

Versus

Napi::Object simdjson::parse(Napi::Env env, std::string_view p) {
  ParsedJson pj = build_parsed_json(p);
  if (!pj.isValid()) {
    Napi::Error::New(env, "Invalid JSON Exception").ThrowAsJavaScriptException();
  }
  ParsedJson::iterator pjh(pj);
  return simdjson::makeJSONObject(env, pjh).As<Napi::Object>();
}

However, in simdjson_nodejs, parse is at least 20x slower than isValid. This indicates that the running time is entirely dependent on makeJSONObject.

This function is written in a sensible manner but it is nevertheless quite slow. Evidently, this defeats the purpose of the fast parsing.

I would say it is a priority to fix this performance issue. It seems that there would be different valid ways to go about it.

lemire commented 5 years ago

Evidently, long term the solution might be to avoid makeJSONObject and bring the computation to C++, instead of bringing the data to JavaScript.

luizperes commented 5 years ago

Hi @lemire, you are completely right. I noticed this when I was creating the benchmarks for this project, which is why the benchmark file uses the isValid function, to keep the comparison fair. I did not open an issue (I should have), but I had this in mind and have made it the highest priority for this project. As you mentioned, performance is the whole point.

This problem may be related to Node.js Napi: the overhead of the Napi conversions seems to be the cause. It seems we are not required to use Napi, so we could compare it against binding directly to Google's V8 and/or Mozilla's SpiderMonkey.

According to here, V8 should be the better option, but I believe we could also add them to the benchmarks in the future. However, I am not sure it is worth keeping them as the project's dependencies, so this is something we will need to think about carefully.

Other than that, I see what you mean about bringing the computation to C++. Maybe we could even offer both parsing options: one that converts to a JS object (20x slower) and one that uses the C++ output directly. I will also study the possibility of lazily loading/converting the C++ output into a JavaScript object. Regardless of how we implement it, my goal is to keep it as simple and as JS-like as possible, where "JS-like" means the JavaScript-friendly way of dealing with objects.

I will start working on that as soon as possible! Thank you so much for the help!

lemire commented 5 years ago

It might be possible to keep your design but just find a way to batch it so that it runs faster. I am aware that this is not trivial.

TkTech commented 5 years ago

We have the same problem in pysimdjson where the benchmarks are dominated by the time taken to convert into high-level Python objects. I've been implementing jq-like syntax to avoid creating unnecessary objects and allowing the caller to filter to a subset of the document. I'd like to extend it to allow manipulation of the object as well, but that will take some time to iron out.

It's a simple, ugly state machine, and is not optimized at all, but again even then the time is dominated by object creation. No part of it is Python-specific, so it may be an option for you as well.
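
As a rough sketch of the idea in JavaScript (this is not the pysimdjson code, and the cursor methods here are hypothetical): tokenize a jq-like path, walk the native parsed structure, and only materialize the value at the end of the path.

// Sketch: split a jq-like selector such as ".statuses[0].user.name" and
// descend through the native parsed document, so that only the value at the
// end of the path is ever converted into a high-level object.
function tokenize(selector) {
  return selector.match(/[^.\[\]]+/g) || [];   // ["statuses", "0", "user", "name"]
}

function select(nativeDoc, selector) {
  let cursor = nativeDoc;                      // opaque handle into the parsed document
  for (const part of tokenize(selector)) {
    cursor = /^\d+$/.test(part)
      ? cursor.at(Number(part))                // hypothetical: index into an array node
      : cursor.get(part);                      // hypothetical: move to an object key
  }
  return cursor.materialize();                 // hypothetical: convert just this subtree
}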

luizperes commented 5 years ago

Hi @TkTech, I've been looking at your project and it seems very complete. I used your project structure and README file as the starting point for simdjson_nodejs. I'll certainly look at your code when I take on this issue (I am currently trying to solve it for multiple architectures, #6).

It is good to know that you are having a similar problem; we can keep each other updated in case of success.

Thanks for contributing to this discussion!

luizperes commented 5 years ago

Hi @lemire, I was working on #7 and decided to create some benchmarks comparing Napi against Nan (Google V8), and got interesting results:

Benchmarks for Napi using only the isValid method:

apache_builds.json#simdjson x 7,195 ops/sec ±1.17% (88 runs sampled) => 0.0001389790584036548
apache_builds.json#JSON x 1,255 ops/sec ±2.59% (84 runs sampled) => 0.0007966620520429015
canada.json#simdjson x 211 ops/sec ±2.74% (83 runs sampled) => 0.004743973159827189
canada.json#JSON x 38.60 ops/sec ±4.09% (51 runs sampled) => 0.02590397239215687
citm_catalog.json#simdjson x 111 ops/sec ±1.34% (72 runs sampled) => 0.008975157412037041
citm_catalog.json#JSON x 76.20 ops/sec ±2.61% (66 runs sampled) => 0.013122753918181821
github_events.json#simdjson x 3,077 ops/sec ±1.82% (88 runs sampled) => 0.00032503186619561327
github_events.json#JSON x 1,386 ops/sec ±3.51% (83 runs sampled) => 0.0007213715816940607
gsoc-2018.json#simdjson x 280 ops/sec ±2.13% (83 runs sampled) => 0.0035660336778112456
gsoc-2018.json#JSON x 57.39 ops/sec ±2.13% (60 runs sampled) => 0.01742347150555556
instruments.json#simdjson x 4,156 ops/sec ±0.77% (93 runs sampled) => 0.00024058727887435222
instruments.json#JSON x 1,042 ops/sec ±1.71% (90 runs sampled) => 0.0009593169606922395
marine-ik.json#simdjson x 158 ops/sec ±0.58% (80 runs sampled) => 0.0063362718916666685
marine-ik.json#JSON x 44.10 ops/sec ±1.20% (58 runs sampled) => 0.022673223390804596
mesh.json#simdjson x 632 ops/sec ±1.08% (87 runs sampled) => 0.0015814252134260596
mesh.json#JSON x 188 ops/sec ±1.68% (79 runs sampled) => 0.005305818097928655
mesh.pretty.json#simdjson x 390 ops/sec ±0.90% (87 runs sampled) => 0.002563225901495117
mesh.pretty.json#JSON x 134 ops/sec ±1.70% (75 runs sampled) => 0.007440451715000001
numbers.json#simdjson x 3,563 ops/sec ±0.57% (93 runs sampled) => 0.00028067975514213485
numbers.json#JSON x 890 ops/sec ±2.12% (81 runs sampled) => 0.0011234899327846363
random.json#simdjson x 285 ops/sec ±1.97% (84 runs sampled) => 0.00350299118331916
random.json#JSON x 115 ops/sec ±2.42% (73 runs sampled) => 0.008677703538160473
twitter.json#simdjson x 324 ops/sec ±0.67% (89 runs sampled) => 0.003089232071381362
twitter.json#JSON x 158 ops/sec ±1.90% (79 runs sampled) => 0.006319421839662448
twitterescaped.json#simdjson x 1,052 ops/sec ±0.48% (92 runs sampled) => 0.0009503572300291675
twitterescaped.json#JSON x 351 ops/sec ±1.31% (87 runs sampled) => 0.0028453512062915905
update-center.json#simdjson x 366 ops/sec ±0.80% (87 runs sampled) => 0.002731871923563218
update-center.json#JSON x 123 ops/sec ±2.41% (74 runs sampled) => 0.008144905070945946
filename    JSON.parse (sec/op)    simdjson (sec/op)
apache_builds.json.json 0.0007966620520429015 0.0001389790584036548
canada.json.json 0.02590397239215687 0.004743973159827189
citm_catalog.json.json 0.013122753918181821 0.008975157412037041
github_events.json.json 0.0007213715816940607 0.00032503186619561327
gsoc-2018.json.json 0.01742347150555556 0.0035660336778112456
instruments.json.json 0.0009593169606922395 0.00024058727887435222
marine-ik.json.json 0.022673223390804596 0.0063362718916666685
mesh.json.json 0.005305818097928655 0.0015814252134260596
mesh.pretty.json.json 0.007440451715000001 0.002563225901495117
numbers.json.json 0.0011234899327846363 0.00028067975514213485
random.json.json 0.008677703538160473 0.00350299118331916
twitter.json.json 0.006319421839662448 0.003089232071381362
twitterescaped.json.json 0.0028453512062915905 0.0009503572300291675
update-center.json.json 0.008144905070945946 0.002731871923563218

Benchmarks for Napi, parsing the output into a native JS object:

apache_builds.json#simdjson x 438 ops/sec ±0.66% (89 runs sampled) => 0.0022829753812452806
apache_builds.json#JSON x 1,437 ops/sec ±1.05% (87 runs sampled) => 0.0006958411113316021
canada.json#simdjson x 17.05 ops/sec ±3.31% (40 runs sampled) => 0.058656786950000005
canada.json#JSON x 40.14 ops/sec ±3.47% (54 runs sampled) => 0.024911719324074072
citm_catalog.json#simdjson x 39.76 ops/sec ±1.05% (52 runs sampled) => 0.02515266741025641
citm_catalog.json#JSON x 77.57 ops/sec ±2.17% (66 runs sampled) => 0.012891161212121215
github_events.json#simdjson x 700 ops/sec ±0.52% (91 runs sampled) => 0.0014280848411625908
github_events.json#JSON x 1,598 ops/sec ±1.66% (89 runs sampled) => 0.0006256597082786129
gsoc-2018.json#simdjson x 17.66 ops/sec ±1.53% (47 runs sampled) => 0.056629169744680845
gsoc-2018.json#JSON x 54.59 ops/sec ±2.20% (57 runs sampled) => 0.018319231489766073
instruments.json#simdjson x 244 ops/sec ±1.87% (81 runs sampled) => 0.004102743795414463
instruments.json#JSON x 986 ops/sec ±1.89% (85 runs sampled) => 0.0010144264163895345
marine-ik.json#simdjson x 12.66 ops/sec ±1.46% (36 runs sampled) => 0.07900105983333333
marine-ik.json#JSON x 40.90 ops/sec ±1.83% (54 runs sampled) => 0.02445177696604938
mesh.json#simdjson x 71.61 ops/sec ±0.78% (73 runs sampled) => 0.013963813976027396
mesh.json#JSON x 193 ops/sec ±1.49% (80 runs sampled) => 0.005182644167045457
mesh.pretty.json#simdjson x 62.65 ops/sec ±0.73% (65 runs sampled) => 0.01596163256153847
mesh.pretty.json#JSON x 137 ops/sec ±1.26% (76 runs sampled) => 0.007314450536184208
numbers.json#simdjson x 565 ops/sec ±0.95% (87 runs sampled) => 0.0017705686135335555
numbers.json#JSON x 972 ops/sec ±1.42% (86 runs sampled) => 0.0010288740936569055
random.json#simdjson x 56.92 ops/sec ±1.00% (59 runs sampled) => 0.01756998482062147
random.json#JSON x 114 ops/sec ±2.49% (72 runs sampled) => 0.008809404929563494
twitter.json#simdjson x 66.72 ops/sec ±0.78% (68 runs sampled) => 0.01498742863602941
twitter.json#JSON x 153 ops/sec ±1.84% (77 runs sampled) => 0.006541315509379511
twitterescaped.json#simdjson x 85.64 ops/sec ±0.66% (73 runs sampled) => 0.011677340873972602
twitterescaped.json#JSON x 348 ops/sec ±1.35% (87 runs sampled) => 0.0028712606250252057
update-center.json#simdjson x 58.38 ops/sec ±0.85% (62 runs sampled) => 0.01712954294892473
update-center.json#JSON x 127 ops/sec ±2.38% (73 runs sampled) => 0.007865323219911938
filename    JSON.parse (sec/op)    simdjson (sec/op)
apache_builds.json.json 0.0006958411113316021 0.0022829753812452806
canada.json.json 0.024911719324074072 0.058656786950000005
citm_catalog.json.json 0.012891161212121215 0.02515266741025641
github_events.json.json 0.0006256597082786129 0.0014280848411625908
gsoc-2018.json.json 0.018319231489766073 0.056629169744680845
instruments.json.json 0.0010144264163895345 0.004102743795414463
marine-ik.json.json 0.02445177696604938 0.07900105983333333
mesh.json.json 0.005182644167045457 0.013963813976027396
mesh.pretty.json.json 0.007314450536184208 0.01596163256153847
numbers.json.json 0.0010288740936569055 0.0017705686135335555
random.json.json 0.008809404929563494 0.01756998482062147
twitter.json.json 0.006541315509379511 0.01498742863602941
twitterescaped.json.json 0.0028712606250252057 0.011677340873972602
update-center.json.json 0.007865323219911938 0.01712954294892473

Benchmarks for Nan (Google V8) using only the isValid method:

apache_builds.json#simdjson x 5,322 ops/sec ±0.58% (92 runs sampled) => 0.00018789851454938786
apache_builds.json#JSON x 1,384 ops/sec ±1.34% (90 runs sampled) => 0.0007225356067734202
canada.json#simdjson x 166 ops/sec ±1.16% (83 runs sampled) => 0.006025841941097727
canada.json#JSON x 43.04 ops/sec ±1.32% (56 runs sampled) => 0.023235254416666663
citm_catalog.json#simdjson x 139 ops/sec ±0.83% (78 runs sampled) => 0.007194693341346155
citm_catalog.json#JSON x 79.76 ops/sec ±2.05% (68 runs sampled) => 0.012537133623529408
github_events.json#simdjson x 4,336 ops/sec ±0.81% (93 runs sampled) => 0.00023065048538408372
github_events.json#JSON x 1,604 ops/sec ±1.10% (87 runs sampled) => 0.0006236343558971143
gsoc-2018.json#simdjson x 168 ops/sec ±0.66% (84 runs sampled) => 0.005965806858465609
gsoc-2018.json#JSON x 55.74 ops/sec ±2.51% (60 runs sampled) => 0.017939857340277778
instruments.json#simdjson x 2,928 ops/sec ±0.73% (90 runs sampled) => 0.000341537251819902
instruments.json#JSON x 1,035 ops/sec ±1.78% (89 runs sampled) => 0.000966344505140085
marine-ik.json#simdjson x 116 ops/sec ±1.63% (73 runs sampled) => 0.008624565794520549
marine-ik.json#JSON x 42.65 ops/sec ±1.50% (56 runs sampled) => 0.02344452008928571
mesh.json#simdjson x 523 ops/sec ±0.91% (89 runs sampled) => 0.0019118788956661319
mesh.json#JSON x 196 ops/sec ±1.26% (82 runs sampled) => 0.005095369881374722
mesh.pretty.json#simdjson x 254 ops/sec ±0.64% (84 runs sampled) => 0.003929379895866039
mesh.pretty.json#JSON x 139 ops/sec ±1.28% (78 runs sampled) => 0.007172208120192307
numbers.json#simdjson x 3,047 ops/sec ±0.36% (95 runs sampled) => 0.0003281855913570448
numbers.json#JSON x 955 ops/sec ±1.50% (89 runs sampled) => 0.0010468748176318064
random.json#simdjson x 479 ops/sec ±0.52% (91 runs sampled) => 0.00208574015032967
random.json#JSON x 112 ops/sec ±2.50% (71 runs sampled) => 0.008939500769282363
twitter.json#simdjson x 415 ops/sec ±1.23% (89 runs sampled) => 0.0024099802471910107
twitter.json#JSON x 151 ops/sec ±2.05% (76 runs sampled) => 0.006631303152412284
twitterescaped.json#simdjson x 808 ops/sec ±0.47% (93 runs sampled) => 0.0012375998396825397
twitterescaped.json#JSON x 335 ops/sec ±1.52% (83 runs sampled) => 0.0029858003431268935
update-center.json#simdjson x 465 ops/sec ±0.34% (91 runs sampled) => 0.0021491741295238094
update-center.json#JSON x 123 ops/sec ±2.50% (77 runs sampled) => 0.008158883020408162
filename    JSON.parse (sec/op)    simdjson (sec/op)
apache_builds.json.json 0.0007225356067734202 0.00018789851454938786
canada.json.json 0.023235254416666663 0.006025841941097727
citm_catalog.json.json 0.012537133623529408 0.007194693341346155
github_events.json.json 0.0006236343558971143 0.00023065048538408372
gsoc-2018.json.json 0.017939857340277778 0.005965806858465609
instruments.json.json 0.000966344505140085 0.000341537251819902
marine-ik.json.json 0.02344452008928571 0.008624565794520549
mesh.json.json 0.005095369881374722 0.0019118788956661319
mesh.pretty.json.json 0.007172208120192307 0.003929379895866039
numbers.json.json 0.0010468748176318064 0.0003281855913570448
random.json.json 0.008939500769282363 0.00208574015032967
twitter.json.json 0.006631303152412284 0.0024099802471910107
twitterescaped.json.json 0.0029858003431268935 0.0012375998396825397
update-center.json.json 0.008158883020408162 0.0021491741295238094

Benchmarks for Nan (Google V8), parsing the output into a native JS object:

apache_builds.json#simdjson x 679 ops/sec ±1.15% (90 runs sampled) => 0.001472604520007262
apache_builds.json#JSON x 1,396 ops/sec ±1.07% (89 runs sampled) => 0.0007163054532996373
canada.json#simdjson x 20.20 ops/sec ±1.28% (37 runs sampled) => 0.04951047694594596
canada.json#JSON x 42.57 ops/sec ±1.67% (56 runs sampled) => 0.023491198351190474
citm_catalog.json#simdjson x 49.75 ops/sec ±0.53% (64 runs sampled) => 0.020098972307291674
citm_catalog.json#JSON x 80.75 ops/sec ±1.84% (69 runs sampled) => 0.012384519811594209
github_events.json#simdjson x 1,217 ops/sec ±0.79% (93 runs sampled) => 0.0008217606435226479
github_events.json#JSON x 1,681 ops/sec ±0.62% (93 runs sampled) => 0.0005949090546686192
gsoc-2018.json#simdjson x 54.21 ops/sec ±0.53% (69 runs sampled) => 0.018445115888888884
gsoc-2018.json#JSON x 55.54 ops/sec ±2.64% (60 runs sampled) => 0.01800485047083333
instruments.json#simdjson x 291 ops/sec ±0.68% (85 runs sampled) => 0.003433637129411764
instruments.json#JSON x 1,006 ops/sec ±1.83% (86 runs sampled) => 0.0009944475654357068
marine-ik.json#simdjson x 14.06 ops/sec ±2.74% (38 runs sampled) => 0.07112410321052634
marine-ik.json#JSON x 42.03 ops/sec ±1.62% (55 runs sampled) => 0.023794345115151515
mesh.json#simdjson x 75.85 ops/sec ±1.58% (65 runs sampled) => 0.013183479455384615
mesh.json#JSON x 193 ops/sec ±1.13% (81 runs sampled) => 0.005189652390011223
mesh.pretty.json#simdjson x 71.35 ops/sec ±0.41% (73 runs sampled) => 0.014015100469178081
mesh.pretty.json#JSON x 137 ops/sec ±1.17% (77 runs sampled) => 0.007275473579545454
numbers.json#simdjson x 601 ops/sec ±0.29% (93 runs sampled) => 0.001663296916406521
numbers.json#JSON x 931 ops/sec ±1.63% (88 runs sampled) => 0.0010745288268493758
random.json#simdjson x 79.04 ops/sec ±0.40% (67 runs sampled) => 0.012652597855223878
random.json#JSON x 108 ops/sec ±2.19% (70 runs sampled) => 0.009232717743537415
twitter.json#simdjson x 96.65 ops/sec ±0.90% (70 runs sampled) => 0.01034642326904762
twitter.json#JSON x 154 ops/sec ±2.12% (77 runs sampled) => 0.006510784343434344
twitterescaped.json#simdjson x 116 ops/sec ±1.18% (76 runs sampled) => 0.00862335357863409
twitterescaped.json#JSON x 334 ops/sec ±1.65% (83 runs sampled) => 0.002994692009370816
update-center.json#simdjson x 98.50 ops/sec ±0.92% (73 runs sampled) => 0.010152052480821917
update-center.json#JSON x 123 ops/sec ±2.61% (71 runs sampled) => 0.008161439740945672
filename    JSON.parse (sec/op)    simdjson (sec/op)
apache_builds.json.json 0.0007163054532996373 0.001472604520007262
canada.json.json 0.023491198351190474 0.04951047694594596
citm_catalog.json.json 0.012384519811594209 0.020098972307291674
github_events.json.json 0.0005949090546686192 0.0008217606435226479
gsoc-2018.json.json 0.01800485047083333 0.018445115888888884
instruments.json.json 0.0009944475654357068 0.003433637129411764
marine-ik.json.json 0.023794345115151515 0.07112410321052634
mesh.json.json 0.005189652390011223 0.013183479455384615
mesh.pretty.json.json 0.007275473579545454 0.014015100469178081
numbers.json.json 0.0010745288268493758 0.001663296916406521
random.json.json 0.009232717743537415 0.012652597855223878
twitter.json.json 0.006510784343434344 0.01034642326904762
twitterescaped.json.json 0.002994692009370816 0.00862335357863409
update-center.json.json 0.008161439740945672 0.010152052480821917

Generating the native JS objects through V8 is faster, but there seems to be an initial overhead, since the isValid method was faster using Napi. I will have to think a bit more about that, but I will probably stick with Napi, since it is faster on the C++ side, and use one of the approaches the three of us have discussed. Cheers!

lemire commented 5 years ago

Another option would be not to convert the JSON data, piece by piece, into C++ and then JavaScript values. Just expose the C++ "object" by wrapping its methods. Is that practical?
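
Roughly, from the JavaScript side, it might look like this (all names here are made up, just to show the shape of such an API):

// The binding returns a thin wrapper; the parsed document itself stays in C++.
const doc = simdjson.parseToHandle(jsonString);   // hypothetical entry point

doc.keyExists("widget");                          // answered by the C++ iterator
doc.getNumber("widget.window.width");             // scalar fetched on demand
doc.getString("widget.image.src");
// A JavaScript object is only built if the caller explicitly asks for one:
const windowObj = doc.toObject("widget.window");  // hypothetical, converts one subtree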

luizperes commented 5 years ago

That is exactly the option I had in mind, will be working on it once I figure #7 out, cheers!

luizperes commented 5 years ago

Hi @lemire @TkTech, I think I found a way to do it. It seems it should be possible with a (sort of) object trampoline.

In this way, I can use a proxy for my object, trap the getter/setter and return whatever I want.


I started to implement it here https://github.com/luizperes/simdjson_nodejs/blob/improve-performance/simdjson/bindings.cpp#L111

Basically, all I do is call build_parsed_json and then keep the result in a variable as a buffer (the code looks ugly; I will clean it up once it works, haha). The next step is to return an object (fast!!!) wrapped in a proxy, and whenever the user tries to get or set a property, I will use trampolines to parse it properly and load parts of the object lazily. For example, imagine we have this input:

"widget": {
    "debug": "on",
    "window": {
        "title": "Sample Konfabulator Widget",
        "name": "main_window",
        "width": 500,
        "height": 500
    },
    "image": { 
        "src": "Images/Sun.png",
        "name": "sun1",
        "hOffset": 250,
        "vOffset": 250,
        "alignment": "center"
    }
}    

I will parse this object using simdjson and return it as a buffer wrapped inside an object trampoline. Let this object be jsonProxy. The object is empty (only has its buffer).

// jsonProxy object
{
  buffer: cpp_external_object_ptr,
  get: getTrap,
  set: setTrap
}

If the user then calls console.log(jsonProxy.widget.window) at some point, the getter/setter traps fire and only the accessed path is loaded lazily (in this case window). The object jsonProxy would then have the fields:

// jsonProxy object
{
  buffer: cpp_external_object_ptr,
  get: getTrap,
  set: setTrap,
  widget: {
    window: {
      title: "Sample Konfabulator Widget",
      name: "main_window",
      width: 500,
      height: 500
    }
  }
}

Note that the trapping overhead only occurs once per property (a second access to the same field is very fast).
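
In code, the trap would look roughly like this (only a sketch; getFromBuffer stands for a hypothetical call into the C++ side):

// Sketch: wrap the object holding the C++ buffer in a Proxy and only
// materialize a property the first time it is read.
function makeLazy(buffer, path) {
  const target = { buffer: buffer };
  return new Proxy(target, {
    get: (obj, prop) => {
      if (prop in obj) return obj[prop];                      // already loaded: fast path
      const childPath = path + "." + String(prop);
      const value = getFromBuffer(obj.buffer, childPath);     // hypothetical native call
      obj[prop] = (value !== null && typeof value === "object")
        ? makeLazy(obj.buffer, childPath)                     // nested objects stay lazy
        : value;                                              // leaves become plain JS values
      return obj[prop];
    }
  });
}
// const jsonProxy = makeLazy(cpp_external_object_ptr, "");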

Please let me know what you think!

lemire commented 5 years ago

Assuming that it works well in benchmarks, this is a very appealing design.

luizperes commented 5 years ago

Hi @lemire, I have implemented lazyParse as discussed above. Could you update your local copy of simdjson and check whether the parsing time is okay now? Usage looks roughly like the sketch below. Cheers!
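
For reference (method names may still change):

const fs = require('fs');
const simdjson = require('simdjson');

const jsonString = fs.readFileSync('jsonexamples/twitter.json', 'utf-8');

// Eager path: converts the whole document into a JS object (the slow path above).
const obj = simdjson.parse(jsonString);

// Lazy path: the parsed document stays on the C++ side and values are
// fetched on demand through a key path.
const tweets = simdjson.lazyParse(jsonString);
console.log(tweets.valueForKeyPath("search_metadata.count"));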

lemire commented 5 years ago

@luizperes On my todo.

lemire commented 5 years ago

I confirm the good results. I think we can close the issue, good work!

croteaucarine commented 5 years ago

Hi @luizperes. I am currently doing a study on simdjson for my master's, and we are trying to improve the performance of the simdjson library in high-level environments such as Node and Python. I downloaded the sources and saw what you did with lazy_parse. This is a big step forward, but I don't see the JavaScript Proxy being used in the sources. I did a small implementation in JS based on your suggestions in this issue, and I would like to go further and try to find a way to return a Proxy object directly from C++ with N-API. This way, we could access values directly from object properties instead of calling the method valueForKeyPath, and properties would not get loaded twice.
Do you mind if I continue working on this issue and get back to you if I find something useful? I know this issue is closed, but I think adding this feature would make the library more user-friendly for programmers. I am also considering adding a JS layer if I cannot find a way to do it in C++. Of course, benchmarks will be run to make sure the solution does not degrade performance.

luizperes commented 5 years ago

Hi @croteaucarine, I would love it if you continued working on that; I could really use another brain, since I myself did not find a very good way of extracting properties through proxies. You can surely count on me if you need anything! :) The reason I implemented the method valueForKeyPath instead is that, in the case I described to @lemire, a proxy always returns an object that is also a proxy. However, take the case where we try to access the value of obj.foo.bar. In the end, bar is a proxy (but it should ideally be a value), and therefore we need a method that "extracts" that value. The latest example I had in JavaScript was something like this:


// Wrap a target in a Proxy whose traps build a simdjson key path lazily.
const monster = (target) => {
  return new Proxy(target, handler);
};

const simdjson = {
  // "Extracts" the accumulated key path from a proxied node.
  extract: (obj) => { return obj.__simdjsonvalue__; }
};

const handler = {
  get: (obj, prop) => {
    if (typeof obj[prop] === 'undefined') {
      // First access: append ".prop" (or "[index]" for numeric keys) to the
      // accumulated path and cache a new proxied node under that property.
      let sprop = String(prop);
      let val = !isNaN(parseInt(sprop)) ? "[" + sprop + "]" : "." + sprop;
      obj[prop] = monster({ __simdjsonvalue__: obj.__simdjsonvalue__ + val });
      return obj[prop];
    } else {
      // Already cached (or defined on the target, e.g. keyForValue): return it.
      return obj[prop];
    }
  },
  set: (obj, prop, value) => {
    obj[prop]['__simdjsonvalue__'] = value;
    return true; // report success (a set trap should return a boolean)
  }
};

// ---------------- EXAMPLES
let proxy = monster({ __simdjsonvalue__: "", keyForValue: (str) => str + " sss" });
console.log(proxy.j.f.__simdjsonvalue__);   // ".j.f"
console.log(simdjson.extract(proxy.j.f));   // ".j.f"
console.log(proxy.keyForValue("www"));      // "www sss"

That was the best I could get. Once I realized that I would need an extract method (just to hide the access to __simdjsonvalue__), I decided to create the method valueForKeyPath and not use the proxy approach at all. Also, you will see when you inspect them that proxies are large and complex objects, so there may be a running-time trade-off when accessing very deep "property trees", since the properties are created on the fly by the JS engine.

The next thing I considered was implementing simdjson directly in the Google V8 project (and submitting a patch later), meaning that we would replace the existing JSON.parse, on architectures with SIMD support, directly in the engine. (However, I have been busy with other projects and will likely not have time for it anytime soon.)

I hope what I said helps you get started. Please feel more than welcome to work on this project and, as I said, ask any questions you might have; it will be a pleasure to help! Oh, and sorry for the late reply: I was traveling this past week and couldn't find the time to give you a proper response. I usually try to get back to my emails within a day.

Lastly, let me know if you want me to reopen this issue and feel free to open new ones.

Luiz

lemire commented 4 years ago

@luizperes I suggest you have a look at https://github.com/croteaucarine/simdjson_node_objectwrap It seems that @croteaucarine might have a good strategy.

haoyuintel commented 2 years ago

Hi @luizperes, I am very interested in the simdjson project, and thank you for building the Node.js bindings. I have also run some benchmarks that I am interested in, and I found the same phenomenon as in issue #28: without lazyParse, the parsing speed of simdjson is slower than the default JSON.parse. Still, lazyParse seems to be a good method for parsing JSON. In this issue I see that the discussion around simdjson.lazyParse does not use makeJSONObject, and I am curious whether lazyParse can build the JS object.

I have tried some methods from benchmark.js, but I failed; maybe my understanding is not deep enough. Could you show me how I can check it? (For example, with JSON.parse for twitter.json I can check it as shown below.)
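
This is the kind of check I mean, plus my guess (which may be wrong) at the lazyParse equivalent:

const fs = require('fs');
const simdjson = require('simdjson');

const jsonString = fs.readFileSync('/path/of/twitter.json', 'utf-8');

// This works with the built-in parser:
var obj = JSON.parse(jsonString);
console.log(obj.search_metadata);

// Is something like this the intended way to check the lazily parsed result?
var lazy = simdjson.lazyParse(jsonString);
console.log(lazy.valueForKeyPath("search_metadata"));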

Thank you~