luizperes / simdjson_nodejs

Node.js bindings for the simdjson project: "Parsing gigabytes of JSON per second"
https://arxiv.org/abs/1902.08318
Apache License 2.0

Reporting timings in GB/s #21

Closed · lemire closed this 4 years ago

lemire commented 5 years ago

It would be more interesting if we could report the timings (in the README.md) in GB/s, or some other normalized metric.
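For reference, GB/s can be derived from what a benchmark loop already measures: total input bytes divided by elapsed seconds. A minimal sketch (the payload and iteration count here are made up, and `JSON.parse` stands in for whichever parser is being timed):

```js
const { performance } = require('perf_hooks');

// Made-up payload; any benchmark input works the same way.
const jsonString = JSON.stringify({ status: 'success', data: Array(1000).fill({ id: 1 }) });
const bytes = Buffer.byteLength(jsonString, 'utf8');

const iterations = 10000;
const start = performance.now();
for (let i = 0; i < iterations; i++) {
  JSON.parse(jsonString); // swap in the parser under test
}
const seconds = (performance.now() - start) / 1000;

// GB/s = total bytes parsed / elapsed seconds / 1e9
console.log(`${((bytes * iterations) / seconds / 1e9).toFixed(3)} GB/s`);
```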

dalisoft commented 4 years ago

Maybe in HPS (handles per second) or PPS (parses per second)?

luizperes commented 4 years ago

@dalisoft The original repo already reports GB/s. It's also how streaming/text processing is usually evaluated in papers, which is why he mentions GB/s.

dalisoft commented 4 years ago

It would be nice to see a benchmark. I only know requests/sec or calls per second; I've seen GB/sec mentioned, but never used it in benchmarks myself.

lemire commented 4 years ago

You have to define the payload for requests per second to be meaningful. Parsing an empty JSON document can be done an infinite number of times per second.
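To put numbers on it, assume (purely for illustration) a parser that sustains a fixed 1 GB/s:

```js
// Illustrative arithmetic only: assume a parser that sustains 1 GB/s.
const BYTES_PER_SECOND = 1e9;
for (const size of [2 /* "{}" */, 1024, 10 * 1024]) {
  console.log(`${size} B document -> ${(BYTES_PER_SECOND / size).toExponential(2)} docs/sec`);
}
// Documents/sec swings by several orders of magnitude with payload size,
// while GB/s stays fixed; that is why the payload has to be defined.
```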

dalisoft commented 4 years ago

@lemire Yes, I understand. Say 3-4 types of JSON payloads:

  1. { "status": "success" } - One-line simple responses
  2. { "status": "success", "data": { "id": "uuiv4", ...other_datas } } - User-data or simple query responses (max 1Kb)
  3. { ...type2, "ref": { ...$ref_data, ...data_too_here } } - Structured data of users with their references (~3-4K)
  4. Large payload - Basically more than 10KB

I can benchmark with these types of data if necessary; a sketch of such a harness follows.
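Something like this, using the `lazyParse` entry point from this repo's README (the sample payloads are rough stand-ins for the four categories above, not real data, and the iteration count is arbitrary):

```js
const simdjson = require('simdjson'); // this repo's bindings
const { performance } = require('perf_hooks');

// Stand-ins for the four payload categories above (contents are invented).
const payloads = {
  'type 1 (one-liner)': JSON.stringify({ status: 'success' }),
  'type 2 (~1 KB)': JSON.stringify({ status: 'success', data: { id: 'uuidv4', items: Array(30).fill('x'.repeat(24)) } }),
  'type 3 (~3-4 KB)': JSON.stringify({ status: 'success', data: { id: 'uuidv4', ref: Array(40).fill({ key: 'x'.repeat(64) }) } }),
  'type 4 (>10 KB)': JSON.stringify({ data: Array(200).fill({ key: 'x'.repeat(48) }) }),
};

const iterations = 100000;
for (const [name, json] of Object.entries(payloads)) {
  const bytes = Buffer.byteLength(json, 'utf8');
  const start = performance.now();
  for (let i = 0; i < iterations; i++) simdjson.lazyParse(json);
  const seconds = (performance.now() - start) / 1000;
  const docsPerSec = iterations / seconds;
  const gbPerSec = (bytes * iterations) / seconds / 1e9;
  console.log(`${name}: ${bytes} B, ${docsPerSec.toFixed(0)} docs/sec, ${gbPerSec.toFixed(3)} GB/s`);
}
```

Reporting both docs/sec and GB/s per payload would show how the two metrics diverge as the documents grow.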

lemire commented 4 years ago

@dalisoft

If the document is small enough, we reach millions of documents parsed per second. This was tested upstream. Our benchmarking tools now report the number of documents parsed per second.

luizperes commented 4 years ago

PR #33 closes this issue