hashgraph / pbj

A performance optimized Google Protocol Buffers code generator, parser, and Gradle module.
Apache License 2.0

JSON Parsing is Slow #50

Open jasperpotts opened 1 year ago

jasperpotts commented 1 year ago

Problem

Parsing JSON in PBJ is about 3.6x slower than in the standard Protobuf library, and much slower than the fastest JSON parsers available. A JMH benchmark comparing PBJ against the standard Protobuf library is attached to this issue; it can be checked into the com.hedera.pbj.intergration.jmh package.

Benchmark                                  Mode  Cnt       Score      Error  Units
JsonBench.AccountDetailsBench.parsePbj     avgt    5   48494.176 ±  118.987  ns/op
JsonBench.AccountDetailsBench.parseProtoC  avgt    5   13453.996 ±   42.288  ns/op

Solution

To get a JSON codec out the door quickly while minimizing 3rd party dependencies, we use an ANTLR 4 generated parser for JSON that tokenizes the input and then builds a parse tree of context objects. We then extract the data we need from that context object tree and build model record objects. This works fine but is slow and generates huge amounts of garbage in the form of Token and Context objects. ANTLR also starts by buffering the input and converting it into a Unicode integer array, which is not the fastest approach and requires a complete read of the stream plus a large memory allocation.

We should write custom parsing code or use a 3rd party streaming library. Ideally we would not add any 3rd party dependencies unless absolutely necessary. JSON is a very simple format and we should be able to hand-write a simple, fast parser, like the standard Google protobuf library does.
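To make the hand-written option concrete, here is a minimal sketch of a single-pass recursive-descent reader that works directly on the UTF-8 bytes and creates no Token or Context objects. Everything in it is an assumption for illustration: the `MiniJson` class and its method names are hypothetical, it is not PBJ code, and it builds generic Map/List values rather than PBJ model records. It also ignores the Protobuf JSON rules, does not handle unicode escapes, does not check integer overflow, and validates numbers only loosely.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

// Hypothetical single-pass JSON reader for illustration only; this is not PBJ code.
final class MiniJson {
    private final byte[] in;
    private int pos;
    private MiniJson(byte[] in) { this.in = in; }

    static Object parse(String json) {
        MiniJson p = new MiniJson(json.getBytes(StandardCharsets.UTF_8));
        Object value = p.readValue();
        p.skipWhitespace();
        if (p.pos != p.in.length) throw p.error("trailing data");
        return value;
    }

    private Object readValue() {
        skipWhitespace();
        switch (peek()) {
            case '{': pos++; return readObject();
            case '[': pos++; return readArray();
            case '"': pos++; return readString();
            case 't': expect("true"); return Boolean.TRUE;
            case 'f': expect("false"); return Boolean.FALSE;
            case 'n': expect("null"); return null;
            default: return readNumber();
        }
    }

    private Map<String, Object> readObject() { // called just after '{'
        Map<String, Object> map = new LinkedHashMap<>();
        if (tryConsume('}')) return map;
        do {
            if (!tryConsume('"')) throw error("expected field name");
            String key = readString();
            if (!tryConsume(':')) throw error("expected ':'");
            map.put(key, readValue());
        } while (tryConsume(','));
        if (!tryConsume('}')) throw error("expected ',' or '}'");
        return map;
    }

    private List<Object> readArray() { // called just after '['
        List<Object> list = new ArrayList<>();
        if (tryConsume(']')) return list;
        do { list.add(readValue()); } while (tryConsume(','));
        if (!tryConsume(']')) throw error("expected ',' or ']'");
        return list;
    }

    private String readString() { // called just after the opening quote
        int start = pos;
        StringBuilder sb = null; // allocated only if the string contains an escape
        for (; peek() != '"'; pos++) {
            if (pos >= in.length) throw error("unterminated string");
            if (in[pos] != '\\') continue;
            if (sb == null) sb = new StringBuilder();
            sb.append(new String(in, start, pos - start, StandardCharsets.UTF_8));
            int i = "\"\\/bfnrt".indexOf(++pos < in.length ? in[pos] : -1);
            if (i < 0) throw error("unsupported escape"); // unicode escapes omitted
            sb.append("\"\\/\b\f\n\r\t".charAt(i));
            start = pos + 1;
        }
        String tail = new String(in, start, pos - start, StandardCharsets.UTF_8);
        pos++; // skip the closing quote
        return sb == null ? tail : sb.append(tail).toString();
    }

    private Object readNumber() {
        int start = pos;
        if (peek() == '-') pos++;
        int firstDigit = pos;
        long value = 0; // built straight from the bytes, no intermediate String
        while (peek() >= '0' && peek() <= '9') value = value * 10 + (in[pos++] - '0');
        if (pos == firstDigit) throw error("expected a value");
        if (".eE".indexOf(peek()) < 0) return in[start] == '-' ? -value : value; // overflow unchecked
        while ("+-.eE0123456789".indexOf(peek()) >= 0) pos++;
        return Double.parseDouble(new String(in, start, pos - start, StandardCharsets.US_ASCII));
    }

    private boolean tryConsume(char c) { // skips whitespace, then consumes c if it is next
        skipWhitespace();
        if (peek() != c) return false;
        pos++;
        return true;
    }

    private void expect(String literal) {
        for (int i = 0; i < literal.length(); i++, pos++) {
            if (peek() != literal.charAt(i)) throw error("expected " + literal);
        }
    }

    private void skipWhitespace() { while (" \t\r\n".indexOf(peek()) >= 0) pos++; }
    private int peek() { return pos < in.length ? in[pos] : -1; }
    private IllegalArgumentException error(String m) { return new IllegalArgumentException(m + " at byte " + pos); }
}
```

Calling `MiniJson.parse("{\"accountId\": 1001}")` yields a map holding the Long 1001 under `accountId`. A real PBJ parser would presumably write values straight into the generated model builders instead of a Map, and would read from the input stream rather than a fully materialized byte array, which should cut allocation further.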

There is a benchmark that compares all the fastest Java JSON parsers: https://github.com/fabienrenaud/java-json-benchmark. Ideally we would match the fastest of them for performance. Most of them are Apache-licensed, so we can use them as reference examples for how to write a fast JSON parser. Because we need the JSON format to follow the Protobuf spec rules exactly, I believe we cannot use any of them out of the box.

jasperpotts commented 1 year ago

FASTJSON 2.0 seems to be the one to match, beat, or use: https://github.com/alibaba/fastjson. The 3 key things we need to achieve are:

  1. As fast as possible, ideally as fast as or faster than protobuf
  2. The JSON should be byte-for-byte identical to that generated by protoc-generated code
  3. There should be minimal garbage generated during parsing or output