plokhotnyuk / jsoniter-scala

Scala macros for compile-time generation of safe and ultra-fast JSON codecs + circe booster
MIT License

Is there a faster way to go from json bytes to formatted output? #1199

Open mjsmith707 opened 1 month ago

mjsmith707 commented 1 month ago

I have an interesting use case where I'm deserializing a JSONB column from Postgres into a class object. In most cases, however, I'm just sending these out as CRUD API responses over HTTP, so it would be faster to skip the deserialization step and go straight to writing the bytes. Since the bytes are already known to be valid JSON (via Postgres), reparsing them isn't really needed; the important part is just having the output formatted correctly. I've also experimented with deserializing to Circe's JSON and reserializing that, and it was slower than just using the class's codec.

I wrote a lazy codec (it contains either a value of type T or an Array[Byte]), and in it I wrote a utility that scans across the bytes and calls JsonWriter's methods such as out.writeVal or out.writeObjectStart(). That all works, but it's pretty slow compared to the serialization generated by jsoniter (14s for my writer vs 7s for the generated codec, on my machine, over 10m iterations). It is, however, still faster than deserializing to the class and then reserializing (14s vs 23s). The CharArrayJsoniterWriter was kind of hacked together, so there are probably bugs and room for improvement.

Any thoughts on this? It would be great if JsonWriter had an out.writeFormattedBytes method :sweat_smile:

Here's my char writer and LazyJson class https://gist.github.com/mjsmith707/9bdf76091da4bd324308b70e9638e5a8
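For readers following along, the byte-scanning idea described above can be sketched without any library at all. This is only a minimal illustration, not the code from the gist and not part of the jsoniter-scala API: the class name `JsonReindenter` and the method `reindent` are hypothetical, the input is assumed to be valid UTF-8 JSON (nothing is validated), and the output layout (newline after `{`, `[` and `,`, a space after `:`) is one plausible pretty-print style rather than a guaranteed match for what JsonWriter produces.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper: re-indents JSON bytes in a single pass, without building a tree.
// Assumes the input is already valid JSON; malformed input is not detected.
final class JsonReindenter {
    static byte[] reindent(byte[] json, int indentStep) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(json.length * 2);
        int depth = 0;
        boolean inString = false; // inside a "..." literal: copy bytes verbatim
        boolean escaped = false;  // previous byte was a backslash inside a string
        for (int i = 0; i < json.length; i++) {
            byte b = json[i];
            if (inString) {
                out.write(b);
                if (escaped) escaped = false;
                else if (b == '\\') escaped = true;
                else if (b == '"') inString = false;
                continue;
            }
            switch (b) {
                case ' ': case '\t': case '\n': case '\r':
                    break; // drop insignificant whitespace from the input
                case '"':
                    inString = true;
                    out.write(b);
                    break;
                case '{': case '[': {
                    out.write(b);
                    int next = nextNonWhitespace(json, i + 1);
                    if (next < json.length && (json[next] == '}' || json[next] == ']')) {
                        out.write(json[next]); // keep empty containers on one line
                        i = next;
                    } else {
                        depth++;
                        newline(out, depth * indentStep);
                    }
                    break;
                }
                case '}': case ']':
                    depth--;
                    newline(out, depth * indentStep);
                    out.write(b);
                    break;
                case ',':
                    out.write(b);
                    newline(out, depth * indentStep);
                    break;
                case ':':
                    out.write(b);
                    out.write(' ');
                    break;
                default:
                    out.write(b); // digits, signs, and the literals true/false/null
            }
        }
        return out.toByteArray();
    }

    private static int nextNonWhitespace(byte[] json, int from) {
        int i = from;
        while (i < json.length
                && (json[i] == ' ' || json[i] == '\t' || json[i] == '\n' || json[i] == '\r')) i++;
        return i;
    }

    private static void newline(ByteArrayOutputStream out, int spaces) {
        out.write('\n');
        for (int i = 0; i < spaces; i++) out.write(' ');
    }
}
```

A call such as `JsonReindenter.reindent(bytes, 2)` returns a new array and never allocates per-token objects, which is the property the lazy codec is after. Whether a scanner like this actually beats the generated codec would still need to be measured.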

plokhotnyuk commented 1 month ago

@mjsmith707 Do you mean pretty-printing on the fly, without parsing and validating the values?

Could you please add tests with some possible inputs and expected outputs?

mjsmith707 commented 1 month ago

@plokhotnyuk Thanks for the quick reply. I don't have any non-work-related sample data, but any old class data will suffice, I suppose. The LazyJson class has a pair of constructors: one for the actual class instance and another for just the byte array holding its JSON string representation (which is assumed to be valid). When the codec's encodeValue is invoked, it calls the CharArrayJsoniterWriter, which loops through the bytes and calls the various methods on JsonWriter, like writeObjectStart(), to format the output properly.

Here's a (hopefully) more fully fleshed out example. Note that the CharArrayJsoniterWriter is in Java, the rest is Scala: https://gist.github.com/mjsmith707/81908f7523b380a00697f0dd81b75ca8

Basically the example output is the same as if you were to just use the regular codec, a formatted JSON string. The difference here was I didn't need to deserialize it to MyTestClass first. Instead I just carried in the byte array (which in my case would come from Postgres as a JSONB column).
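The two-constructor holder described above can be pictured roughly as follows. This is a hedged, library-free sketch and not the LazyJson class from the gist: the factory names `ofValue` and `ofRawBytes`, the method `toJsonBytes`, and the `encoder` function parameter are all hypothetical stand-ins for the real codec wiring.

```java
import java.util.Objects;
import java.util.function.Function;

// Hypothetical holder: either a decoded value or the raw JSON bytes it came from.
final class LazyJson<T> {
    private final T value;    // non-null when built from a decoded object
    private final byte[] raw; // non-null when built from stored JSON bytes

    private LazyJson(T value, byte[] raw) {
        this.value = value;
        this.raw = raw;
    }

    static <T> LazyJson<T> ofValue(T value) {
        return new LazyJson<>(Objects.requireNonNull(value), null);
    }

    // The bytes are assumed to be valid JSON already (for example, read from a JSONB column).
    static <T> LazyJson<T> ofRawBytes(byte[] raw) {
        return new LazyJson<>(null, Objects.requireNonNull(raw));
    }

    boolean isRaw() {
        return raw != null;
    }

    // Runs the encoder only when there is no stored byte form to reuse.
    byte[] toJsonBytes(Function<T, byte[]> encoder) {
        return raw != null ? raw : encoder.apply(value);
    }
}
```

The point of the design is that the raw branch never touches the encoder, so a row read from the database and sent straight back out skips both decoding and encoding; any reformatting would be a separate pass over the bytes.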

plokhotnyuk commented 1 month ago

The provided code snippet can behave differently depending on how it is called and what input is provided.

You can try to run your benchmarks with async-profiler and build flame graphs for CPU cycles and allocations to see what is happening under the hood.

A better option would be to convert your benchmarks to run under the sbt-jmh plugin. The README of this project has a lot of sample commands for running JMH benchmarks with different profilers.

My bet is that in your case jsoniter-scala spends much less time on allocations when serializing from case classes.

mjsmith707 commented 1 month ago

Right, this is more of a proof of concept/experiment than anything, and it's not very optimized. From some simple testing (i.e. not using JMH), I found the jsoniter-generated codec to be roughly 2x faster at serialization than my experiment. I guess for the purposes of this issue, it is more of a feature request: a way to quickly write a JSON byte array (or even a raw JSON string) as a properly formatted JSON string, as part of the JsonWriter API.