plokhotnyuk / jsoniter-scala

Scala macros for compile-time generation of safe and ultra-fast JSON codecs + circe booster
MIT License
741 stars 99 forks

leverage SIMD #650

Open LifeIsStrange opened 3 years ago

LifeIsStrange commented 3 years ago

SIMD allows revolutionary intra-core parallelism. In fact, the fastest JSON library on earth is called simdjson for precisely this reason. OpenJDK 16 releases next month and brings SIMD support to the JVM through the incubating Vector API: https://openjdk.java.net/jeps/338 You could use it that way (or through the intriguing https://github.com/beehive-lab/TornadoVM).

Exciting, isn't it? :) @plokhotnyuk

plokhotnyuk commented 3 years ago

I think SIMD would pay off significantly only for payloads with long strings.

A better option would be adopting SWAR (SIMD within a register) techniques using 64-bit or 128-bit values, like here.
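To give a flavour of what SWAR means on the JVM, here is a minimal Java sketch (not jsoniter-scala's actual code; the class and method names are hypothetical). It packs eight input bytes into one 64-bit `long` and tests all of them at once for the bytes a JSON string parser cannot copy verbatim: `"`, `\`, control characters below 0x20, and non-ASCII bytes. The masks are the classic "has zero byte" / "has byte less than n" tricks from the Bit Twiddling Hacks collection. Each mask's lowest set bit is exact, so `Long.numberOfTrailingZeros` locates the first offending byte.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public final class SwarScan {
    private static final long ONES  = 0x0101010101010101L; // 0x01 in every byte lane
    private static final long HIGHS = 0x8080808080808080L; // high bit of every byte lane

    // High bit set in lanes that are zero; the lowest set bit is exact
    // (lanes above a true zero may be false positives because of borrows).
    private static long zeroBytes(long w) {
        return (w - ONES) & ~w & HIGHS;
    }

    // High bit set in lanes holding '"', '\\', a control char (< 0x20) or a non-ASCII byte.
    static long specialMask(long w) {
        long quote     = zeroBytes(w ^ (ONES * '"'));
        long backslash = zeroBytes(w ^ (ONES * '\\'));
        long control   = (w - ONES * 0x20) & ~w & HIGHS; // "has byte less than 0x20"
        long nonAscii  = w & HIGHS;
        return quote | backslash | control | nonAscii;
    }

    // Index of the first byte in [from, to) that needs slow-path handling, or `to` if none.
    static int firstSpecialByte(byte[] buf, int from, int to) {
        ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
        int i = from;
        while (i + 8 <= to) {                 // SWAR path: 8 bytes per iteration
            long mask = specialMask(bb.getLong(i));
            if (mask != 0) return i + (Long.numberOfTrailingZeros(mask) >>> 3);
            i += 8;
        }
        while (i < to) {                      // scalar tail for the last < 8 bytes
            int b = buf[i] & 0xFF;
            if (b == '"' || b == '\\' || b < 0x20 || b >= 0x80) return i;
            i++;
        }
        return to;
    }

    public static void main(String[] args) {
        byte[] s = "hello, plain ascii text\" tail".getBytes(StandardCharsets.US_ASCII);
        System.out.println(firstSpecialByte(s, 0, s.length)); // prints 23
    }
}
```

On a hit, a real parser would switch to byte-by-byte handling of escapes or UTF-8; for plain ASCII runs the loop advances eight bytes per iteration with no per-byte branches.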

LifeIsStrange commented 3 years ago

Wow interesting!!

ScalaWilliam commented 3 years ago

@plokhotnyuk that looks super super cool.

I am a little naive in C; still, what is the general approach the author is taking, from a Scala perspective (if it is possible to explain at all)?

I did not know there was something that could be even faster than simdjson!

plokhotnyuk commented 3 years ago

@ScalaWilliam Before comparing speed, you need to understand that the result of simdjson parsing is just an iterator over indexed JSON input, not a data structure with arbitrary access to fields (values or references accessible by offsets), as usually happens in the Scala world.
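To make "indexed JSON input" concrete: simdjson's first stage records the offsets of structural characters (it computes them with SIMD bitmasks over 64-byte blocks), and its On-Demand API then iterates over that index rather than materializing a tree. The Java sketch below is a hypothetical, byte-at-a-time simplification of that first stage; it indexes only the punctuation `{ } [ ] : ,` found outside string literals and is neither simdjson's nor jsoniter-scala's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public final class StructuralIndex {
    // Offsets of '{', '}', '[', ']', ':' and ',' that lie outside string literals.
    static int[] structuralPositions(byte[] json) {
        int[] out = new int[json.length];
        int n = 0;
        boolean inString = false; // currently inside "..."
        boolean escaped = false;  // previous byte was a backslash inside a string
        for (int i = 0; i < json.length; i++) {
            byte b = json[i];
            if (inString) {
                if (escaped) escaped = false;        // this byte is escaped, skip it
                else if (b == '\\') escaped = true;
                else if (b == '"') inString = false; // closing quote
            } else if (b == '"') {
                inString = true;                     // opening quote
            } else if (b == '{' || b == '}' || b == '[' || b == ']' || b == ':' || b == ',') {
                out[n++] = i;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        byte[] json = "{\"a\":[1,2]}".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(structuralPositions(json))); // prints [0, 4, 5, 7, 9, 10]
    }
}
```

A consumer can then jump between these offsets to find keys and values on demand. That is cheap to produce, but it is not the same as a Scala case class with fields ready for arbitrary access.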

Sometimes creating the data structure in Scala takes more CPU cycles than parsing the contained values from JSON. The most expensive parts are instances of immutable collections, boxed primitives, and Option[_] wrappers.

ScalaWilliam commented 3 years ago

@plokhotnyuk thank you so much for your explanation. Sorry, I realised I did not say that I intended to refer to yyjson. That library seems to extract everything into immutable structures.

plokhotnyuk commented 3 years ago

@ScalaWilliam While yyjson allocates immutable nodes of the JSON object model, most CPU cycles will be spent accessing it when searching for tagged values, which can be quite expensive for real-world JSON messages.

Parsers from the Scala world that provide an object model for JSON usually use maps and vectors for JSON objects and JSON arrays, respectively, which is much more expensive than parsing directly into data structures and arrays.

plokhotnyuk commented 2 years ago

@LifeIsStrange @ScalaWilliam

The latest versions of jsoniter-scala-core for the JVM use SWAR techniques for parsing and serializing ASCII strings, booleans, numbers, java.time._ values, and java.util.UUID values.

It gives a speedup of up to 2x in some cases.
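As an illustration of SWAR applied to numbers, here is a widely used trick for validating and converting eight ASCII digits with a handful of 64-bit operations instead of an eight-iteration loop. This is a Java sketch of that general technique, not code taken from jsoniter-scala; the class and method names are hypothetical, and it assumes the eight bytes are loaded little-endian (first digit in the lowest byte).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public final class SwarDigits {
    // True if every byte lane is in '0'..'9': high nibble must be 3 and the byte + 6 must not reach 0x40.
    static boolean isEightDigits(long w) {
        return ((w & 0xF0F0F0F0F0F0F0F0L)
              | (((w + 0x0606060606060606L) & 0xF0F0F0F0F0F0F0F0L) >>> 4)) == 0x3333333333333333L;
    }

    // Value of 8 ASCII digits; only valid when isEightDigits(w) is true.
    static int parseEightDigits(long w) {
        long v = w - 0x3030303030303030L;  // ASCII -> 0..9 in each lane
        v = v * 10 + (v >>> 8);            // even lanes now hold 2-digit pairs (12, 34, 56, 78)
        v = (((v & 0x000000FF000000FFL) * (100 + (1000000L << 32)))          // pairs 0 and 2
           + (((v >>> 16) & 0x000000FF000000FFL) * (1 + (10000L << 32)))) >>> 32; // pairs 1 and 3
        return (int) v;
    }

    // Loads 8 bytes starting at pos, first byte in the lowest lane.
    static long load(byte[] buf, int pos) {
        return ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN).getLong(pos);
    }

    public static void main(String[] args) {
        long w = load("12345678".getBytes(StandardCharsets.US_ASCII), 0);
        if (isEightDigits(w)) System.out.println(parseEightDigits(w)); // prints 12345678
    }
}
```

Longer numbers can be handled in 8-digit chunks (for example `high * 100_000_000L + low`), falling back to a scalar loop for the remaining digits.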

The latest results are published here, as usual: https://plokhotnyuk.github.io/jsoniter-scala/

plokhotnyuk commented 1 year ago

@LifeIsStrange @ScalaWilliam Currently, jsoniter-scala seems to be quite competitive with simdjson-java when full validation against the JSON spec is not required for skipped keys and values: https://github.com/simdjson/simdjson-java/pull/2