spotify / spotify-json

Fast and nice to use C++ JSON library.
Apache License 2.0

Using simdjson as an SAX tokenizer #64

Open michaeleisel opened 4 years ago

michaeleisel commented 4 years ago

simdjson seems to be the gold standard for JSON-parsing performance. It is regularly updated with state-of-the-art parsing algorithms, makes excellent use of intrinsics, and supports both ARM and x86_64. It's also in use by many different organizations and is extensively tested, including via fuzzing. I don't know what the performance needs are for JSON parsing here at Spotify, but if there's any desire for more speed, simdjson would be a great choice. It could be used as an SAX tokenizer, or simply forked so that spotify-json's high-level API is built on top of it.

punchfox commented 4 years ago

It's an interesting direction to explore. In terms of performance, we do alright on x86 platforms, but we have no optimizations for ARM platforms, which turn out to be the majority of our uses. It would be interesting to see if we could use some or all of simdjson in our parser. I don't know if the SAX parser slots easily into our code, but just replacing the string and number parsers with the ones from simdjson might be an easy performance win, and would allow us to remove some of our own code.
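
To make the "SAX tokenizer with a high-level API on top" idea concrete, here is a rough sketch of where the seam could sit. Everything in it is hypothetical: `sax_handler`, `tokenize`, `int_array_decoder`, and `decode_int_array` are illustrative names, not part of spotify-json or simdjson, and the toy `tokenize` only stands in for whatever a real tokenizer would provide. It handles just arrays, integers, and strings without escapes, and it ignores integer overflow.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string_view>
#include <vector>

// Hypothetical SAX-style event interface: the tokenizer pushes events,
// and a higher-level decoder consumes them.
struct sax_handler {
  virtual ~sax_handler() = default;
  virtual void on_begin_array() = 0;
  virtual void on_end_array() = 0;
  virtual void on_integer(std::int64_t value) = 0;
  virtual void on_string(std::string_view value) = 0;
};

// Toy stand-in for a real tokenizer (assumption: a faster tokenizer could be
// swapped in behind the same handler interface). Supports only '[', ']',
// commas, whitespace, integers, and strings without escape sequences.
inline void tokenize(std::string_view json, sax_handler &handler) {
  const auto is_digit = [](char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
  };
  std::size_t i = 0;
  while (i < json.size()) {
    const char c = json[i];
    if (c == ' ' || c == '\n' || c == '\t' || c == '\r' || c == ',') {
      ++i;  // separators carry no event in this simplified sketch
    } else if (c == '[') {
      handler.on_begin_array();
      ++i;
    } else if (c == ']') {
      handler.on_end_array();
      ++i;
    } else if (c == '"') {
      const std::size_t end = json.find('"', i + 1);
      if (end == std::string_view::npos) {
        throw std::runtime_error("unterminated string");
      }
      handler.on_string(json.substr(i + 1, end - i - 1));
      i = end + 1;
    } else if (c == '-' || is_digit(c)) {
      const bool negative = (c == '-');
      if (negative) ++i;
      if (i >= json.size() || !is_digit(json[i])) {
        throw std::runtime_error("expected digit");
      }
      std::int64_t value = 0;  // overflow is not checked in this sketch
      while (i < json.size() && is_digit(json[i])) {
        value = value * 10 + (json[i] - '0');
        ++i;
      }
      handler.on_integer(negative ? -value : value);
    } else {
      throw std::runtime_error("unexpected character");
    }
  }
}

// Hypothetical high-level decoder built purely on the event stream.
struct int_array_decoder final : sax_handler {
  std::vector<std::int64_t> values;
  int depth = 0;
  void on_begin_array() override { ++depth; }
  void on_end_array() override { --depth; }
  void on_integer(std::int64_t value) override { values.push_back(value); }
  void on_string(std::string_view) override {
    throw std::runtime_error("expected an integer, got a string");
  }
};

inline std::vector<std::int64_t> decode_int_array(std::string_view json) {
  int_array_decoder decoder;
  tokenize(json, decoder);
  if (decoder.depth != 0) throw std::runtime_error("unbalanced brackets");
  return decoder.values;
}
```

With a seam like this, the decoder never touches raw bytes, so the tokenizer, or only its string and number routines, could in principle be replaced without changing the high-level API. Whether the existing spotify-json codecs can be restructured this way is an open question.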