pistacheio / pistache

A high-performance REST toolkit written in C++
https://pistacheio.github.io/pistache/
Apache License 2.0

Simdjson as a Replacement for the Semi-Abandoned RapidJSON Library #1116

Open dkierner-dh opened 1 year ago

dkierner-dh commented 1 year ago

Unreliable Release Schedule of RapidJSON and the Debianization Problem

RapidJSON has not published a new release in six years, so continuing to depend on it is becoming a productivity risk and a possible hindrance to this project's development.

This becomes especially obvious when considering that Pistache is aiming for debianization and the Debian version of RapidJSON is six years behind its master branch, therefore basically abandoned in the Debian repository.

RapidJSON's Replacement With Simdjson

I therefore suggest that Pistache switch to simdjson, a library that can even beat RapidJSON and yyjson in terms of performance by using SIMD instructions where available. This library is also available as componentized Debian packages, so the switch shouldn't be too hard.

The Looming Feature Freeze of Debian 12 "Bookworm"

As the feature freeze for Debian Bookworm is already underway (Soft Freeze: 2023-02-12, Hard Freeze: 2023-03-12), I consider this a high priority request, if you want to get an updated release of Pistache into Debian 12 "Bookworm", which is also the reason why I'm tagging you here (@kiplingw, @Tachi107). The next opportunity for an updated Debian release will not be until Debian 13 "Trixie" in around two years, or as a backported release.


Tachi107 commented 1 year ago

Hi @dkierner-dh, thank you very much for your detailed analysis. Unfortunately I've been quite busy in the last week, and I haven't been able to properly reply before.

Pistache barely made it into Debian 12, which I see as a great success! But, as you said, the freeze is now begun, and we cannot swap the JSON dependency any more there. Still, Debian 12 does provide RapidJSON, so is this that big of an issue?

Not only that, but Pistache's reliance on RapidJSON is fairly minimal. It's only used in a Swagger thing I've never personally touched.

That being said, I too dislike having to depend on RapidJSON, and I agree that simdjson would be a nice alternative, having used it before.

dkierner-dh commented 1 year ago

Hi @Tachi107.

> Pistache barely made it into Debian 12, which I see as a great success! But, as you said, the freeze is now begun, and we cannot swap the JSON dependency any more there. Still, Debian 12 does provide RapidJSON, so is this that big of an issue?

The availability in the Debian repositories does alleviate some of the issues that RapidJSON is facing, like broken builds on the master branch.

I was thinking more of potentially obscure or minor bugs that could linger in that old version and are fixed in newer versions. When evaluating a framework, you also evaluate its dependencies, and a semi-abandoned dependency doesn't leave a good impression. The biggest issue is the fairly long interval (~two years) between Debian releases, paired with the fact that the latest Debian version of RapidJSON, v1.1.0, is currently seven years old and will be nine years old when Debian 13 "Trixie" is released.

Migrating away from RapidJSON would be best in the long run, as there are also open issues requesting support for a newer JSON Schema version, and the lack of it could become a hindrance with new Swagger/OpenAPI versions:

I didn't find similar issues in simdjson.

Will there be a backported version in the future, should Pistache migrate to simdjson?

kiplingw commented 1 year ago

Hey @dkierner-dh. Thank you for the suggestion. I wasn't aware of the simdjson library and find it interesting that they found a way to leverage SIMD to optimize what is effectively a string parsing library. I think that's a great idea.

I also think it would be a good idea for the reasons already discussed to migrate Pistache's RapidJSON dependency to simdjson. Thankfully, as @Tachi107 pointed out, there isn't much code that's dependent on RapidJSON IIRC.

Would you be interested in submitting a PR for this?

dkierner-dh commented 1 year ago

Hey @kiplingw, @Tachi107, I've had a look at it. The library does exactly what it advertises, "Parsing gigabytes of JSON per second", and nothing more: it parses JSON but does not build it. There are multiple open issues for this:

I'm sorry, when I suggested this enhancement I wasn't aware that simdjson only provides parsing for now. I would have assumed simdjson would provide both, given that parsing and serializing JSON are often done together.

Should we rename this issue or put it on hold until simdjson gets such functionality?

kiplingw commented 1 year ago

I think putting it on hold for the time being is reasonable.