mre / hyperjson

🐍 A hyper-fast Python module for reading/writing JSON data using Rust's serde-json.
Apache License 2.0

Zero-copy string deserialization #67

Open mre opened 5 years ago

mre commented 5 years ago

Over on Reddit, @mikeyhew mentioned that there might be an option to parse JSON strings without copying:

Just wanted to point out that serde-json isn't zero-copy because it will copy strings to turn escape sequences like "\n" and "\\" into the character they represent. To parse JSON without copying, you could make a custom string type, JsonStr, which is utf-8 like str but can contain escape sequences.

I forgot about that, but it's actually a great idea! Here's the upstream discussion on serde-json. We should give this custom string type some serious consideration, as string allocation currently accounts for a large part of the encoding/decoding time.
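To make the idea concrete, here is a minimal sketch of what such a type could look like, using only the standard library. Only the name JsonStr comes from the quote above; the methods `from_raw`, `as_raw`, and `unescape` are hypothetical names chosen for illustration. This is neither the serde-json API nor the proof of concept linked later in this thread. It assumes a parser has already located the text between the quotes, and it does not handle UTF-16 surrogate pairs in `\u` escapes.

```rust
use std::borrow::Cow;

/// Hypothetical borrowed view of the text between the quotes of a JSON
/// string literal. Escape sequences are kept as written, so creating a
/// `JsonStr` never copies or allocates.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct JsonStr<'a> {
    raw: &'a str,
}

impl<'a> JsonStr<'a> {
    /// Wraps the raw (still escaped) contents of a JSON string literal.
    /// Assumption: the caller passes the text between the quotes.
    pub fn from_raw(raw: &'a str) -> Self {
        JsonStr { raw }
    }

    /// Returns the raw text, escape sequences included.
    pub fn as_raw(&self) -> &'a str {
        self.raw
    }

    /// Decodes escape sequences on demand. Borrows when the string has no
    /// escapes and allocates only when it does. Returns `None` for escapes
    /// this sketch does not understand (including surrogate pairs).
    pub fn unescape(&self) -> Option<Cow<'a, str>> {
        if !self.raw.contains('\\') {
            return Some(Cow::Borrowed(self.raw));
        }
        let mut out = String::with_capacity(self.raw.len());
        let mut chars = self.raw.chars();
        while let Some(c) = chars.next() {
            if c != '\\' {
                out.push(c);
                continue;
            }
            // A backslash must be followed by one of the JSON escape codes.
            match chars.next()? {
                '"' => out.push('"'),
                '\\' => out.push('\\'),
                '/' => out.push('/'),
                'b' => out.push('\u{0008}'),
                'f' => out.push('\u{000C}'),
                'n' => out.push('\n'),
                'r' => out.push('\r'),
                't' => out.push('\t'),
                'u' => {
                    // Exactly four hex digits must follow `\u`.
                    let hex: String = chars.by_ref().take(4).collect();
                    if hex.len() != 4 || !hex.chars().all(|h| h.is_ascii_hexdigit()) {
                        return None;
                    }
                    let code = u32::from_str_radix(&hex, 16).ok()?;
                    // `char::from_u32` rejects lone surrogates.
                    out.push(char::from_u32(code)?);
                }
                _ => return None,
            }
        }
        Some(Cow::Owned(out))
    }
}
```

With this shape, a caller that only forwards or compares raw strings never pays for an allocation, and a caller that needs the decoded text pays only for strings that actually contain a backslash.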

If anyone wants to give it a shot, go for it.

mikeyhew commented 5 years ago

I don't think it was originally my idea, but thanks for the mention.

Here is a proof of concept with a basic test: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=130c6682231d50c744b96f4c8e2ebd43. I used the diagrams on https://json.org as a reference.

EDIT: here's an updated version with some more tests and a link to the gist https://play.rust-lang.org/?gist=334122cd0104ad3509388074be4351ba