maciejhirsz / json-rust

JSON implementation in Rust
Apache License 2.0
566 stars 63 forks source link

Ropes as an faster alternative to strings? #55

Closed StefanoD closed 8 years ago

StefanoD commented 8 years ago

Actually I have no time to investigate more in ropes, because of exams. I just catched this and may be you are interested in.

Paper:

https://www.cs.rit.edu/usr/local/pub/jeh/courses/QUARTERS/FP/Labs/CedarRope/rope-paper.pdf

rust implementation:

https://github.com/google/xi-editor/tree/master/rust/rope/src

C++ implementation:

https://github.com/ivmai/bdwgc/

Interview: http://www.newrustacean.com/show_notes/interview/_2/part_2/

maciejhirsz commented 8 years ago

That does look interesting, but I don't think I can get much of use for it in JSON. The string parsing and serialization is, by now, very very fast. I do suffer from allocation on parsing, but that cost seems to be fairly low (I tried using fixed-size stack allocated buffers as an alternative to object keys, and the gains weren't impressive enough to justify extra work and loss of ergonomics that a simple String provides).

In vast majority of cases I know exactly how long it will be. In fact any string that doesn't contain escaped characters, such as "abc" is made into a slice directly from the input JSON string and from there made into an owned string.

The only place where I do have a growing buffer is in serialization, and while it is a pain there, I do allocate a sufficiently large buffer at the beginning which (due to how Rust internals work) doubles in size every time it's cap is reached, and I'm not sure ropes could help me there as I'd still have to allocate the next cord, and then the next, and so on. The gain there is that you can avoid copying the entire buffer up to the reallocation, but then there is the overhead of handing the rope and indexes and what happens when I want to serialize something that doesn't fit into the current cord and so on.

I'll close the issue for now. As intriguing of an idea as it is, I think there still are other vectors from which I can improve performance, and I should spend more time trying to increase the spec conformance numbers and making the API nicer and this point :).