Open ritchie46 opened 2 months ago
Hi ritchie,
by specification json objects are unordered sets but that also doesn't mean it has to be unordered. the order for the first 32 keys is an implementation detail. We use halfbrown that uses a simple vector for up to 32 elements which makes small objects, which are frequent with json, extremely fast and small (memory wise).
That's just for background :). Generally if we can make it optional and preserve the current behaviour I'm in, giving people more options is nice!
A feature to watch out for is known-key, it is quite powerful when using lookups of the same key repeatedly and it'd be good to preserve it
I could see a few ways to make this work:
1) a feature flag, probably the easiest but there is some chance of conflicts with other flags 2) a feature flag in halfbrown, same as above but it limits the interface change to the library for hashmaps that simd-json already uses 3) make the map type a generic - this will require more work and generally I like it but OTOH I'm not sure how well that would play out or how much work it would be 4) same as abovbe but inside halfbrown
Ah, given that you also maintain halfbrown
, maybe doing that upstream is nicest. Would a halfbrown based on IndexMap
be worth it?
For me a feature flag would be fine. So that would be option 2. :eyes:
perfect, sounds like we got a plan then :)!
Also #377 might be of interest for you, it keeps order as long as no mutations are done already (and it should "just work" with index map as well)
Great! Thanks for being receptive to this. :)
For #377, I need a little bit more context. :sweat_smile: What does it do? What is the semantic difference between a borrowed value and a tape?
the borrowed value is a fully mutable DOM equivalent to serde_json::Value; the tape value is a flat structire (a single vec) that represents the JSON document, since it has no nesting and doesn't require extra allocations, inserts, etc it is significantly faster.
the downside is that the tape is not easily mutable; the lazy value bridges the gab between the two :)
This python snippet wouldn't round trip if simd-json was used:
This is because
Object
uses a hashmap which doesn't preserve the order of the json string keys. This can be resolved by using I would like to consider using indexmap, which has excellent performance, AND guarantees index order.This issue was discovered in: https://github.com/pola-rs/polars/issues/17439
If this is accepted, I am willing to make a PR.