jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
https://jcristharif.com/msgspec/
BSD 3-Clause "New" or "Revised" License

Add `order` option to encoders #627

Closed jcrist closed 5 months ago

jcrist commented 6 months ago

Add order kwarg to encoders

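This adds an `order` kwarg to all encoders for configuring how unordered collections/objects are encoded. Options are:

- `None`: All objects are encoded in the most efficient manner matching their in-memory representations. The default.
- `'deterministic'`: Unordered collections (sets, dicts) are sorted to ensure a consistent output between runs. Useful when comparison/hashing of the encoded binary output is necessary.
- `'sorted'`: Like `'deterministic'`, but all object-like types (structs, dataclasses, ...) are also sorted by field name before encoding. This is slower than `'deterministic'`, but may produce more human-readable output.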

The `'deterministic'` option has been heavily optimized; given the work this feature requires, I wouldn't expect we can speed it up much more. The `'sorted'` option has not been fully optimized, on the assumption that human-readable output is rarely performance sensitive. If needed, there are some fairly simple optimizations we can add to speed it up further.

In general, `msgspec.json.encode(obj, order="deterministic")` should be as fast as or faster than `orjson.dumps(obj, option=orjson.OPT_SORT_KEYS)`. For common small object sizes, we average a ~20% speedup over orjson for key sorting.

```python
In [1]: import msgspec, orjson, random

In [2]: enc = msgspec.json.Encoder(order="deterministic")

In [3]: keys = [f'field_{i}' for i in range(6)]

In [4]: random.shuffle(keys)

In [5]: msg = dict(zip(keys, range(len(keys))))

In [6]: %timeit enc.encode(msg)
305 ns ± 2.99 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [7]: %timeit orjson.dumps(msg, option=orjson.OPT_SORT_KEYS)
377 ns ± 2.04 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

Fixes #609.