This adds an order kwarg to all encoders for configuring how unordered
collections/objects are encoded. Options are:
- None: the default. All objects are encoded in the most efficient
  manner corresponding to their in-memory representation.
- 'deterministic': Unordered collections (sets, dicts) are sorted
  before encoding. This ensures a consistent output between runs, which
  may be useful when comparing/hashing the encoded binary
  representation.
- 'sorted': same as 'deterministic', but all object-like types
  will also have their fields encoded in alphabetical order by name.
  This is more expensive than 'deterministic', but may be useful for
  making the output more human readable.
The 'deterministic' path has been heavily optimized. Given the work
this feature inherently requires, I wouldn't expect us to be able to
speed it up much more. The 'sorted' option has not been fully optimized
(the assumption being that human-readable output is rarely
performance-sensitive). If needed, there are some fairly simple
optimizations we can add to speed it up further.
In general, msgspec.json.encode(obj, order="deterministic") should be
as fast as or faster than orjson.dumps(obj, option=orjson.OPT_SORT_KEYS).
For common small object sizes we average a ~20% speedup over orjson
for key sorting. For example:
In [1]: import msgspec, orjson, random
In [2]: enc = msgspec.json.Encoder(order="deterministic")
In [3]: keys = [f'field_{i}' for i in range(6)]
In [4]: random.shuffle(keys)
In [5]: msg = dict(zip(keys, range(len(keys))))
In [6]: %timeit enc.encode(msg)
305 ns ± 2.99 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [7]: %timeit orjson.dumps(msg, option=orjson.OPT_SORT_KEYS)
377 ns ± 2.04 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Fixes #609.