goodboy closed this 3 years ago.
Apologies for the delay here. Tuples aren't more performant than lists to create or use. If you read through the answers in the link above, you'd see that only constant tuples (e.g. (1, 2, 3)) are "faster", since they're built only once by the compiler. Lists and tuples have similar representations in CPython and take equivalent time to construct dynamically. A quick benchmark using msgspec:
In [5]: data = list(range(1000))
In [6]: dec_list = msgspec.Decoder(list)
In [7]: dec_tuple = msgspec.Decoder(tuple)
In [8]: buf = msgspec.encode(data)
In [9]: %timeit dec_list.decode(buf)
12.9 µs ± 20.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [10]: %timeit dec_tuple.decode(buf)
12.9 µs ± 17.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
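To make the constant-tuple point concrete, here is a stdlib-only sketch (not from the thread). A tuple literal made of constants is folded into a single object at compile time and reused on every call, whereas a list literal, or any tuple built at runtime, is a fresh object each time. The function names are illustrative, and the printed timings will vary by machine and Python version.

```python
import timeit

def const_tuple():
    # A tuple literal of constants is folded by the compiler into one
    # constant object that is reused on every call.
    return (1, 2, 3)

def const_list():
    # Lists are mutable, so a new list object is built on every call.
    return [1, 2, 3]

def dynamic_tuple(n):
    # A tuple built at runtime from an iterable is a new object each call.
    return tuple(range(n))

def dynamic_list(n):
    # Same construction work as dynamic_tuple, but producing a list.
    return list(range(n))

# Object identity shows which results are reused (CPython behaviour).
print(const_tuple() is const_tuple())
print(const_list() is const_list())
print(dynamic_tuple(1000) is dynamic_tuple(1000))

# Rough timings; absolute numbers vary by machine and Python version.
for label, stmt in [
    ("const tuple", "const_tuple()"),
    ("const list", "const_list()"),
    ("dynamic tuple", "dynamic_tuple(1000)"),
    ("dynamic list", "dynamic_list(1000)"),
]:
    seconds = timeit.timeit(stmt, globals=globals(), number=10_000)
    print(f"{label:>13}: {seconds * 1e6 / 10_000:.3f} µs per call")
```

On CPython the first identity check prints True and the other two print False. You would expect the dynamic tuple and dynamic list timings to land close together, in line with the msgspec benchmark above.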
I'm not enthused about adding a use_list-like option. Lists are the natural default type for MessagePack's array type. If you want to use a different type for arrays, then you likely have a schema you're following, and I'd direct you to use msgspec's support for typed serialization.
@jcrist I learn something new every time I report something here 🏄🏼
You'd think I would have double-checked the tuple creation speed claim 🙄
Is it possible the .encode() step here is faster though?
Honestly, keeping it as is works for me, since simpler is always better imo. I can close this if no one else is going to have qualms.
> and I'd direct you to use msgspec's support for typed serialization.
Yeah, I think focusing on a struct schema is really the right way to design things anyway 👍🏼
No problem, happy to help.
> Is it possible the .encode() step here is faster though?
Both store their data as an array of PyObject*, so I wouldn't expect a difference. Easy enough to benchmark though:
In [1]: import msgspec
In [2]: enc = msgspec.Encoder()
In [3]: msg_tuple = tuple(range(1000))
In [4]: msg_list = list(range(1000))
In [5]: %timeit enc.encode(msg_tuple)
10.3 µs ± 26.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [6]: %timeit enc.encode(msg_list)
10.3 µs ± 20.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
> I can close this if no one else is going to have qualms.
Closing!
msgpack-python has an option, use_list=False, on its unpacker that lets arrays decode to tuple by default. I noticed in the docs that tuples are only used for array types when they need to be hashable keys.
Is there a reason there isn't a way to either offer tuple-as-default via a manual flag, or just decode to the same type by default, considering tuples are ostensibly more performant in Python than list?
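For reference, a minimal sketch of the msgpack-python behaviour the issue describes, assuming the `msgpack` package (1.x) is installed: `unpackb` returns lists for arrays by default and tuples when `use_list=False` is passed.

```python
import msgpack

buf = msgpack.packb([1, 2, 3])

# Default: arrays decode to lists.
default_result = msgpack.unpackb(buf)
print(default_result)

# use_list=False: arrays decode to tuples instead.
tuple_result = msgpack.unpackb(buf, use_list=False)
print(tuple_result)
```

In msgspec the closest equivalent is not a flag but a type: decoding with a tuple annotation, as the maintainer suggests.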