Serialization of byte arrays is slow

I have a Merkle tree data structure that consists of many hashes, each a fixed-size byte array. postcard serializes them (via the byte array → tuple of bytes mapping) by simply writing the bytes to the target. In theory, this could be very fast.

In practice, as posted in https://github.com/est31/serde-big-array/issues/19, we see the following:

 serialize32/own           time:   [4.8208 ns 4.8477 ns 4.8765 ns]
 serialize32/bytes         time:   [12.389 ns 12.603 ns 12.870 ns]
(serialize32/big_array     time:   [142.25 ns 144.02 ns 146.59 ns])
 serialize32/fixed_size    time:   [15.273 ns 15.353 ns 15.449 ns]
 serialize32/variable_size time:   [134.84 ns 135.79 ns 136.98 ns]

own passes the byte array directly into the output, with no length prefix. bytes uses serde's serialize_bytes, which writes a length prefix and then dumps the bytes directly. big_array uses serde-big-array and is irrelevant for this issue. fixed_size uses serde's impl Serialize for [u8; 32], which writes no length prefix. variable_size uses serde's impl Serialize for [u8], which includes a length prefix.
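To make the difference between the variants concrete, here is a rough std-only sketch of the three write patterns. It is not serde or postcard code: the function names are made up for illustration, and a plain Vec<u8> stands in for the serializer's output. The per-element variant models the fact that serde's array impl goes through serialize_tuple and hands over one element at a time. The single length byte assumes a varint-style prefix, in which a length of 32 fits in one byte.

```rust
/// Roughly what "own" does: one bulk copy, no length prefix.
/// (Hypothetical helper, not part of serde or postcard.)
pub fn write_own(out: &mut Vec<u8>, hash: &[u8; 32]) {
    out.extend_from_slice(hash);
}

/// Roughly what "bytes" does: a length prefix, then one bulk copy.
/// Assumption: a varint-style prefix, so a length of 32 takes one byte.
pub fn write_len_prefixed(out: &mut Vec<u8>, hash: &[u8; 32]) {
    out.push(hash.len() as u8);
    out.extend_from_slice(hash);
}

/// Roughly what "fixed_size" does: no prefix, but every byte is
/// handed to the output separately, like one tuple element per byte.
pub fn write_per_element(out: &mut Vec<u8>, hash: &[u8; 32]) {
    for &byte in hash.iter() {
        out.push(byte);
    }
}
```

write_own and write_per_element produce identical output. The only difference is whether the bytes arrive in one call or in 32, which is the overhead the fixed_size number above appears to be measuring.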

As you can see, by using serde I am leaving a lot of performance on the table. Dumping the input array directly into the target costs about 5 ns, while the best I can do with serde is about 12 ns if I accept the wasted extra byte in storage, or about 15 ns if I do not. This leads to a measured >2x real-world slowdown when serializing my tree structure.

serde-rs / serde

Serialization of byte arrays is slow #2680