hrxi opened 5 months ago

| Name | Link |
|---|---|
| Latest commit | b0a07fdbbc8adcf89a955a96446d3a73bcfa39e7 |
| Latest deploy log | https://app.netlify.com/sites/cute-starship-2d9c9b/deploys/65bbc34b4cfe0d0008d490b9 |
This was motivated by benchmarking the serialization of a Merkle tree structure that serializes lots of hashes represented as fixed size byte arrays. The serialization of that tree has seen a >2x performance increase using this code.
Hi @hrxi, thanks for the PR! I'm sympathetic to this issue, but I'm not sure if I'm convinced this is a good solution for postcard.
This adds conditionals and panicking branches to every ser/de struct field path.
I think it should be possible to implement `Serialize` and `Deserialize` for `FixedSizeByteArray` without modifying the serialization and deserialization code of postcard itself, essentially serializing it "as if it was a tuple", like what other fixed size arrays already do today.
> I think it should be possible to implement `Serialize` and `Deserialize` for `FixedSizeByteArray` without modifying the serialization and deserialization code of postcard itself, essentially serializing it "as if it was a tuple"
Yes, that would be possible. It's what e.g. `serde_big_array` does. However, it leads to subpar performance, as seen in the PR description. The motivation of this PR is precisely to fix that subpar performance. If there's another way to do so, I'd be more than happy to implement it. I wasn't able to come up with another strategy, though.
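To illustrate the gap being discussed, here is a plain-Rust sketch (hypothetical helper names, no serde involved): a tuple-style `Serialize` pushes each byte through the generic per-element path, while a dedicated byte-array type could copy the whole array in one go.

```rust
/// Per-element write, roughly what a tuple-style Serialize does:
/// one call into the output sink per byte.
fn write_per_element(out: &mut Vec<u8>, arr: &[u8; 32]) {
    for b in arr {
        out.push(*b);
    }
}

/// Wholesale write, the optimization a dedicated byte-array type
/// could enable: a single memcpy of the whole array.
fn write_wholesale(out: &mut Vec<u8>, arr: &[u8; 32]) {
    out.extend_from_slice(arr);
}
```

Both produce identical output bytes; the difference is purely how much work the generic serialization machinery does per element.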
> This adds conditionals and panicking branches to every ser/de struct field path.
I'm not sure if this additional conditional/panic is a problem; I'd guess "no". Would you be interested in me figuring out what the impact of this is precisely? If it does have an impact, would you consider adding this `FixedSizeByteArray` as a crate feature?
> Hi @hrxi, thanks for the PR! I'm sympathetic to this issue, but I'm not sure if I'm convinced this is a good solution for postcard.
Thanks for taking a look at the PR and being quick to answer with your feelings. :)
> Would you be interested in me figuring out what the impact of this is precisely? If it does have an impact, would you consider adding this `FixedSizeByteArray` as a crate feature?
I am interested in knowing the difference! https://github.com/djkoloski/rust_serialization_benchmark is likely a good baseline, it should be possible to add a "patched postcard" to do a side-by-side with stock postcard.
I'm interested in poking around with what serde-big-array is doing. I wonder, if you had a SPECIFIC type for bytes rather than "any `T`" like serde-big-array, whether the performance delta would be so great.
This isn't a hard no on merging this, it just doesn't sit great with me due to the special-cased nature of it. If it is really the only way and doesn't have other perf impact tradeoffs, then I'm happy to consider it further.
I'll plan to poke around the "what if you do the tuple thing but specifically for bytes where you don't need the whole drop-partially-init data" stuff that serde-big-array does, unless you beat me to it.
> I'll plan to poke around the "what if you do the tuple thing but specifically for bytes where you don't need the whole drop-partially-init data" stuff that serde-big-array does, unless you beat me to it.
Note that the serialization in serde-big-array doesn't do anything weird, it's just a simple for loop, but it's still slow.
I see what you mean, but there's a fair bit of state in `PartiallyInitialized` to allow for dropping partially initialized data, and I wonder if that + the unsafe pointer code is inhibiting optimization.
Feel free to ignore my gut until I (or you!) come up with data to prove it right/wrong.
> I see what you mean, but there's a fair bit of state in `PartiallyInitialized` to allow for dropping partially initialized data, and I wonder if that + the unsafe pointer code is inhibiting optimization.
AFAICT, this is only in the `Deserialize` implementation. The `Serialize` implementation looks completely plain to me:
```rust
impl<'de, T, const N: usize> BigArray<'de, T> for [T; N] {
    fn serialize<S>(&self, serializer: S) -> result::Result<S::Ok, S::Error>
    where
        S: Serializer,
        T: Serialize,
    {
        let mut seq = serializer.serialize_tuple(self.len())?;
        for elem in &self[..] {
            seq.serialize_element(elem)?;
        }
        seq.end()
    }
    […]
}
```
I agree that the `Deserialize` implementation might have some overhead.
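As a plain-Rust sketch of that overhead (hypothetical helper names, not the actual serde-big-array code): a generic element-by-element deserialize pays a bounds check and branch per element, and for general `T` it must additionally track how many elements are initialized so they can be dropped on error. For `u8`, which has no destructor, a single length check plus copy suffices.

```rust
/// Element-by-element read, roughly the generic deserialize path:
/// one iterator advance, bounds check, and branch per element.
fn read_per_element(input: &[u8]) -> Option<[u8; 32]> {
    let mut out = [0u8; 32];
    let mut it = input.iter();
    for slot in out.iter_mut() {
        *slot = *it.next()?; // per-element check; generic T would also
                             // need partial-init drop tracking here
    }
    Some(out)
}

/// Wholesale read, possible because u8 needs no Drop handling:
/// one length check, then a single memcpy.
fn read_wholesale(input: &[u8]) -> Option<[u8; 32]> {
    input.get(..32)?.try_into().ok()
}
```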
> I am interested in knowing the difference! https://github.com/djkoloski/rust_serialization_benchmark is likely a good baseline, it should be possible to add a "patched postcard" to do a side-by-side with stock postcard.
Not looking good from a first benchmark:
```
log/postcard/serialize                  [317.97 µs 319.60 µs 321.70 µs]
log/postcard127/serialize               [427.51 µs 429.38 µs 431.44 µs]
log/postcard/deserialize                [2.3656 ms 2.3784 ms 2.3933 ms]
log/postcard127/deserialize             [2.3488 ms 2.3555 ms 2.3632 ms]
mesh/postcard/serialize                 [669.90 µs 673.52 µs 678.01 µs]
mesh/postcard127/serialize              [959.39 µs 962.73 µs 966.47 µs]
mesh/postcard/deserialize               [1.6938 ms 1.7085 ms 1.7258 ms]
mesh/postcard127/deserialize            [1.3297 ms 1.3331 ms 1.3367 ms]
minecraft_savedata/postcard/serialize   [383.24 µs 386.63 µs 390.71 µs]
minecraft_savedata/postcard127/serialize [451.78 µs 453.60 µs 455.74 µs]
minecraft_savedata/postcard/deserialize [2.0008 ms 2.0067 ms 2.0133 ms]
minecraft_savedata/postcard127/deserialize [1.9978 ms 2.0061 ms 2.0156 ms]
mk48/postcard/serialize                 [1.7979 ms 1.8187 ms 1.8451 ms]
mk48/postcard127/serialize              [1.8483 ms 1.8572 ms 1.8679 ms]
mk48/postcard/deserialize               [4.2330 ms 4.2469 ms 4.2622 ms]
mk48/postcard127/deserialize            [4.2577 ms 4.2743 ms 4.2925 ms]
```
Moving the serialization hack from `serialize_struct` to `serialize_tuple_struct`, the performance difference disappears, likely because tuple structs are rare to non-existent in Rust code. Just another hack though…
> Moving the serialization hack from `serialize_struct` to `serialize_tuple_struct`, the performance difference disappears, likely because tuple structs are rare to non-existent in Rust code. Just another hack though…
Would you be sympathetic to have this, or to have this as an opt-in feature?
I'm not sure yet! I likely won't be able to poke around in this before the weekend.
I'm interested in the general sentiment on this PR because depending on the result, I have to take different routes for the serialization in my project. Have you had time to look at it yet?
Hey @hrxi, I haven't had a chance to prioritize this. I have a client deadline coming up soon, and will likely not have time to spend on this in the near future.
I'm interested in hearing about your exploration in this space, but I'd generally like to keep postcard as straightforward and predictable as possible, rather than optimizing any one specific use case.
If this blocks you - I'd suggest you fork postcard and use that for your project if this optimization is critical. As the wire format of postcard is stable, it should be straightforward to use `postcard` or `your-optimized-postcard` interchangeably as long as the wire format does not change.
Thanks for the quick answer!
> If this blocks you - I'd suggest you fork postcard and use that for your project if this optimization is critical. As the wire format of postcard is stable, it should be straightforward to use `postcard` or `your-optimized-postcard` interchangeably as long as the wire format does not change.
Yup, that's definitely an option I'll consider.
First observed in https://github.com/est31/serde-big-array/issues/19, there's no performant way to serialize fixed-size byte arrays with serde/postcard.
Add a `FixedSizeByteArray` type that can be used to serialize fixed-size byte arrays faster than all other available methods. This type only works with postcard; a more general solution would need to be implemented in serde itself. This works around https://github.com/serde-rs/serde/issues/2680.
- `own` is the new `FixedSizeByteArray`, no length prefix.
- `bytes` is `serde_bytes::Bytes`, it has a length prefix.
- `byte_array` is `serde_byte_array::ByteArray`, it has a length prefix.
- `big_array` is `serde_big_array::Array`, no length prefix.
- `fixed_size` is `[u8; _]`, no length prefix.
- `variable_size` is `[u8]`, it has a length prefix.