hrxi opened 5 months ago

| Name | Link |
|---|---|
| Latest commit | b0a07fdbbc8adcf89a955a96446d3a73bcfa39e7 |
| Latest deploy log | https://app.netlify.com/sites/cute-starship-2d9c9b/deploys/65bbc34b4cfe0d0008d490b9 |
This was motivated by benchmarking the serialization of a Merkle tree structure that serializes lots of hashes represented as fixed size byte arrays. The serialization of that tree has seen a >2x performance increase using this code.
Hi @hrxi, thanks for the PR! I'm sympathetic to this issue, but I'm not sure if I'm convinced this is a good solution for postcard.
This adds conditionals and panicking branches to every ser/de struct field path.
I think it should be possible to implement `Serialize` and `Deserialize` for `FixedSizeByteArray` without modifying the serialization and deserialization code of postcard itself, essentially serializing it "as if it was a tuple", like what other fixed size arrays already do today.
> I think it should be possible to implement `Serialize` and `Deserialize` for `FixedSizeByteArray` without modifying the serialization and deserialization code of postcard itself, essentially serializing it "as if it was a tuple"
Yes, that would be possible. It's what e.g. `serde_big_array` does. However, it leads to subpar performance, as seen in the PR description. The motivation of this PR is precisely to fix that subpar performance. If there's another way to do so, I'd be more than happy to implement it. I wasn't able to come up with another strategy, though.
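To illustrate the gap being discussed, here is a plain-Rust sketch (hypothetical helper names, no serde involved): a tuple-style `Serialize` pushes each byte through the generic per-element path, while a dedicated byte-array type could copy the whole array in one go.

```rust
/// Per-element write, roughly what a tuple-style Serialize does:
/// one call into the output sink per byte.
fn write_per_element(out: &mut Vec<u8>, arr: &[u8; 32]) {
    for b in arr {
        out.push(*b);
    }
}

/// Wholesale write, the optimization a dedicated byte-array type
/// could enable: a single memcpy of the whole array.
fn write_wholesale(out: &mut Vec<u8>, arr: &[u8; 32]) {
    out.extend_from_slice(arr);
}
```

Both produce identical output bytes; the difference is purely how much work the generic serialization machinery does per element.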
> This adds conditionals and panicking branches to every ser/de struct field path.
I'm not sure if this additional conditional/panic is a problem; I'd guess "no". Would you be interested in me figuring out what the impact of this is precisely? If it does have an impact, would you consider adding this `FixedSizeByteArray` as a crate feature?
> Hi @hrxi, thanks for the PR! I'm sympathetic to this issue, but I'm not sure if I'm convinced this is a good solution for postcard.
Thanks for taking a look at the PR and being quick to answer with your feelings. :)
> Would you be interested in me figuring out what the impact of this is precisely? If it does have an impact, would you consider adding this `FixedSizeByteArray` as a crate feature?
I am interested in knowing the difference! https://github.com/djkoloski/rust_serialization_benchmark is likely a good baseline, it should be possible to add a "patched postcard" to do a side-by-side with stock postcard.
I'm interested in poking around with what serde-big-array is doing. I wonder, if you had a SPECIFIC type for bytes rather than "any `T`" like serde-big-array, whether the performance delta would be so great.
This isn't a hard no on merging this, it just doesn't sit great with me due to the special-cased nature of it. If it is really the only way and doesn't have other perf impact tradeoffs, then I'm happy to consider it further.
I'll plan to poke around the "what if you do the tuple thing but specifically for bytes where you don't need the whole drop-partially-init data" stuff that serde-big-array does, unless you beat me to it.
> I'll plan to poke around the "what if you do the tuple thing but specifically for bytes where you don't need the whole drop-partially-init data" stuff that serde-big-array does, unless you beat me to it.
Note that the serialization in serde-big-array doesn't do anything weird, it's just a simple for loop, but it's still slow.
I see what you mean, but there's a fair bit of state in `PartiallyInitialized` to allow for dropping partially initialized data, and I wonder if that + the unsafe pointer code is inhibiting optimization.
Feel free to ignore my gut until I (or you!) come up with data to prove it right/wrong.
> I see what you mean, but there's a fair bit of state in `PartiallyInitialized` to allow for dropping partially initialized data, and I wonder if that + the unsafe pointer code is inhibiting optimization.
AFAICT, this is only in the `Deserialize` implementation. The `Serialize` implementation looks completely plain to me:
```rust
impl<'de, T, const N: usize> BigArray<'de, T> for [T; N] {
    fn serialize<S>(&self, serializer: S) -> result::Result<S::Ok, S::Error>
    where
        S: Serializer,
        T: Serialize,
    {
        let mut seq = serializer.serialize_tuple(self.len())?;
        for elem in &self[..] {
            seq.serialize_element(elem)?;
        }
        seq.end()
    }
    […]
}
```
I agree that the `Deserialize` implementation might have some overhead.
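As a plain-Rust sketch of that overhead (hypothetical helper names, not the actual serde-big-array code): a generic element-by-element deserialize pays a bounds check and branch per element, and for general `T` it must additionally track how many elements are initialized so they can be dropped on error. For `u8`, which has no destructor, a single length check plus copy suffices.

```rust
/// Element-by-element read, roughly the generic deserialize path:
/// one iterator advance, bounds check, and branch per element.
fn read_per_element(input: &[u8]) -> Option<[u8; 32]> {
    let mut out = [0u8; 32];
    let mut it = input.iter();
    for slot in out.iter_mut() {
        *slot = *it.next()?; // per-element check; generic T would also
                             // need partial-init drop tracking here
    }
    Some(out)
}

/// Wholesale read, possible because u8 needs no Drop handling:
/// one length check, then a single memcpy.
fn read_wholesale(input: &[u8]) -> Option<[u8; 32]> {
    input.get(..32)?.try_into().ok()
}
```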
> I am interested in knowing the difference! https://github.com/djkoloski/rust_serialization_benchmark is likely a good baseline, it should be possible to add a "patched postcard" to do a side-by-side with stock postcard.
Not looking good from a first benchmark:
```
log/postcard/serialize                  [317.97 µs 319.60 µs 321.70 µs]
log/postcard127/serialize               [427.51 µs 429.38 µs 431.44 µs]
log/postcard/deserialize                [2.3656 ms 2.3784 ms 2.3933 ms]
log/postcard127/deserialize             [2.3488 ms 2.3555 ms 2.3632 ms]
mesh/postcard/serialize                 [669.90 µs 673.52 µs 678.01 µs]
mesh/postcard127/serialize              [959.39 µs 962.73 µs 966.47 µs]
mesh/postcard/deserialize               [1.6938 ms 1.7085 ms 1.7258 ms]
mesh/postcard127/deserialize            [1.3297 ms 1.3331 ms 1.3367 ms]
minecraft_savedata/postcard/serialize   [383.24 µs 386.63 µs 390.71 µs]
minecraft_savedata/postcard127/serialize [451.78 µs 453.60 µs 455.74 µs]
minecraft_savedata/postcard/deserialize [2.0008 ms 2.0067 ms 2.0133 ms]
minecraft_savedata/postcard127/deserialize [1.9978 ms 2.0061 ms 2.0156 ms]
mk48/postcard/serialize                 [1.7979 ms 1.8187 ms 1.8451 ms]
mk48/postcard127/serialize              [1.8483 ms 1.8572 ms 1.8679 ms]
mk48/postcard/deserialize               [4.2330 ms 4.2469 ms 4.2622 ms]
mk48/postcard127/deserialize            [4.2577 ms 4.2743 ms 4.2925 ms]
```
Moving the serialization hack from `serialize_struct` to `serialize_tuple_struct`, the performance difference disappears, likely because tuple structs are rare to non-existent in Rust code. Just another hack though…
> Moving the serialization hack from `serialize_struct` to `serialize_tuple_struct`, the performance difference disappears, likely because tuple structs are rare to non-existent in Rust code. Just another hack though…
Would you be sympathetic to have this, or to have this as an opt-in feature?
I'm not sure yet! I likely won't be able to poke around in this before the weekend.
I'm interested in the general sentiment on this PR because depending on the result, I have to take different routes for the serialization in my project. Have you had time to look at it yet?
Hey @hrxi, I haven't had a chance to prioritize this. I have a client deadline coming up soon, and will likely not have time to spend on this in the near future.
I'm interested in hearing about your exploration in this space, but I'd generally like to keep postcard as straightforward and predictable as possible, rather than optimizing any one specific use case.
If this blocks you - I'd suggest you fork postcard and use that for your project if this optimization is critical. As the wire format of postcard is stable, it should be straightforward to use `postcard` or `your-optimized-postcard` interchangeably as long as the wire format does not change.
Thanks for the quick answer!
> If this blocks you - I'd suggest you fork postcard and use that for your project if this optimization is critical. As the wire format of postcard is stable, it should be straightforward to use `postcard` or `your-optimized-postcard` interchangeably as long as the wire format does not change.
Yup, that's definitely an option I'll consider.
First observed in https://github.com/est31/serde-big-array/issues/19, there's no performant way to serialize fixed-size byte arrays with serde/postcard.
Add a `FixedSizeByteArray` type that can be used to serialize fixed-size byte arrays faster than all other available methods. This type only works with postcard; a more general solution would need to be implemented in serde itself. This works around https://github.com/serde-rs/serde/issues/2680.
- `own` is the new `FixedSizeByteArray`, no length prefix.
- `bytes` is `serde_bytes::Bytes`, it has a length prefix.
- `byte_array` is `serde_byte_array::ByteArray`, it has a length prefix.
- `big_array` is `serde_big_array::Array`, no length prefix.
- `fixed_size` is `[u8; _]`, no length prefix.
- `variable_size` is `[u8]`, it has a length prefix.