Re-visiting the serialization

minghuaw / fe2o3-amqp

A rust implementation of the AMQP1.0 protocol based on serde and tokio.

MIT License


Re-visiting the serialization #214

Open minghuaw opened 10 months ago

minghuaw commented 10 months ago

The current to_vec() method creates the output buffer with Vec::new(), and according to [1]:

> A new, empty Vec created by the common means (vec![] or Vec::new or Vec::default) has a length and capacity of zero

This inevitably leads to a re-allocation, and probably to repeated re-allocations if the object is large. However, we already have a SizeSerializer that can estimate the serialized size in bytes, so the output buffer could be created with that capacity up front (e.g. with Vec::with_capacity), which could potentially reduce the number of re-allocations.

[1] https://nnethercote.github.io/perf-book/heap-allocations.html?highlight=borrow
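
A minimal sketch of the idea, assuming a toy length-prefixed format rather than the real AMQP encoding. `estimated_size` is a hypothetical stand-in for what SizeSerializer would report, and `encode_into`, `to_vec_growing` and `to_vec_presized` are illustrative names, not functions from fe2o3-amqp:

```rust
/// Hypothetical stand-in for a size estimator such as SizeSerializer:
/// returns the number of bytes `encode_into` will write for `items`.
fn estimated_size(items: &[u64]) -> usize {
    // 4-byte element count + 8 bytes per element (toy format, not AMQP).
    4 + items.len() * 8
}

/// Toy encoder: big-endian element count followed by big-endian elements.
fn encode_into(items: &[u64], buf: &mut Vec<u8>) {
    buf.extend_from_slice(&(items.len() as u32).to_be_bytes());
    for item in items {
        buf.extend_from_slice(&item.to_be_bytes());
    }
}

/// Baseline: starts from a zero-capacity buffer and grows on demand.
fn to_vec_growing(items: &[u64]) -> Vec<u8> {
    let mut buf = Vec::new();
    encode_into(items, &mut buf);
    buf
}

/// Pre-sized: a single allocation up front, based on the estimate.
fn to_vec_presized(items: &[u64]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(estimated_size(items));
    encode_into(items, &mut buf);
    buf
}
```

Both functions produce identical bytes; the only difference is how often the buffer has to grow. The pre-sized variant pays for an extra pass over the value to compute the size, which may matter for very small values.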

minghuaw commented 10 months ago

In addition, there are places where a temporary buffer is created during serialization. Is it possible to apply a similar technique there? Or even better, can these temporary buffers be removed, since most of them are there in the first place only because the serialized format requires size byte(s) prepended to the actual data?
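
One way the temporary buffer could be avoided is to reserve the size prefix in the output buffer and patch it once the body has been written. The sketch below assumes a fixed 4-byte big-endian size prefix and a body made of string fragments; `encode_with_temp` and `encode_backpatched` are hypothetical names used only for illustration, not the crate's serializer:

```rust
/// Approach A: encode the body into a temporary buffer, then copy it
/// behind a 4-byte big-endian size prefix.
fn encode_with_temp(body: &[&str], out: &mut Vec<u8>) {
    let mut tmp = Vec::new();
    for s in body {
        tmp.extend_from_slice(s.as_bytes());
    }
    out.extend_from_slice(&(tmp.len() as u32).to_be_bytes());
    out.extend_from_slice(&tmp);
}

/// Approach B: reserve the prefix in the output buffer, write the body
/// in place, then patch the prefix once the size is known.
fn encode_backpatched(body: &[&str], out: &mut Vec<u8>) {
    let prefix_at = out.len();
    out.extend_from_slice(&[0u8; 4]); // placeholder for the size
    let body_start = out.len();
    for s in body {
        out.extend_from_slice(s.as_bytes());
    }
    let size = (out.len() - body_start) as u32;
    out[prefix_at..body_start].copy_from_slice(&size.to_be_bytes());
}
```

Note that AMQP 1.0 has both 1-byte and 4-byte size encodings for variable-width and compound types, so patching in place only works directly when the prefix width is decided before the body is written, for example by computing the size first.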

minghuaw commented 10 months ago

> Or even better, can these temporary buffers be removed since the reason why most of them are there in the first place was because the serialized format requires a size byte(s) prepended to the actual data.

It might be better to introduce this in a breaking update.

minghuaw commented 10 months ago

An initial experiment shows that this quite significantly degrades serialization performance for primitive types like u8, bool, i8, and char that are only one or two bytes long. A very big improvement was observed for types that are of medium length (4B to 1kB). Surprisingly, for long strings/binary (>= 1MB), the performance seems to remain the same.

minghuaw commented 10 months ago

> Or even better, can these temporary buffers be removed since the reason why most of them are there in the first place was because the serialized format requires a size byte(s) prepended to the actual data.

Reserving capacity in the buffer somehow negatively impacts serializing Vec<u64>.
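
A rough way to reproduce this kind of comparison outside the crate is a small timing harness like the one below. It is only a sketch: `time_it`, `encode` and `compare` are hypothetical helpers, the encoding is a toy one (8 big-endian bytes per element, not AMQP), and a proper benchmark would use a statistical harness rather than a single wall-clock measurement:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iters` times and returns the total elapsed wall-clock time.
fn time_it<F: FnMut() -> Vec<u8>>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the result.
        black_box(f());
    }
    start.elapsed()
}

/// Toy encoding of a slice of u64: 8 big-endian bytes per element,
/// appended to whatever buffer the caller supplies.
fn encode(items: &[u64], mut buf: Vec<u8>) -> Vec<u8> {
    for item in items {
        buf.extend_from_slice(&item.to_be_bytes());
    }
    buf
}

/// Returns (growing, presized) timings for a slice of `len` elements.
fn compare(len: usize, iters: u32) -> (Duration, Duration) {
    let items: Vec<u64> = (0..len as u64).collect();
    let growing = time_it(iters, || encode(black_box(&items), Vec::new()));
    let presized = time_it(iters, || {
        encode(black_box(&items), Vec::with_capacity(items.len() * 8))
    });
    (growing, presized)
}
```

Calling `compare` for a range of lengths (for example 1, 64 and 100_000 elements) and printing both durations would show whether the regression appears in isolation or only in combination with the size-estimation pass.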

lsunsi commented 4 months ago

This is interesting; I can't imagine why it would decrease performance in this way. Just noting here that I'd expect the pre-allocation to only improve performance as well.

minghuaw commented 4 months ago

> This is interesting, I can't imagine why it would decrease the performance in this way. Just noting here that I'd expect the pre allocation to only improve performance as well.

That was my expectation as well. I haven't had enough time to investigate further, however.