This PR reduces the overhead of shotover's transform/chain abstractions by passing `Wrapper` by reference instead of by value.
Passing by value is usually not a performance issue in Rust.
However, in this case several factors combined to create a minor performance issue:
- `Wrapper` is quite large, taking up 80 bytes on the stack. A large contributor to this is the `local_addr: SocketAddr` field, which is 32 bytes.
- `Transform::transform` is an async function, and async functions need to store and restore the state of all variables held across await points. The larger these variables are, the more expensive this store and restore becomes.
- Because these functions are part of the public API of the `shotover` crate, and are also part of a trait implementation that is called through a `Box<dyn ..>` trait object, it is extremely unlikely that these calls would be inlined.
- This is a hot path, since these functions are called for every message batch. The effect would be even greater for complicated chains with many transforms.
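To illustrate the async point above, here is a minimal sketch that compares the size of the futures produced by two async functions: one taking a large struct by value and one taking it by reference. The `Wrapper` struct, `make_wrapper`, and the `transform_*` functions are hypothetical stand-ins, not shotover's real definitions. Exact sizes depend on the target and compiler version, so the sketch only compares relative sizes.

```rust
use std::mem::{size_of, size_of_val};
use std::net::SocketAddr;

/// Hypothetical stand-in for shotover's `Wrapper`; the fields only give it a realistic size.
#[allow(dead_code)]
pub struct Wrapper {
    pub requests: Vec<u8>,
    pub local_addr: SocketAddr,
}

pub fn make_wrapper() -> Wrapper {
    Wrapper {
        requests: vec![1, 2, 3],
        local_addr: SocketAddr::from(([127, 0, 0, 1], 9042)),
    }
}

/// Hypothetical stand-in for an await point, e.g. calling the next transform in the chain.
async fn next_transform() {}

/// `wrapper` is live across the `.await`, so the whole struct is stored inside the future.
pub async fn transform_by_value(wrapper: Wrapper) -> usize {
    let before = wrapper.requests.len();
    next_transform().await;
    before + wrapper.requests.len()
}

/// Only a pointer-sized reference is stored inside the future.
pub async fn transform_by_ref(wrapper: &Wrapper) -> usize {
    let before = wrapper.requests.len();
    next_transform().await;
    before + wrapper.requests.len()
}

fn main() {
    let owned = make_wrapper();
    let borrowed = make_wrapper();

    // Futures are lazy: creating them does not run their bodies,
    // so no executor is needed just to measure their size.
    let by_value = transform_by_value(owned);
    let by_ref = transform_by_ref(&borrowed);

    println!("size_of::<Wrapper>()     = {}", size_of::<Wrapper>());
    println!("by-value future size     = {}", size_of_val(&by_value));
    println!("by-reference future size = {}", size_of_val(&by_ref));
}
```

The by-value future must be at least as large as `Wrapper` itself, while the by-reference future only holds a pointer plus a little bookkeeping.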
So, to avoid this issue, this PR makes `Transform::transform` and other related functions take `Wrapper` by reference.
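A minimal sketch of what a by-reference signature can look like behind a trait object. The `Transform` trait, `Wrapper`, `Tagger`, and `run_chain` below are simplified, hypothetical stand-ins rather than shotover's real definitions. The boxed-future return type approximates what a crate such as `async_trait` generates so the trait can be used as `Box<dyn Transform>`, and the tiny `block_on` exists only so the example runs without an async runtime.

```rust
use std::future::Future;
use std::net::SocketAddr;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Boxed future type, similar in shape to what `async_trait` generates for trait methods.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Hypothetical, simplified stand-in for shotover's `Wrapper`.
#[allow(dead_code)]
pub struct Wrapper {
    pub requests: Vec<String>,
    pub local_addr: SocketAddr,
}

/// Hypothetical transform trait: `wrapper` is borrowed for the call, not moved in.
pub trait Transform {
    fn transform<'a>(&'a mut self, wrapper: &'a mut Wrapper) -> BoxFuture<'a, usize>;
}

/// Hypothetical transform that appends a tag to every request and reports how many it saw.
pub struct Tagger;

impl Transform for Tagger {
    fn transform<'a>(&'a mut self, wrapper: &'a mut Wrapper) -> BoxFuture<'a, usize> {
        Box::pin(async move {
            for request in wrapper.requests.iter_mut() {
                request.push_str("+tagged");
            }
            wrapper.requests.len()
        })
    }
}

/// Calls each transform in turn, lending the same `Wrapper` to each one.
pub async fn run_chain(chain: &mut [Box<dyn Transform>], wrapper: &mut Wrapper) -> usize {
    let mut total = 0;
    for transform in chain.iter_mut() {
        // Only a reborrowed `&mut Wrapper` is passed to (and stored in) each transform's future.
        total += transform.transform(wrapper).await;
    }
    total
}

fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Minimal executor for this example only: it busy-polls, which is fine here
/// because none of these futures ever return `Poll::Pending`.
fn block_on<F: Future>(fut: F) -> F::Output {
    // SAFETY: the vtable functions ignore the data pointer and never dereference it.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = std::pin::pin!(fut);
    loop {
        if let Poll::Ready(output) = fut.as_mut().poll(&mut cx) {
            return output;
        }
    }
}

fn main() {
    let mut chain: Vec<Box<dyn Transform>> = vec![Box::new(Tagger), Box::new(Tagger)];
    let mut wrapper = Wrapper {
        requests: vec!["SELECT 1".to_string()],
        local_addr: SocketAddr::from(([127, 0, 0, 1], 9042)),
    };
    let total = block_on(run_chain(&mut chain, &mut wrapper));
    println!("{total} {:?}", wrapper.requests);
}
```

Because each transform only receives a reference, the size of what gets moved into every per-transform future stays pointer-sized no matter how large `Wrapper` grows.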
Alternatively, we could get most of the win by moving only the largest fields into a referenced substruct, as demonstrated by https://github.com/shotover/shotover-proxy/pull/1719.
This would provide most of the benefits while avoiding some possible downsides:
- Having requests behind a reference might be slower in some rare cases, e.g. we now have to call `.drain` or `std::mem::take()`.
- It avoids introducing lifetimes for the vec of requests; ownership is easier to work with than references with named lifetimes.

It's hard to see whether one way is truly better than the other, so I'm just taking this approach since it's easier and probably has better performance.
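The first downside above, sketched: when a transform only has `&mut Wrapper` but needs to own the requests (for example to hand them to another task), it has to move them out explicitly. `std::mem::take` swaps in an empty `Vec` and returns the original allocation, while `Vec::drain(..)` empties the vector in place, keeping its capacity, but collecting the drained items needs a new allocation. `Wrapper` and both helper functions are hypothetical.

```rust
/// Hypothetical, simplified stand-in for shotover's `Wrapper`.
pub struct Wrapper {
    pub requests: Vec<String>,
}

/// Takes ownership of the requests by swapping an empty `Vec` into the wrapper.
/// The returned `Vec` keeps the original allocation; the wrapper's new `Vec` has none.
pub fn take_requests(wrapper: &mut Wrapper) -> Vec<String> {
    std::mem::take(&mut wrapper.requests)
}

/// Moves the requests out into a freshly allocated `Vec`, leaving the wrapper's
/// (now empty) `Vec` with its original capacity so it could be reused.
pub fn drain_requests(wrapper: &mut Wrapper) -> Vec<String> {
    wrapper.requests.drain(..).collect()
}

fn main() {
    let mut a = Wrapper { requests: vec!["GET a".into(), "GET b".into()] };
    let taken = take_requests(&mut a);
    println!("taken: {taken:?}, left: {}, capacity: {}", a.requests.len(), a.requests.capacity());

    let mut b = Wrapper { requests: vec!["GET c".into()] };
    let drained = drain_requests(&mut b);
    println!("drained: {drained:?}, left: {}, capacity: {}", b.requests.len(), b.requests.capacity());
}
```

With an owned `Wrapper`, the transform could simply move `wrapper.requests` out, so neither call would be needed.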
This PR yields a consistent improvement across our basic chain benchmarks.
`loopback` and `nullsink` show a consistent improvement of 6% (0.3 µs) and 4% (0.5 µs) respectively.
The `decode_request_metadata_drop` benchmark improvement is noise, so ignore it. Click through to the full CodSpeed performance report instead.
These are incredibly small improvements, but still valuable, since these microsecond costs will be paid many times a second under heavy load.
Additionally, this change will allow us to add further fields to the `Wrapper` type without a performance cost.
For example, it will enable an alternative solution to https://github.com/shotover/shotover-proxy/pull/1717 without a performance cost.