While measuring allocations in a performance-sensitive project, I found that the zero-copy Bytes trick in https://github.com/tokio-rs/prost/pull/449 actually increases the number of heap allocations and copies if neither the input buffer nor the field to be merged is Bytes.
Observations from my testing:
(1) If the input and output are both Bytes, then buf.copy_to_bytes() is zero-copy, and then <Bytes as BytesAdapter>::replace_with() is also zero-copy. This is the ideal case made possible by the above PR.
(2) If the input is not Bytes, then buf.copy_to_bytes() allocates a new intermediate Bytes and copies into it.
(3) If the output is not Bytes, then <Vec<u8> as BytesAdapter>::replace_with() will allocate a new vector and copy out of the intermediate Bytes.
The problem is that both (2) and (3) can easily occur in the same merge call. For example, prost-build generates Vec<u8> and String fields which trigger (3), and many users would attempt to parse from &[u8] or even a dynamic reader which is not contiguous, triggering (2).
I found the issue because I was following the general idiom of using &[u8] as my input type to keep the API simple and not expose dependencies, but measuring allocations and CPU time benchmarking both proved that this introduced substantial costs compared to Bytes.
I worry that many other users may also be triggering this case without knowing. Even if you had a way to explicitly tell users about it, I do not think it would be sufficient to require users to always use Bytes. If the other APIs in their program (especially externally-facing ones) constrain them to require other types, then this would just move the extra copies into their code instead of actually avoiding them where possible.
I believe it is easy to fix this for String because its merge implementation is neatly encapsulated in encoding::string::merge(). I am about to send a PR with a conservative fix for just this.
It would also be good to fix this for Vec<u8>, but I currently do not see a way to do that without changes to the external API of prost to make a distiction between merging from Bytes and merging from other types of buffers. As explicitly mentioned in earlier versions of encoding.rs, generic specialization would be a cleaner way to do this transparently, so it might make sense just to leave a comment and wait for that.
While measuring allocations in a performance-sensitive project, I found that the zero-copy
Bytes
trick in https://github.com/tokio-rs/prost/pull/449 actually increases the number of heap allocations and copies if neither the input buffer nor the field to be merged isBytes
.Observations from my testing: (1) If the input and output are both
Bytes
, thenbuf.copy_to_bytes()
is zero-copy, and then<Bytes as BytesAdapter>::replace_with()
is also zero-copy. This is the ideal case made possible by the above PR. (2) If the input is notBytes
, thenbuf.copy_to_bytes()
allocates a new intermediateBytes
and copies into it. (3) If the output is notBytes
, then<Vec<u8> as BytesAdapter>::replace_with()
will allocate a new vector and copy out of the intermediateBytes
.The problem is that both (2) and (3) can easily occur in the same merge call. For example,
prost-build
generatesVec<u8>
andString
fields which trigger (3), and many users would attempt to parse from&[u8]
or even a dynamic reader which is not contiguous, triggering (2).I found the issue because I was following the general idiom of using
&[u8]
as my input type to keep the API simple and not expose dependencies, but measuring allocations and CPU time benchmarking both proved that this introduced substantial costs compared toBytes
.I worry that many other users may also be triggering this case without knowing. Even if you had a way to explicitly tell users about it, I do not think it would be sufficient to require users to always use
Bytes
. If the other APIs in their program (especially externally-facing ones) constrain them to require other types, then this would just move the extra copies into their code instead of actually avoiding them where possible.I believe it is easy to fix this for
String
because its merge implementation is neatly encapsulated inencoding::string::merge().
I am about to send a PR with a conservative fix for just this.It would also be good to fix this for
Vec<u8>
, but I currently do not see a way to do that without changes to the external API ofprost
to make a distiction between merging fromBytes
and merging from other types of buffers. As explicitly mentioned in earlier versions ofencoding.rs
, generic specialization would be a cleaner way to do this transparently, so it might make sense just to leave a comment and wait for that.