Closed: lucab closed this issue 4 months ago
After looking a bit into this request-parsing logic, it seems to me that the underlying problem is that the httparse crate does not make use of memory sharing through the bytes crate. It returns a fully parsed &str instead. So this codepath is not constructing a Uri from a shared Bytes, but doing a full memory copy instead.
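To make the copy-versus-share distinction concrete, here is a minimal std-only sketch. It assumes a hypothetical SharedBuf type (an Arc<[u8]> plus a byte range) standing in for bytes::Bytes. It is not hyper's or the bytes crate's actual code. Building a value from a borrowed &str has to allocate and copy, while slicing a shared buffer only bumps a reference count.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical stand-in for bytes::Bytes: a reference-counted buffer
/// plus the byte range that this handle views.
#[derive(Clone, Debug)]
pub struct SharedBuf {
    data: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedBuf {
    pub fn new(data: Vec<u8>) -> Self {
        let len = data.len();
        SharedBuf { data: Arc::from(data), range: 0..len }
    }

    /// Copying constructor, analogous to Bytes::copy_from_slice:
    /// always allocates a fresh buffer and copies `src` into it.
    pub fn copy_from_slice(src: &[u8]) -> Self {
        SharedBuf::new(src.to_vec())
    }

    /// Zero-copy sub-slice: the new handle points into the same allocation,
    /// and the only cost is an atomic reference-count increment.
    pub fn slice(&self, range: Range<usize>) -> Self {
        assert!(range.start <= range.end && range.end <= self.range.len());
        let start = self.range.start + range.start;
        let end = self.range.start + range.end;
        SharedBuf { data: Arc::clone(&self.data), range: start..end }
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.data[self.range.clone()]
    }

    /// True if both handles view the same underlying allocation.
    pub fn shares_allocation_with(&self, other: &SharedBuf) -> bool {
        Arc::ptr_eq(&self.data, &other.data)
    }

    /// Number of live handles on the underlying allocation.
    pub fn ref_count(&self) -> usize {
        Arc::strong_count(&self.data)
    }
}

/// Models the reported codepath: the parser only hands out a borrowed &str,
/// so turning it into an owned value has to copy the bytes.
pub fn path_by_copy(path: &str) -> SharedBuf {
    SharedBuf::copy_from_slice(path.as_bytes())
}
```

For the request line "GET /index.html HTTP/1.1\r\n", slice(4..15) yields a handle on "/index.html" that shares the original allocation. By contrast, path_by_copy produces an independent buffer, which corresponds to the extra allocation seen in the profiles.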
I suspect this may require porting httparse to use the bytes crate for sharing the existing memory content.
Ah, this change actually was a performance improvement back when I did it: Bytes back then had an inline variant, and fitting "/" into that variant was faster than an atomic clone. Of course, Bytes no longer has that variant.
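The trade-off can be illustrated with a hypothetical small-buffer type. SmallBuf and INLINE_CAP below are illustrations, not the historical Bytes internals. Tiny values such as "/" are stored inline, so cloning them is a plain byte copy with no atomic operation. Larger values share an Arc, and cloning them costs an atomic reference-count increment.

```rust
use std::sync::Arc;

/// Hypothetical maximum number of bytes stored inline.
const INLINE_CAP: usize = 15;

#[derive(Clone, Debug)]
pub enum SmallBuf {
    /// Short values live inside the handle itself; cloning copies the bytes
    /// and touches no shared counter.
    Inline { len: u8, data: [u8; INLINE_CAP] },
    /// Longer values share one heap allocation; cloning performs an atomic
    /// reference-count increment.
    Shared(Arc<[u8]>),
}

impl SmallBuf {
    pub fn from_slice(src: &[u8]) -> Self {
        if src.len() <= INLINE_CAP {
            let mut data = [0u8; INLINE_CAP];
            data[..src.len()].copy_from_slice(src);
            SmallBuf::Inline { len: src.len() as u8, data }
        } else {
            SmallBuf::Shared(Arc::from(src))
        }
    }

    pub fn as_slice(&self) -> &[u8] {
        match self {
            SmallBuf::Inline { len, data } => &data[..*len as usize],
            SmallBuf::Shared(arc) => &arc[..],
        }
    }

    pub fn is_inline(&self) -> bool {
        matches!(self, SmallBuf::Inline { .. })
    }
}
```

With such a layout, a "/" path never touches shared state. Once the inline variant is gone, the same value either needs an atomic clone of a shared buffer or, as in the current code, a fresh copy.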
Changing it back isn't impossible. It'd be similar to how the headers are cloned: the &str can be turned into indices into the original buffer, and then we can slice from the Bytes.
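A minimal std-only sketch of the indices approach follows. It is not hyper's actual code: subslice_range, parse_path and path_handle are hypothetical helpers, and Arc<[u8]> stands in for Bytes. Because the parsed &str borrows from the read buffer, its position can be recovered with pointer arithmetic and stored as a byte range. That range can later be used to slice the shared buffer without copying.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Returns the byte range that `sub` occupies inside `buf`, or None if `sub`
/// is not a subslice of `buf`. Hypothetical helper (compare Bytes::slice_ref).
pub fn subslice_range(buf: &[u8], sub: &[u8]) -> Option<Range<usize>> {
    let buf_start = buf.as_ptr() as usize;
    let sub_start = sub.as_ptr() as usize;
    if sub_start < buf_start {
        return None;
    }
    let start = sub_start - buf_start;
    let end = start.checked_add(sub.len())?;
    if end > buf.len() {
        return None;
    }
    Some(start..end)
}

/// Toy stand-in for the request-line parser: like httparse, it returns the
/// request-target as a &str that borrows from `buf` instead of copying it.
pub fn parse_path(buf: &[u8]) -> Option<&str> {
    let line = std::str::from_utf8(buf).ok()?;
    let mut parts = line.split(' ');
    let _method = parts.next()?;
    parts.next()
}

/// Parse, convert the borrowed &str into indices, then hand out a new handle
/// on the shared buffer plus that range. Cloning the Arc bumps a reference
/// count; no path bytes are copied.
pub fn path_handle(buf: &Arc<[u8]>) -> Option<(Arc<[u8]>, Range<usize>)> {
    let path = parse_path(buf)?;
    let range = subslice_range(buf, path.as_bytes())?;
    Some((Arc::clone(buf), range))
}
```

The real bytes crate offers Bytes::slice_ref for this "subset of self" step. The sketch spells out the pointer arithmetic to show why the &str must borrow from the same buffer for the trick to work.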
It would also be easy-ish to see whether the pipeline benchmarks improve with the change.
@seanmonstar, thanks for the hint! I hadn't noticed there was already some &str -> Bytes dance. I've followed the same approach for the path URI; the PR is at https://github.com/hyperium/hyper/pull/3575.
There are both a pipeline/hello_world_16 macro-benchmark and a bench_parse_incoming micro-benchmark related to this logic.
I ran several master-vs-PR benchmarks and the numbers are comparable. Here are the runs with the smallest variation intervals I observed on my workstation:
# master
test hello_world_16 ... bench: 12,762 ns/iter (+/- 1,526) = 155 MB/s
test proto::h1::role::tests::bench_parse_incoming ... bench: 1,558 ns/iter (+/- 56) = 563 MB/s
# PR-3575
test hello_world_16 ... bench: 12,733 ns/iter (+/- 1,606) = 155 MB/s
test proto::h1::role::tests::bench_parse_incoming ... bench: 1,546 ns/iter (+/- 9) = 567 MB/s
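To put a number on "comparable", here is a small computation of the master-vs-PR deltas using only the figures reported above. The "within noise" test is a simple heuristic assumed for illustration: the absolute difference must be smaller than the larger of the two reported +/- values. It is not a statistical test.

```rust
/// One `cargo bench` result line: mean ns/iter and the reported +/- value.
#[derive(Clone, Copy, Debug)]
pub struct Bench {
    pub ns_per_iter: f64,
    pub spread: f64,
}

/// Relative change of `pr` against `base`, in percent (negative = faster).
pub fn relative_change_pct(base: Bench, pr: Bench) -> f64 {
    (pr.ns_per_iter - base.ns_per_iter) / base.ns_per_iter * 100.0
}

/// Heuristic (assumption): treat a difference as noise if it is smaller
/// than the larger of the two reported +/- values.
pub fn within_noise(base: Bench, pr: Bench) -> bool {
    (pr.ns_per_iter - base.ns_per_iter).abs() < base.spread.max(pr.spread)
}

// Figures copied from the benchmark runs above.
pub const HELLO_WORLD_16_MASTER: Bench = Bench { ns_per_iter: 12_762.0, spread: 1_526.0 };
pub const HELLO_WORLD_16_PR: Bench = Bench { ns_per_iter: 12_733.0, spread: 1_606.0 };
pub const PARSE_INCOMING_MASTER: Bench = Bench { ns_per_iter: 1_558.0, spread: 56.0 };
pub const PARSE_INCOMING_PR: Bench = Bench { ns_per_iter: 1_546.0, spread: 9.0 };
```

This gives about -0.23% for hello_world_16 (29 ns against a +/- of over 1,500 ns) and about -0.77% for bench_parse_incoming (12 ns against a +/- of 56 ns). Both deltas are well inside the reported variation.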
Version
hyper-1.1.0
Platform
Linux x86_64
Description
I was looking at the memory allocation patterns of a hyper-1.0 HTTP1 server with a fastwebsocket endpoint (basically equivalent to this example), and I noticed a spurious memory copy-allocation in the uri.parse()? URI parsing logic here: https://github.com/hyperium/hyper/blob/00a703a9ef268266f8a8f78540253cbb2dcc6a55/src/proto/h1/role.rs#L157-L168
This is visible in memory profiles as a Bytes::copy_from_slice() coming from a Uri::from_str().
I suspect this is the root cause of some of the performance impact noticed in the CPU profiles at https://github.com/hyperium/hyper/issues/3258#issuecomment-1623809448.