estin opened this pull request 6 years ago
Thanks for the report. Are you able to verify that this makes a performance difference?

Looking at the code, I would be surprised if it had any noticeable impact: `request::decode` just calls into `httparse`, and nothing I see should have any real cost when the buffer is empty. That said, I would also be fine merging this, as it could make the code clearer.
> Are you able to verify that this makes a performance difference?

Nope... I can't confirm it 100%. Take a look at my simple investigation.

First, `cargo bench` results of this, in lib.rs:
```rust
fn decode_with_check(buf: &mut BytesMut) -> io::Result<Option<Request>> {
    // don't try to parse an empty buf
    if buf.is_empty() {
        return Ok(None);
    }
    request::decode(buf)
}

fn decode_without_check(buf: &mut BytesMut) -> io::Result<Option<Request>> {
    request::decode(buf)
}

#[cfg(test)]
mod tests {
    use test::Bencher;
    use bytes::BytesMut;
    use super::{decode_without_check, decode_with_check};

    #[bench]
    fn bench_noop(b: &mut Bencher) {
        b.iter(|| {
            let bytes = BytesMut::new();
            if bytes.is_empty() {
                return;
            }
        });
    }

    #[bench]
    fn bench_decode_empty_with_check(b: &mut Bencher) {
        b.iter(|| {
            let mut bytes = BytesMut::new();
            decode_with_check(&mut bytes).unwrap();
        });
    }

    #[bench]
    fn bench_decode_empty_without_check(b: &mut Bencher) {
        b.iter(|| {
            let mut bytes = BytesMut::new();
            decode_without_check(&mut bytes).unwrap();
        });
    }
}
```
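For reference, here is a minimal stable-Rust sketch of the same guard that runs without the nightly `test` crate. The `parse_stub` function is a hypothetical stand-in for `request::decode`/`httparse` (not included here); only the early-return shape of the check is the point:

```rust
// Sketch of the PR's guard clause. `parse_stub` is a hypothetical
// stand-in for request::decode; it is NOT the real parser.
fn parse_stub(buf: &[u8]) -> Option<usize> {
    // pretend-parser: "decodes" a request by reporting its length
    if buf.is_empty() { None } else { Some(buf.len()) }
}

fn decode_with_check(buf: &[u8]) -> Option<usize> {
    // early return: never enter the parser for an empty buffer
    if buf.is_empty() {
        return None;
    }
    parse_stub(buf)
}

fn main() {
    assert_eq!(decode_with_check(b""), None);
    assert_eq!(decode_with_check(b"GET / HTTP/1.1\r\n\r\n"), Some(18));
    println!("ok");
}
```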
```
$ rustc --version
rustc 1.26.0-nightly (3eeb5a665 2018-03-01)
$ cargo bench
    Finished release [optimized] target(s) in 0.0 secs
     Running target/release/deps/tokio_minihttp-2a03b34773a2fb49

running 3 tests
test tests::bench_decode_empty_with_check    ... bench:  12 ns/iter (+/- 0)
test tests::bench_decode_empty_without_check ... bench:  30 ns/iter (+/- 0)
test tests::bench_noop                       ... bench:   2 ns/iter (+/- 0)
```
The check on an empty buf saves about 18 ns per iteration (30 ns vs 12 ns).
Second, `wrk` benchmarks:
```
$ cargo run --release --example techempower
```

without check empty buf:

```
$ wrk -t20 -c100 -d10s http://127.0.0.1:8080/plaintext
Running 10s test @ http://127.0.0.1:8080/plaintext
  20 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.98ms    2.20ms  65.31ms   96.35%
    Req/Sec     7.28k     2.06k   76.07k    88.91%
  1448822 requests in 10.10s, 178.24MB read
Requests/sec: 143451.05
Transfer/sec:     17.65MB

$ wrk -t20 -c100 -d10s http://127.0.0.1:8080/plaintext
Running 10s test @ http://127.0.0.1:8080/plaintext
  20 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.06ms    2.03ms  42.68ms   94.02%
    Req/Sec     7.62k     1.62k   23.07k    76.12%
  1522602 requests in 10.09s, 187.32MB read
Requests/sec: 150913.54
Transfer/sec:     18.57MB

$ wrk -t20 -c100 -d10s http://127.0.0.1:8080/plaintext
Running 10s test @ http://127.0.0.1:8080/plaintext
  20 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.03ms    2.81ms  95.93ms   96.54%
    Req/Sec     7.54k     2.67k   58.54k    81.41%
  1500969 requests in 10.10s, 184.66MB read
Requests/sec: 148620.06
Transfer/sec:     18.28MB
```

with check empty buf:

```
$ wrk -t20 -c100 -d10s http://127.0.0.1:8080/plaintext
Running 10s test @ http://127.0.0.1:8080/plaintext
  20 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   774.95us    0.97ms  31.04ms   99.15%
    Req/Sec     6.91k     0.99k   11.65k    75.61%
  1387143 requests in 10.10s, 170.65MB read
Requests/sec: 137343.71
Transfer/sec:     16.90MB

$ wrk -t20 -c100 -d10s http://127.0.0.1:8080/plaintext
Running 10s test @ http://127.0.0.1:8080/plaintext
  20 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.85ms    1.20ms  56.13ms   91.79%
    Req/Sec     7.65k     3.06k   35.79k    78.32%
  1527481 requests in 10.10s, 187.92MB read
Requests/sec: 151248.38
Transfer/sec:     18.61MB

$ wrk -t20 -c100 -d10s http://127.0.0.1:8080/plaintext
Running 10s test @ http://127.0.0.1:8080/plaintext
  20 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.00ms    3.35ms  46.93ms   85.81%
    Req/Sec     7.95k     1.97k   34.97k    81.22%
  1583289 requests in 10.10s, 194.78MB read
Requests/sec: 156760.36
Transfer/sec:     19.29MB
```
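The spread across the three runs within each series is much larger than the difference between the two series, so these wrk numbers cannot distinguish the builds. A quick sanity check on the Requests/sec figures above:

```rust
// Mean and sample standard deviation of the three Requests/sec figures
// from each wrk series quoted above.
fn stats(xs: &[f64]) -> (f64, f64) {
    let mean = xs.iter().sum::<f64>() / xs.len() as f64;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>()
        / (xs.len() - 1) as f64;
    (mean, var.sqrt())
}

fn main() {
    let without = [143451.05, 150913.54, 148620.06];
    let with_check = [137343.71, 151248.38, 156760.36];
    let (m1, s1) = stats(&without);
    let (m2, s2) = stats(&with_check);
    println!("without check: mean {:.0} +/- {:.0}", m1, s1);
    println!("with check:    mean {:.0} +/- {:.0}", m2, s2);
    // the means differ by far less than one standard deviation,
    // so these runs cannot tell the two builds apart
    assert!((m1 - m2).abs() < s1.max(s2));
}
```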
Conclusion

`cargo bench` shows that `httparse` has some overhead when decoding an empty buf; maybe it is non-inlined function calls or something else. I did not dig deeper because I don't have enough knowledge.

The wrk benchmarks are not informative and cannot prove that this PR makes any improvement. It is too hard for me to make ideal benchmark runs.

Thanks for the research. I think this can still be merged, but it looks like AppVeyor is hung 👻. I'll try to unblock it.
The decoder receives an empty buf twice after the first complete buf. Why does the decoder receive an empty buf twice? Is that an issue for tokio-io?

I found it like this: run the hello-world example and send a simple GET request.

Anyway, this PR skips parsing an empty buf as a request, which should improve tokio-minihttp performance.
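The reported behavior, decode being called again on a now-empty buffer right after a complete request, can be sketched in miniature. This is a simplified, hypothetical read loop, not the real tokio-proto/tokio-io code; `decode` here is a stub that drains one HTTP-style frame per call:

```rust
// Stub decoder: a "frame" is everything up to and including "\r\n\r\n".
// Returns None when there is not (yet) a complete frame in the buffer.
fn decode(buf: &mut Vec<u8>) -> Option<Vec<u8>> {
    let end = buf.windows(4).position(|w| w == b"\r\n\r\n")? + 4;
    Some(buf.drain(..end).collect())
}

fn main() {
    let mut buf = b"GET / HTTP/1.1\r\n\r\n".to_vec();
    let mut calls = 0;
    // A framework-style loop keeps calling decode until it reports
    // "need more data" -- so after the only frame is drained, decode
    // is invoked once more on an empty buffer.
    loop {
        calls += 1;
        if decode(&mut buf).is_none() {
            break;
        }
    }
    assert_eq!(calls, 2); // the second call saw an empty buffer
    assert!(buf.is_empty());
    println!("decode called {} times", calls);
}
```

Under this (assumed) calling convention, an empty-buffer fast path is hit on every request, which is why the guard can matter even though it looks trivial.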
Sorry about my poor English.