estin opened this pull request 6 years ago
Thanks for the report. Are you able to verify that this makes a performance difference?

Looking at the code, I would be surprised if it had any noticeable impact: `request::decode` just calls into `httparse`, and nothing I see should have any real cost when the buffer is empty. That said, I would also be fine merging this, as it could make the code clearer.
> Are you able to verify that this makes a performance difference?

Nope... I can't confirm it 100%. Take a look at my simple investigation.

First, `cargo bench` results of this, in lib.rs:
```rust
fn decode_with_check(buf: &mut BytesMut) -> io::Result<Option<Request>> {
    // don't try to parse an empty buf
    if buf.is_empty() {
        return Ok(None);
    }
    request::decode(buf)
}

fn decode_without_check(buf: &mut BytesMut) -> io::Result<Option<Request>> {
    request::decode(buf)
}

#[cfg(test)]
mod tests {
    use test::Bencher;
    use bytes::BytesMut;
    use super::{decode_without_check, decode_with_check};

    #[bench]
    fn bench_noop(b: &mut Bencher) {
        b.iter(|| {
            let bytes = BytesMut::new();
            if bytes.is_empty() {
                return;
            }
        });
    }

    #[bench]
    fn bench_decode_empty_with_check(b: &mut Bencher) {
        b.iter(|| {
            let mut bytes = BytesMut::new();
            decode_with_check(&mut bytes).unwrap();
        });
    }

    #[bench]
    fn bench_decode_empty_without_check(b: &mut Bencher) {
        b.iter(|| {
            let mut bytes = BytesMut::new();
            decode_without_check(&mut bytes).unwrap();
        });
    }
}
```
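For reference, here is a minimal stable-Rust sketch of the same guard that runs without the nightly `test` crate. The `parse_stub` function is a hypothetical stand-in for `request::decode`/`httparse` (not included here); only the early-return shape of the check is the point:

```rust
// Sketch of the PR's guard clause. `parse_stub` is a hypothetical
// stand-in for request::decode; it is NOT the real parser.
fn parse_stub(buf: &[u8]) -> Option<usize> {
    // pretend-parser: "decodes" a request by reporting its length
    if buf.is_empty() { None } else { Some(buf.len()) }
}

fn decode_with_check(buf: &[u8]) -> Option<usize> {
    // early return: never enter the parser for an empty buffer
    if buf.is_empty() {
        return None;
    }
    parse_stub(buf)
}

fn main() {
    assert_eq!(decode_with_check(b""), None);
    assert_eq!(decode_with_check(b"GET / HTTP/1.1\r\n\r\n"), Some(18));
    println!("ok");
}
```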
```
$ rustc --version
rustc 1.26.0-nightly (3eeb5a665 2018-03-01)
$ cargo bench
    Finished release [optimized] target(s) in 0.0 secs
     Running target/release/deps/tokio_minihttp-2a03b34773a2fb49

running 3 tests
test tests::bench_decode_empty_with_check    ... bench:  12 ns/iter (+/- 0)
test tests::bench_decode_empty_without_check ... bench:  30 ns/iter (+/- 0)
test tests::bench_noop                       ... bench:   2 ns/iter (+/- 0)
```
The check on an empty buf saves about 18 ns per iteration (30 ns vs 12 ns).
Second, `wrk` benchmarks:
```
$ cargo run --release --example techempower
```

without check empty buf:

```
$ wrk -t20 -c100 -d10s http://127.0.0.1:8080/plaintext
Running 10s test @ http://127.0.0.1:8080/plaintext
  20 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.98ms    2.20ms  65.31ms   96.35%
    Req/Sec     7.28k     2.06k   76.07k    88.91%
  1448822 requests in 10.10s, 178.24MB read
Requests/sec: 143451.05
Transfer/sec:     17.65MB

$ wrk -t20 -c100 -d10s http://127.0.0.1:8080/plaintext
Running 10s test @ http://127.0.0.1:8080/plaintext
  20 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.06ms    2.03ms  42.68ms   94.02%
    Req/Sec     7.62k     1.62k   23.07k    76.12%
  1522602 requests in 10.09s, 187.32MB read
Requests/sec: 150913.54
Transfer/sec:     18.57MB

$ wrk -t20 -c100 -d10s http://127.0.0.1:8080/plaintext
Running 10s test @ http://127.0.0.1:8080/plaintext
  20 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.03ms    2.81ms  95.93ms   96.54%
    Req/Sec     7.54k     2.67k   58.54k    81.41%
  1500969 requests in 10.10s, 184.66MB read
Requests/sec: 148620.06
Transfer/sec:     18.28MB
```

with check empty buf:

```
$ wrk -t20 -c100 -d10s http://127.0.0.1:8080/plaintext
Running 10s test @ http://127.0.0.1:8080/plaintext
  20 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   774.95us    0.97ms  31.04ms   99.15%
    Req/Sec     6.91k     0.99k   11.65k    75.61%
  1387143 requests in 10.10s, 170.65MB read
Requests/sec: 137343.71
Transfer/sec:     16.90MB

$ wrk -t20 -c100 -d10s http://127.0.0.1:8080/plaintext
Running 10s test @ http://127.0.0.1:8080/plaintext
  20 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.85ms    1.20ms  56.13ms   91.79%
    Req/Sec     7.65k     3.06k   35.79k    78.32%
  1527481 requests in 10.10s, 187.92MB read
Requests/sec: 151248.38
Transfer/sec:     18.61MB

$ wrk -t20 -c100 -d10s http://127.0.0.1:8080/plaintext
Running 10s test @ http://127.0.0.1:8080/plaintext
  20 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.00ms    3.35ms  46.93ms   85.81%
    Req/Sec     7.95k     1.97k   34.97k    81.22%
  1583289 requests in 10.10s, 194.78MB read
Requests/sec: 156760.36
Transfer/sec:     19.29MB
```
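The spread across the three runs within each series is much larger than the difference between the two series, so these wrk numbers cannot distinguish the builds. A quick sanity check on the Requests/sec figures above:

```rust
// Mean and sample standard deviation of the three Requests/sec figures
// from each wrk series quoted above.
fn stats(xs: &[f64]) -> (f64, f64) {
    let mean = xs.iter().sum::<f64>() / xs.len() as f64;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>()
        / (xs.len() - 1) as f64;
    (mean, var.sqrt())
}

fn main() {
    let without = [143451.05, 150913.54, 148620.06];
    let with_check = [137343.71, 151248.38, 156760.36];
    let (m1, s1) = stats(&without);
    let (m2, s2) = stats(&with_check);
    println!("without check: mean {:.0} +/- {:.0}", m1, s1);
    println!("with check:    mean {:.0} +/- {:.0}", m2, s2);
    // the means differ by far less than one standard deviation,
    // so these runs cannot tell the two builds apart
    assert!((m1 - m2).abs() < s1.max(s2));
}
```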
Conclusion

`cargo bench` shows that `httparse` has some overhead when decoding an empty buf; maybe it is non-inlined function calls or something else. I did not dig deeper because I don't have enough knowledge.

The wrk benchmarks are not informative and cannot prove that this PR makes any improvement. It is too hard for me to make ideal benchmark runs.

Thanks for the research. I think this can still be merged, but it looks like AppVeyor is hung 👻. I'll try to unblock it.
The decoder receives an empty buf twice after the first complete buf. Why does the decoder receive an empty buf twice? Is that an issue for tokio-io?

I found it like this: run the hello-world example and send a simple GET request.

Anyway, this PR skips parsing an empty buf as a request, which should improve tokio-minihttp performance.
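The reported behavior, decode being called again on a now-empty buffer right after a complete request, can be sketched in miniature. This is a simplified, hypothetical read loop, not the real tokio-proto/tokio-io code; `decode` here is a stub that drains one HTTP-style frame per call:

```rust
// Stub decoder: a "frame" is everything up to and including "\r\n\r\n".
// Returns None when there is not (yet) a complete frame in the buffer.
fn decode(buf: &mut Vec<u8>) -> Option<Vec<u8>> {
    let end = buf.windows(4).position(|w| w == b"\r\n\r\n")? + 4;
    Some(buf.drain(..end).collect())
}

fn main() {
    let mut buf = b"GET / HTTP/1.1\r\n\r\n".to_vec();
    let mut calls = 0;
    // A framework-style loop keeps calling decode until it reports
    // "need more data" -- so after the only frame is drained, decode
    // is invoked once more on an empty buffer.
    loop {
        calls += 1;
        if decode(&mut buf).is_none() {
            break;
        }
    }
    assert_eq!(calls, 2); // the second call saw an empty buffer
    assert!(buf.is_empty());
    println!("decode called {} times", calls);
}
```

Under this (assumed) calling convention, an empty-buffer fast path is hit on every request, which is why the guard can matter even though it looks trivial.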
Sorry about my poor English.