Closed dnaka91 closed 1 year ago
This exposes quite an interesting challenge in measuring the runtime of a network-bound program. If you check the hyperfine results for the Zig HTTP client, you can see the wall clock time taken is ~160x longer than the time actually spent in the Zig program (user time).
I did a bit of digging and, for some reason, the read(3) syscall for the body of the request is consistently taking upwards of 40-50ms, but only for the Zig program. This occurs in none of the other programs, and the server is writing to the socket in a timely manner.
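To illustrate the gap between wall clock time and CPU time mentioned above, here is a minimal Python sketch. It is not taken from the benchmark: `measure` and `blocked_io` are hypothetical names, and `time.sleep` only stands in for a read that blocks on the network for about 50 ms. Note that `time.process_time` reports user plus system CPU time, which is close to, but not the same as, hyperfine's user time.

```python
import time


def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent running fn()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def blocked_io():
    # Assumption: a stand-in for a socket read that waits ~50 ms for data.
    # Sleeping uses almost no CPU, just like waiting on the network.
    time.sleep(0.05)


if __name__ == "__main__":
    wall, cpu = measure(blocked_io)
    print(f"wall: {wall * 1000:.1f} ms, cpu: {cpu * 1000:.1f} ms")
```

A program that spends most of its time waiting will show a large wall time and a tiny CPU time, which is the pattern seen for the Zig client here.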
Update: after disabling Nagle's algorithm on the server, the results are sane:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| zig-http-client | 13.3 ± 0.4 | 12.7 | 16.7 | 1.00 |
| curl | 52.8 ± 0.5 | 52.0 | 54.6 | 3.98 ± 0.12 |
| rust-attohttpc | 13.5 ± 0.5 | 12.8 | 16.6 | 1.02 ± 0.05 |
| rust-hyper | 15.5 ± 1.8 | 13.9 | 26.1 | 1.17 ± 0.14 |
| rust-reqwest | 15.6 ± 0.7 | 14.8 | 19.4 | 1.18 ± 0.06 |
| rust-ureq | 13.7 ± 0.8 | 12.7 | 18.2 | 1.03 ± 0.06 |
| go-http-client | 23.3 ± 1.2 | 21.0 | 26.4 | 1.76 ± 0.11 |
| python-http-client | 181.6 ± 2.3 | 179.8 | 189.4 | 13.70 ± 0.43 |

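Disabling Nagle's algorithm means setting the `TCP_NODELAY` option on the connection's socket. The actual change to the Zig server is not shown in this thread, so the following is only a Python sketch of the same socket option on the server side; `make_listener` and `accept_nodelay` are hypothetical helper names.

```python
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a listening TCP socket (port 0 lets the OS pick a free port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen()
    return srv


def accept_nodelay(srv):
    """Accept one connection and disable Nagle's algorithm on it."""
    conn, addr = srv.accept()
    # With TCP_NODELAY set, small writes (for example headers followed by a
    # short body) are sent immediately instead of waiting for earlier data
    # to be acknowledged.
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return conn, addr
```

Without this option, a small second write can be held back until the peer acknowledges the first one, and a peer using delayed ACKs may take tens of milliseconds to do so, which would fit the 40-50ms reads reported above.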
Thank you @truemedian, that was the missing piece as I didn't look at the server implementation at all :sweat_smile:.
Now the results are much more reasonable for me as well:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| zig-http-client | 4.1 ± 0.5 | 3.3 | 6.1 | 1.00 |
| curl | 12.9 ± 2.7 | 9.9 | 19.8 | 3.12 ± 0.76 |
| rust-attohttpc | 6.4 ± 11.2 | 3.8 | 80.2 | 1.54 ± 2.73 |
| rust-hyper | 7.0 ± 11.6 | 4.0 | 85.0 | 1.69 ± 2.82 |
| rust-reqwest | 7.1 ± 10.1 | 4.5 | 86.2 | 1.71 ± 2.45 |
| rust-ureq | 6.9 ± 12.2 | 3.8 | 78.8 | 1.66 ± 2.96 |
| go-http-client | 10.2 ± 0.8 | 7.7 | 13.4 | 2.47 ± 0.38 |
| python-http-client | 93.3 ± 0.9 | 92.1 | 95.7 | 22.62 ± 2.92 |

Just out of curiosity, I'll build a simple HTTP server with `hyper` as well, to see whether there are any further big differences. That way we don't just test various clients against a Zig HTTP server, but also the other way around, with a server written in Rust.
As the Zig client works properly now, I increased the loop count to 1000. Also, created a simple Rust server with hyper, as mentioned in the last message. Here are my results:

Zig server:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| zig-http-client | 41.7 ± 3.0 | 39.0 | 59.6 | 1.00 |
| curl | 93.8 ± 5.1 | 88.0 | 106.6 | 2.25 ± 0.20 |
| rust-attohttpc | 64.3 ± 87.2 | 38.0 | 445.7 | 1.54 ± 2.09 |
| rust-hyper | 74.7 ± 85.2 | 41.1 | 364.3 | 1.79 ± 2.05 |
| rust-reqwest | 72.6 ± 72.2 | 45.3 | 363.8 | 1.74 ± 1.73 |
| rust-ureq | 67.3 ± 77.4 | 38.8 | 398.3 | 1.61 ± 1.86 |
| go-http-client | 97.5 ± 6.5 | 86.6 | 113.3 | 2.34 ± 0.23 |
| python-http-client | 474.6 ± 4.7 | 467.8 | 482.0 | 11.38 ± 0.82 |


Rust server:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| zig-http-client | 68.8 ± 36.3 | 55.8 | 226.7 | 4.41 ± 2.44 |
| curl | 53.8 ± 0.5 | 52.9 | 55.1 | 3.45 ± 0.57 |
| rust-attohttpc | 58.3 ± 114.7 | 23.2 | 606.7 | 3.74 ± 7.37 |
| rust-hyper | 15.6 ± 2.6 | 10.0 | 19.9 | 1.00 |
| rust-reqwest | 61.8 ± 87.4 | 31.2 | 464.2 | 3.96 ± 5.64 |
| rust-ureq | 64.6 ± 110.8 | 20.4 | 410.7 | 4.14 ± 7.13 |
| go-http-client | 21.9 ± 3.3 | 15.2 | 28.2 | 1.40 ± 0.31 |
| python-http-client | 451.6 ± 5.4 | 444.8 | 463.4 | 28.94 ± 4.79 |

It's quite funny that, when using a Zig server, the Zig client is the fastest, and when using a Rust server, the Rust client (with the same underlying library) is the fastest.
Something worth looking at is how each client handles keep-alive. The Zig client uses keep-alive by default (and the server is set up to enable that), but not every client necessarily takes advantage of it.
All clients are now tweaked to use TCP nodelay as well as both TCP and HTTP keep-alive. Where possible I set it on the connection and client config directly, but added the needed header as well, so the Zig server definitely sees it.

Zig server:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| zig-http-client | 46.2 ± 7.5 | 40.2 | 63.6 | 1.79 ± 0.30 |
| curl | 91.0 ± 5.2 | 87.4 | 114.1 | 3.52 ± 0.23 |
| rust-attohttpc | 66.5 ± 76.1 | 39.0 | 400.2 | 2.57 ± 2.94 |
| rust-hyper | 27.1 ± 2.2 | 24.1 | 32.8 | 1.05 ± 0.09 |
| rust-reqwest | 29.0 ± 3.1 | 24.2 | 34.7 | 1.12 ± 0.13 |
| rust-ureq | 25.9 ± 0.9 | 24.4 | 30.1 | 1.00 |
| go-http-client | 42.6 ± 2.8 | 35.8 | 53.1 | 1.65 ± 0.12 |
| python-http-client | 394.3 ± 15.2 | 375.3 | 420.1 | 15.25 ± 0.78 |


Rust server:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| zig-http-client | 68.3 ± 30.8 | 57.2 | 214.3 | 5.97 ± 2.81 |
| curl | 53.8 ± 0.9 | 52.2 | 56.3 | 4.70 ± 0.65 |
| rust-attohttpc | 63.2 ± 126.3 | 23.6 | 640.9 | 5.52 ± 11.06 |
| rust-hyper | 15.6 ± 3.6 | 10.2 | 25.3 | 1.36 ± 0.36 |
| rust-reqwest | 17.1 ± 3.4 | 11.3 | 27.5 | 1.49 ± 0.36 |
| rust-ureq | 11.4 ± 1.6 | 9.5 | 14.8 | 1.00 |
| go-http-client | 23.8 ± 3.4 | 14.8 | 30.6 | 2.08 ± 0.41 |
| python-http-client | 367.0 ± 4.3 | 362.2 | 377.2 | 32.07 ± 4.41 |

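For reference, the client-side settings mentioned before these tables (TCP nodelay, TCP keep-alive, and an explicit HTTP keep-alive header) could look roughly like this in Python. This is only a sketch using the standard library's `http.client`, not the code from the benchmark; `open_tuned_connection` and `fetch` are hypothetical helper names.

```python
import http.client
import socket


def open_tuned_connection(host, port):
    """Open an HTTP connection with TCP_NODELAY and TCP keep-alive enabled."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    conn.connect()  # connect eagerly so the underlying socket exists
    conn.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    conn.sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return conn


def fetch(conn, path="/"):
    """Send one GET on an existing connection and return (status, body)."""
    # The explicit header makes sure the server sees the keep-alive request,
    # even though HTTP/1.1 connections are persistent by default.
    conn.request("GET", path, headers={"Connection": "keep-alive"})
    resp = conn.getresponse()
    # The body must be read completely before the connection can be reused.
    return resp.status, resp.read()
```

Calling `fetch` repeatedly on the same `conn` reuses one TCP connection instead of paying for a new handshake on every request.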
Thank you both, @truemedian and @dnaka91, for taking the time to look into this and making the benchmarks fairer! I fully understand the reasoning behind your comments and those improvements make total sense to me. Will get to #3 soon and add a review!
Really appreciate the input from @truemedian. That helped me a lot to do further tweaks on my adjustments for this benchmark.
I feel my current implementation of #3 still needs a few adjustments. Currently, there are some differences in how the requests and URLs are constructed. Some implementations re-use the instances from the previous loop iteration, some rebuild them every time.
Probably doesn't make much of a difference, but will move things a bit to make it at least consistent.
I think this is a nice attempt at comparing different HTTP clients in different languages, but it suffers from unfairness in how the clients are invoked.
The problem is that hyperfine repeatedly runs the binary from scratch, which puts Zig in a better light due to additional overhead in other languages. So most of the timings are not really timing the HTTP library, but instead the surrounding things:
Improving the setup
To mitigate some of the pitfalls in the other languages and make the comparison fairer, there are two attempts that immediately come to mind:
The first attempt actually came to mind after I implemented the second :sweat_smile:.
I guess for most programs except `curl` it's pretty trivial. Get the timestamp before the request is sent, diff with the current timestamp after the request body was printed out, then log the duration.

But this would still not be fair, as it wouldn't take any optimizations for subsequent calls into account. Something like internal buffers that might be re-used and don't need another allocation on the next request.
So I went ahead with the second attempt (which, as said before, was actually my first idea; I only realized that the program itself could log the timing while I was writing this).
It's a pretty simple attempt, actually: just let each program send the request repeatedly in a loop, so we can mitigate some of the overhead from spinning things up. The overhead would not be absolutely zero, but if the loop runs often enough, it gets watered down and we get more comparable results that show the actual HTTP library performance.
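As a rough illustration of the loop idea, here is a self-contained Python sketch that sends a number of requests over one connection and times the whole batch inside the program. It is not one of the benchmark programs: the local test server, `HelloHandler`, `time_requests`, and `run_demo` are all assumptions made up for this example.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allow connection reuse (keep-alive)
    disable_nagle_algorithm = True  # set TCP_NODELAY on accepted sockets

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the output clean


def time_requests(host, port, count):
    """Send `count` GET requests over one connection; return (seconds, bodies)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    bodies = []
    start = time.perf_counter()
    for _ in range(count):
        conn.request("GET", "/")
        bodies.append(conn.getresponse().read())
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed, bodies


def run_demo(count=100):
    """Start a throwaway local server, run the timed loop, then shut down."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        return time_requests(host, port, count)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    elapsed, bodies = run_demo()
    print(f"{len(bodies)} requests in {elapsed * 1000:.1f} ms")
```

Because interpreter startup and client construction happen once, outside the timed loop, their cost is spread over all iterations instead of dominating a single-request measurement.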
My modifications
Overall, I did 3 modifications, 2 of them being Rust-specific.

1. […] `attohttpc` crate into the pool, as it's a somewhat popular alternative to `ureq`. Have used it in the past several times, and it very much resembles the `reqwest` API, so I was curious how it performs.
2. […] `default-features = false` to most dependencies, and reduce `tokio` and `hyper` features from `full` to the absolute minimum.
   2.2. Do the async runtime setup manually, which is probably not having any performance impact, but again can reduce the dependency count for this simple setup.
3. […] `100` requests. I'd have loved to set it higher, but the Zig HTTP client was surprisingly slow when run in a loop.
   3.2. For `curl` it was a bit tricky as a shell loop wouldn't be fair. But I found a trick online.

My test results
So long story short, I ran the tests again after the mentioned adjustments and at least for me all the Rust programs were the fastest, closely followed by Go and cURL.
Most surprising was the slowdown in Zig. I'm not sure what exactly the issue is there. I'm not a Zig dev, and simply searched online for how to do simple loops. Probably I did something wrong there... or maybe it's really this slow and might improve a lot until the full 0.11 release :thinking:
I'll open a PR with my modifications shortly, so maybe you can check whether I'm doing anything wrong that might slow it down so much.
zig-http-client
curl
rust-attohttpc
rust-hyper
rust-reqwest
rust-ureq
go-http-client
python-http-client