orhun / zig-http-benchmarks

Benchmarking Zig HTTP client against Rust, Go, Python, C++ and curl
https://blog.orhun.dev/zig-bits-04/
MIT License

Non-fair comparison of HTTP clients #2

Closed dnaka91 closed 1 year ago

dnaka91 commented 1 year ago

I think this is a nice attempt at comparing HTTP clients across different languages, but the comparison is unfair because of how the clients are invoked.

The problem is that hyperfine repeatedly runs each binary from scratch, which puts Zig in a better light because the other languages carry additional startup overhead. So most of the timings do not really measure the HTTP library, but instead the surrounding things:

Improving the setup

To mitigate some of the pitfalls in the other languages and make the comparison fairer, there are two attempts that immediately come to mind:

The first attempt actually came to mind after I implemented the second :sweat_smile:.

I guess for most programs except curl it's pretty trivial. Get the timestamp before the request is sent, diff with the current timestamp after the request body was printed out, then log the duration.
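A minimal sketch of that in-process timing idea, written in Python for illustration only. The local `HelloHandler` server, `timed_get`, and `run_once` are hypothetical names that are not part of the benchmark repository, and the timer here stops once the body has been read rather than printed:

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the benchmark server."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def timed_get(host, port, path="/"):
    """Return (body, seconds) for one GET, measured inside the process."""
    start = time.perf_counter()  # timestamp before the request is sent
    conn = http.client.HTTPConnection(host, port)
    conn.request("GET", path)
    body = conn.getresponse().read()
    conn.close()
    elapsed = time.perf_counter() - start  # diff after the body was read
    return body, elapsed


def run_once():
    """Start a throwaway local server, time one request, shut down."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return timed_get("127.0.0.1", server.server_port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    body, elapsed = run_once()
    print(f"{len(body)} bytes in {elapsed * 1000:.3f} ms")
```

Measured this way, interpreter or runtime startup is excluded from the reported duration, which is exactly what a from-scratch hyperfine run cannot separate out.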

But this would still not be fair, as it wouldn't take any optimizations for subsequent calls into account, such as internal buffers that are re-used and don't need another allocation on the next request.

So I went ahead with the second attempt (as said before, it was actually my first idea; I only realized while writing this that each program could log the timing itself).

Pretty easy attempt actually: just let each program send the request in a loop several times, so that the overhead from spinning things up is amortized. The overhead would not be absolutely zero, but with enough iterations it is watered down far enough to get more comparable results that show the actual HTTP library performance.
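The loop idea could look roughly like the following Python sketch. The `OkHandler` server, `fetch_many`, and `main` are hypothetical helpers for illustration, and `REQUESTS = 100` simply mirrors the loop count mentioned later in this issue:

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

REQUESTS = 100  # loop count; an assumption taken from the modifications below


class OkHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the benchmark server."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch_many(host, port, count):
    """Send `count` sequential GET requests; return total bytes received."""
    total = 0
    for _ in range(count):
        conn = http.client.HTTPConnection(host, port)
        conn.request("GET", "/")
        total += len(conn.getresponse().read())
        conn.close()
    return total


def main():
    server = HTTPServer(("127.0.0.1", 0), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        start = time.perf_counter()
        total = fetch_many("127.0.0.1", server.server_port, REQUESTS)
        elapsed = time.perf_counter() - start
    finally:
        server.shutdown()
        server.server_close()
    return total, elapsed


if __name__ == "__main__":
    total, elapsed = main()
    print(f"{REQUESTS} requests, {total} bytes, "
          f"{elapsed / REQUESTS * 1000:.3f} ms per request")
```

With a fixed startup cost S and a per-request cost r, one process run costs roughly S + N·r, so the startup share per request shrinks as S/N when the loop count N grows.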


My modifications

Overall, I did 3 modifications, 2 of them being Rust-specific.

  1. I added the attohttpc crate into the pool, as it's a somewhat popular alternative to ureq. Have used it in the past several times, and it very much resembles the reqwest API, so I was curious how it performs.
  2. Many Rust programs use too many features, most of them unused. Probably most of them would be optimized away anyway, but it can't hurt to reduce the feature set. At least build times go down a bit.
     2.1. Apply default-features = false to most dependencies, and reduce tokio and hyper features from full to the absolute minimum.
     2.2. Do the async runtime setup manually, which probably has no performance impact, but again can reduce the dependency count for this simple setup.
  3. The most important change: adjusting each client to do repeated HTTP requests in a loop.
     3.1. Set all loops to do 100 requests. I'd have loved to set it higher, but the Zig HTTP client was surprisingly slow when run in a loop.
     3.2. For curl it was a bit tricky, as a shell loop wouldn't be fair. But I found a trick online.

My test results

So long story short, I ran the tests again after the mentioned adjustments and at least for me all the Rust programs were the fastest, closely followed by Go and cURL.

Most surprising was the slowdown in Zig. I'm not sure what exactly the issue is there. I'm not a Zig dev, and simply searched online for how to write simple loops. Probably I did something wrong there... or maybe it's really this slow and might improve a lot by the full 0.11 release :thinking:

I'll open a PR with my modifications shortly, so maybe you can check whether I did anything wrong that might slow it down so much.

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| zig-http-client | 4.065 ± 0.003 | 4.062 | 4.069 | 619.47 ± 942.33 |
| curl | 0.014 ± 0.003 | 0.010 | 0.021 | 2.12 ± 3.26 |
| rust-attohttpc | 0.007 ± 0.010 | 0.004 | 0.082 | 1.01 ± 2.19 |
| rust-hyper | 0.007 ± 0.011 | 0.004 | 0.092 | 1.05 ± 2.28 |
| rust-reqwest | 0.007 ± 0.008 | 0.005 | 0.081 | 1.09 ± 2.02 |
| rust-ureq | 0.007 ± 0.010 | 0.004 | 0.100 | 1.00 |
| go-http-client | 0.010 ± 0.005 | 0.007 | 0.094 | 1.58 ± 2.53 |
| python-http-client | 0.092 ± 0.002 | 0.090 | 0.099 | 13.98 ± 21.26 |

truemedian commented 1 year ago

This exposes quite an interesting challenge in measuring the runtime of a network-bound program. If you check the hyperfine results for the Zig HTTP client, you can see that the wall-clock time taken is ~160x longer than the time actually spent in the Zig program (user time).
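To illustrate the gap between wall-clock and CPU time, here is a small Python sketch. `measure` and `simulated_network_wait` are hypothetical names, the sleep merely stands in for blocking on a socket, and note that `time.process_time` reports user plus system CPU time rather than user time alone:

```python
import time


def measure(work):
    """Run `work` and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def simulated_network_wait():
    # Stand-in for a blocking socket read: the process is idle, not computing.
    time.sleep(0.2)


if __name__ == "__main__":
    wall, cpu = measure(simulated_network_wait)
    print(f"wall: {wall * 1000:.1f} ms, cpu: {cpu * 1000:.1f} ms")
```

A program that mostly waits on the network accumulates wall-clock time while its CPU time stays small, which is the pattern described for the Zig client.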

I did a bit of digging, and for some reason the read(2) syscall for the response body consistently takes upwards of 40-50 ms, but only for the Zig program. This occurs in none of the other programs, and the server is writing to the socket in a timely manner.

Update: after disabling Nagle's algorithm on the server, the results are sane:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| zig-http-client | 13.3 ± 0.4 | 12.7 | 16.7 | 1.00 |
| curl | 52.8 ± 0.5 | 52.0 | 54.6 | 3.98 ± 0.12 |
| rust-attohttpc | 13.5 ± 0.5 | 12.8 | 16.6 | 1.02 ± 0.05 |
| rust-hyper | 15.5 ± 1.8 | 13.9 | 26.1 | 1.17 ± 0.14 |
| rust-reqwest | 15.6 ± 0.7 | 14.8 | 19.4 | 1.18 ± 0.06 |
| rust-ureq | 13.7 ± 0.8 | 12.7 | 18.2 | 1.03 ± 0.06 |
| go-http-client | 23.3 ± 1.2 | 21.0 | 26.4 | 1.76 ± 0.11 |
| python-http-client | 181.6 ± 2.3 | 179.8 | 189.4 | 13.70 ± 0.43 |
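For readers unfamiliar with the setting: Nagle's algorithm is typically disabled per socket with the `TCP_NODELAY` option. The following Python sketch shows the general idea on an accepted server-side connection. It is not the change made to the Zig server, and `make_listener`, `accept_nodelay`, and `demo` are hypothetical names:

```python
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a listening TCP socket on an ephemeral port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen()
    return srv


def accept_nodelay(srv):
    """Accept one connection and disable Nagle's algorithm on it."""
    conn, _addr = srv.accept()
    # TCP_NODELAY = 1 sends small writes immediately instead of coalescing them.
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return conn


def demo():
    """Return True if the accepted socket reports TCP_NODELAY as enabled."""
    srv = make_listener()
    client = socket.create_connection(srv.getsockname())
    conn = accept_nodelay(srv)
    try:
        return conn.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
    finally:
        conn.close()
        client.close()
        srv.close()


if __name__ == "__main__":
    print("TCP_NODELAY enabled:", demo())
```

Delays in the tens of milliseconds on small reads are a commonly reported symptom when Nagle's algorithm on one side interacts with delayed ACKs on the other, which would be consistent with the 40-50 ms reads described above.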
dnaka91 commented 1 year ago

Thank you @truemedian, that was the missing piece as I didn't look at the server implementation at all :sweat_smile:.

Now the results are much more reasonable for me as well:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| zig-http-client | 4.1 ± 0.5 | 3.3 | 6.1 | 1.00 |
| curl | 12.9 ± 2.7 | 9.9 | 19.8 | 3.12 ± 0.76 |
| rust-attohttpc | 6.4 ± 11.2 | 3.8 | 80.2 | 1.54 ± 2.73 |
| rust-hyper | 7.0 ± 11.6 | 4.0 | 85.0 | 1.69 ± 2.82 |
| rust-reqwest | 7.1 ± 10.1 | 4.5 | 86.2 | 1.71 ± 2.45 |
| rust-ureq | 6.9 ± 12.2 | 3.8 | 78.8 | 1.66 ± 2.96 |
| go-http-client | 10.2 ± 0.8 | 7.7 | 13.4 | 2.47 ± 0.38 |
| python-http-client | 93.3 ± 0.9 | 92.1 | 95.7 | 22.62 ± 2.92 |

Just out of curiosity, I'll build a simple HTTP server with hyper as well, to see whether there are any further big differences. That way the clients are not only tested against a Zig HTTP server, but also against a server written in Rust.

dnaka91 commented 1 year ago

As the Zig client works properly now, I increased the loop count to 1000. I also created a simple Rust server with hyper, as mentioned in the last message. Here are my results:

Zig server

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| zig-http-client | 41.7 ± 3.0 | 39.0 | 59.6 | 1.00 |
| curl | 93.8 ± 5.1 | 88.0 | 106.6 | 2.25 ± 0.20 |
| rust-attohttpc | 64.3 ± 87.2 | 38.0 | 445.7 | 1.54 ± 2.09 |
| rust-hyper | 74.7 ± 85.2 | 41.1 | 364.3 | 1.79 ± 2.05 |
| rust-reqwest | 72.6 ± 72.2 | 45.3 | 363.8 | 1.74 ± 1.73 |
| rust-ureq | 67.3 ± 77.4 | 38.8 | 398.3 | 1.61 ± 1.86 |
| go-http-client | 97.5 ± 6.5 | 86.6 | 113.3 | 2.34 ± 0.23 |
| python-http-client | 474.6 ± 4.7 | 467.8 | 482.0 | 11.38 ± 0.82 |

Rust server

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| zig-http-client | 68.8 ± 36.3 | 55.8 | 226.7 | 4.41 ± 2.44 |
| curl | 53.8 ± 0.5 | 52.9 | 55.1 | 3.45 ± 0.57 |
| rust-attohttpc | 58.3 ± 114.7 | 23.2 | 606.7 | 3.74 ± 7.37 |
| rust-hyper | 15.6 ± 2.6 | 10.0 | 19.9 | 1.00 |
| rust-reqwest | 61.8 ± 87.4 | 31.2 | 464.2 | 3.96 ± 5.64 |
| rust-ureq | 64.6 ± 110.8 | 20.4 | 410.7 | 4.14 ± 7.13 |
| go-http-client | 21.9 ± 3.3 | 15.2 | 28.2 | 1.40 ± 0.31 |
| python-http-client | 451.6 ± 5.4 | 444.8 | 463.4 | 28.94 ± 4.79 |

It's quite funny that, when using a Zig server, the Zig client is the fastest, and when using a Rust server, the Rust client (with the same underlying library) is the fastest.

truemedian commented 1 year ago

Something worth looking at is how each client handles keep-alive. The Zig client does keep-alive by default (and the server is set up to enable that), but not every client is necessarily taking advantage of it.
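To make the keep-alive point concrete, here is a Python sketch that counts how many TCP connections a server accepts when a client reuses one connection versus opening a new one per request. `KeepAliveHandler` and `count_connections` are hypothetical names, and the local server only stands in for the benchmark servers:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    """Hypothetical test server that supports persistent connections."""

    protocol_version = "HTTP/1.1"  # http.server keeps connections open for 1.1
    connections = 0  # number of TCP connections accepted so far

    def setup(self):
        super().setup()
        type(self).connections += 1  # called once per accepted connection

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def count_connections(reuse, requests=10):
    """Send `requests` GETs and return how many connections the server saw."""
    KeepAliveHandler.connections = 0
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = None
    try:
        for _ in range(requests):
            if conn is None or not reuse:
                conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body before the next request
            if not reuse:
                conn.close()
    finally:
        if conn is not None:
            conn.close()
        server.shutdown()
        server.server_close()
    return KeepAliveHandler.connections


if __name__ == "__main__":
    print("with reuse:   ", count_connections(reuse=True))
    print("without reuse:", count_connections(reuse=False))
```

A client that reuses its connection pays for the TCP handshake once, while a client that reconnects pays for it on every request, so a looped benchmark can favour whichever clients happen to keep the connection alive.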

dnaka91 commented 1 year ago

All clients are now tweaked to use TCP nodelay, as well as both TCP and HTTP keep-alive. Where possible I set it on the connection and client config directly, but I added the needed header as well, so the Zig server definitely sees it.

Zig server

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| zig-http-client | 46.2 ± 7.5 | 40.2 | 63.6 | 1.79 ± 0.30 |
| curl | 91.0 ± 5.2 | 87.4 | 114.1 | 3.52 ± 0.23 |
| rust-attohttpc | 66.5 ± 76.1 | 39.0 | 400.2 | 2.57 ± 2.94 |
| rust-hyper | 27.1 ± 2.2 | 24.1 | 32.8 | 1.05 ± 0.09 |
| rust-reqwest | 29.0 ± 3.1 | 24.2 | 34.7 | 1.12 ± 0.13 |
| rust-ureq | 25.9 ± 0.9 | 24.4 | 30.1 | 1.00 |
| go-http-client | 42.6 ± 2.8 | 35.8 | 53.1 | 1.65 ± 0.12 |
| python-http-client | 394.3 ± 15.2 | 375.3 | 420.1 | 15.25 ± 0.78 |

Rust server

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| zig-http-client | 68.3 ± 30.8 | 57.2 | 214.3 | 5.97 ± 2.81 |
| curl | 53.8 ± 0.9 | 52.2 | 56.3 | 4.70 ± 0.65 |
| rust-attohttpc | 63.2 ± 126.3 | 23.6 | 640.9 | 5.52 ± 11.06 |
| rust-hyper | 15.6 ± 3.6 | 10.2 | 25.3 | 1.36 ± 0.36 |
| rust-reqwest | 17.1 ± 3.4 | 11.3 | 27.5 | 1.49 ± 0.36 |
| rust-ureq | 11.4 ± 1.6 | 9.5 | 14.8 | 1.00 |
| go-http-client | 23.8 ± 3.4 | 14.8 | 30.6 | 2.08 ± 0.41 |
| python-http-client | 367.0 ± 4.3 | 362.2 | 377.2 | 32.07 ± 4.41 |

orhun commented 1 year ago

Thank you both @truemedian @dnaka91 for taking the time to look into this and making the benchmarks fairer! I fully understand the reasoning behind your comments, and those improvements make total sense to me. Will get to #3 soon and add a review!

dnaka91 commented 1 year ago

Really appreciate the input from @truemedian. That helped me a lot to do further tweaks on my adjustments for this benchmark.

I feel my current implementation of #3 still needs a few adjustments. Currently, there are some differences in how the requests and URLs are constructed. Some implementations re-use the instances from the previous loop iteration, some rebuild them every time.

It probably doesn't make much of a difference, but I will move things around a bit to make it at least consistent.