Closed: waghanza closed this issue 3 years ago.
I'm adding SO_REUSEPORT support for the Ruby frameworks: https://github.com/puma/puma/pull/1712
Jester (through httpbeast) uses SO_REUSEPORT implicitly as long as threads are enabled via --threads:on.
Agoo-C uses SO_REUSEPORT as well (src/agoo/bind.c:236).
@johngcn @kataras are the Go-based frameworks running with SO_REUSEPORT?
Not by default, but they can. Examples:
GF currently uses net/http as its underlying HTTP server, and it's a pity that net/http only supports SO_REUSEADDR, not SO_REUSEPORT.
@johngcn I thought SO_REUSEPORT was included in the stdlib: https://go-review.googlesource.com/c/go/+/37039/
Thanks @kataras for the tip. I think it's better to compare frameworks here with this feature enabled in all implementations.
@waghanza Thank you for the tips. I see there are some workarounds for implementing SO_REUSEPORT in Go using syscall on some architectures. I'll do some tests in GF using SO_REUSEPORT.
@ioquatix does falcon use SO_REUSEPORT?
@waghanza It does now! Use v0.22.0: falcon serve --reuse-port. It's not the default for obvious reasons.
BTW, do you mind explaining what this is for?
I actually used to use SO_REUSEPORT but I found that on Darwin it doesn't work as expected - the OS always uses the first process to bind to the port (but it's not an error to start other processes). The better model is to bind a socket and share it over multiple threads/processes.
The problem with SO_REUSEPORT is you might accidentally run multiple (different) app servers on the same port without realising it. I found this a bit annoying, to be honest. I'd prefer the server bombs out with an error (can't bind).
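To make that failure mode concrete, here's a minimal sketch (assuming Linux and Ruby's Socket API; the port is picked by the kernel, and reuse_port_listener is just an illustrative helper) of what SO_REUSEPORT changes at bind time:

```ruby
require 'socket'

# Open an independent TCP listener with SO_REUSEPORT set (Linux >= 3.9).
def reuse_port_listener(port)
  sock = Socket.new(:INET, :STREAM)
  sock.setsockopt(:SOCKET, :SO_REUSEPORT, true)
  sock.bind(Addrinfo.tcp('127.0.0.1', port))
  sock.listen(16)
  sock
end

first = reuse_port_listener(0)          # port 0: the kernel picks a free port
port  = first.local_address.ip_port

# A second, completely independent socket binds the very same port; the
# kernel load-balances incoming connections between the two listeners.
second = reuse_port_listener(port)

# Without SO_REUSEPORT, that second bind raises Errno::EADDRINUSE -- the
# "bombs out with an error (can't bind)" behaviour described above.
```

This is exactly why accidentally starting two different app servers on one port can go unnoticed: both binds succeed and each silently receives a share of the traffic.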
@ioquatix sure, I know this breaks interoperability between OSes, but this project is to test a bunch of frameworks.
To have equal / fair testing, it is recommended to have the same behaviour (or as close as we can get) for each implementation, i.e. the same features.
What are you trying to equalise?
Surely you should just let each server maximise processor and memory usage? How it does that should be up to the server.
For example, falcon can benchmark both --threaded and --forked. For JVM Ruby, you'd need to use --threaded, but for MRI you should use --forked (this is what it does by default anyway).
What are you trying to equalise?
I think it's better to have SO_REUSEPORT everywhere or nowhere here ;-)
I'm trying to maximise efficiency (use the maximum of the server's capacity).
So, I agree with your basic idea: make everything equal.
But, unless I'm missing something important, I don't see how SO_REUSEPORT achieves this.
Using --reuse-port with falcon won't change performance in the slightest. Whether you use it or not, it only binds one port and then shares it over N processes/threads (N = processor core count). The only difference is you can start multiple falcon processes bound to the same port, which won't change performance but would make it confusing as to which process will serve which request.
Can you explain what you think using SO_REUSEPORT achieves? Are you planning to start N processes for single-process servers?
@ioquatix Using SO_REUSEPORT can bring a performance increase (and probably reduce resource usage): https://github.com/puma/puma/pull/1712
As far as I understand, SO_REUSEPORT lets several processes use the same port, so the kernel doesn't have to make extra syscalls to attach a port to a process, meaning fewer syscalls and fewer resources used; but @OvermindDL1 will have a better explanation than mine.
@waghanza what you've said makes no sense at all. Sorry, it simply doesn't align with what SO_REUSEPORT does.
If anything, SO_REUSEPORT is a crappy way to achieve multi-process or multi-threaded servers. Maybe it's useful for rolling restarts. But it has nothing to do with improving the performance of a well designed server (e.g. bind before fork or bind before threads).
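The bind-before-fork model mentioned here can be sketched in a few lines of Ruby (a toy illustration of the general pattern, not falcon's actual code): the parent binds once, and every forked worker accepts from that same inherited listening socket.

```ruby
require 'socket'

# The parent binds the listening socket ONCE; port 0 asks the kernel
# for any free port so the sketch is self-contained.
server = TCPServer.new('127.0.0.1', 0)
port   = server.addr[1]

# Each forked worker inherits the already-bound socket and accepts from
# it; the kernel hands every incoming connection to exactly one worker.
pids = 2.times.map do
  fork do
    client = server.accept
    client.write("handled by #{Process.pid}")
    client.close
    exit!(0)
  end
end

# Two client connections, each served by one of the workers.
replies = 2.times.map { TCPSocket.open('127.0.0.1', port, &:read) }
pids.each { |pid| Process.wait(pid) }
```

Starting a second copy of this whole program on the same port would fail with EADDRINUSE, which is the safety property lost under SO_REUSEPORT.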
Here is the evidence from my testing:
Without --reuse-port:
Running falcon with 128 concurrent connections...
Running 2s test @ http://127.0.0.1:9292/small
8 threads and 128 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 2.66ms 2.90ms 38.84ms 92.00%
Req/Sec 7.59k 2.56k 35.61k 95.65%
121555 requests in 2.08s, 143.51MB read
Requests/sec: 58520.43
Transfer/sec: 69.09MB
With --reuse-port:
Running falcon with 128 concurrent connections...
Running 2s test @ http://127.0.0.1:9292/small
8 threads and 128 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 2.53ms 2.40ms 41.16ms 93.89%
Req/Sec 7.37k 1.38k 21.34k 93.90%
120275 requests in 2.10s, 142.00MB read
Requests/sec: 57291.82
Transfer/sec: 67.64MB
@ioquatix I can see a slight performance increase in req/sec. For latency, I'm not sure wrk is in fact the right tool ;-)
see https://github.com/the-benchmarker/web-frameworks/issues/670
The req/s actually dropped slightly, and it's well within the margin of error, which I'd say on any given run would be within 10%.
I agree latency computation is tricky. wrk isn't too bad.
@ioquatix You have a point; you can suggest a tool to replace wrk if you know one :stuck_out_tongue_winking_eye:
I reran my tests under some different benchmarks (big response, small response, tiny response). There was no obvious difference. That being said, if the difference is small, it might not be obvious. Honestly, any web server that's capable of more than 50k req/s is fast enough in production. Frankly speaking, from experience, as soon as you have any kind of database, it won't matter whether you can serve 10k or 100k req/s; the overhead of the application is so much bigger that the web server overhead becomes irrelevant.
Here is a benchmark tool I wrote, but it's limited a bit by the performance of Ruby. It's interesting though because it tries to find the limits of concurrency for a server within a % tolerance of performance degradation: https://github.com/socketry/benchmark-http
The way it works is it makes 1 request, and measures performance/latency.
Then it makes 2 requests at the same time.
It does binary search to find the point at which your latency is impacted by a certain %. This is the point at which your server is slowing down multiple requests due to internal contention. It's an interesting metric because it even applies to slow servers over the internet - it's not about absolute latency but how latency is impacted as you increase concurrency.
Sometimes the fastest server is the most badly affected: it can handle one or two concurrent requests very, very fast, but as you increase the number of concurrent requests, the per-request latency increases sharply. It's a more interesting metric, because when you deal with real-world servers on the internet, 1-2ms of latency is irrelevant, but concurrency is more interesting, i.e. how many concurrent requests can the server handle before it degrades to 20% worse performance per request.
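The search described above can be sketched roughly like this (a simplified illustration, not benchmark-http's actual code; the measure block is a hypothetical stand-in for issuing N concurrent requests and returning the average per-request latency):

```ruby
# Find the highest concurrency whose per-request latency stays within
# `tolerance` of the single-request baseline: double the concurrency
# until latency degrades past the threshold, then binary-search the
# exact boundary between the last good and first bad values.
def maximum_concurrency(tolerance: 0.2, limit: 1 << 16, &measure)
  baseline  = measure.call(1)
  threshold = baseline * (1.0 + tolerance)

  upper = 2
  upper *= 2 while upper < limit && measure.call(upper) <= threshold

  lower = upper / 2                  # invariant: lower good, upper bad
  while lower + 1 < upper
    mid = (lower + upper) / 2
    if measure.call(mid) <= threshold
      lower = mid
    else
      upper = mid
    end
  end
  lower
end
```

With a synthetic latency curve that stays flat up to 8 concurrent requests and then grows linearly, the search lands on the last concurrency still within the 20% budget.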
Also, don't know if you know this, but:
- wrk is good for stressing read/write behaviour in a server since it uses persistent connections.
- ab is good for stressing per-connection overheads since it uses connection: close.
- benchmark-http uses HTTP/1 or HTTP/2 and SSL, so it's really more of a full-stack benchmark/testing tool.

@ioquatix I was thinking of https://k6.io/
That tool looks awesome. That being said, in order for a benchmark tool to be accurate, it must be coded in a low-level language IMHO. That's why I don't trust benchmark-http for raw performance metrics. For other things (like concurrency), it's fine.
There is also perfer. I wrote it to handle high performance C servers. https://github.com/ohler55/perfer
I think this article may be worth reading on various benchmark tools. https://blog.loadimpact.com/open-source-load-testing-tool-benchmarks-v2
@ohler55 making an ad for your own product :stuck_out_tongue:
Of course. It works for me so maybe it will for others.
@proyb6 Fascinating article!
So wrk adds a touch of latency over apachebench. I didn't experience that here, but that was years ago when I tried, so apachebench has likely been improved quite a bit since. ^.^
I find it odd they recommend one of the others for scripting abilities when wrk's LuaJIT handles all that fine as well.
Good to know wrk can still saturate a server best! ^.^
So if anything I'd use both apachebench and wrk: apachebench for testing times (under light, heavy, and extreme load), and wrk for testing load throughput, based on that benchmark page at least.
There is also perfer. I wrote it to handle high performance C servers. ohler55/perfer
Hmm, let's try it.
First, it doesn't seem to support virtual hosts. Second, it doesn't seem to support HTTPS (a big, big thing to test for high-performance throughput, as no server should be running without TLS nowadays).
So, doing a quick test without HTTPS (which is fine here, since this web-frameworks project doesn't test it, even though I'd argue it should) on a 16-core server with nginx hosting a static page of "ok\n" at /tester:
$ # Curl's time
$ time curl http://127.0.0.1/tester --silent >/dev/null
real 0m0.012s
user 0m0.004s
sys 0m0.008s
$ # ab is single-threaded do note...
$ ab -c 200 -t 10 -n 5000000 -k http://127.0.0.1:80/tester
This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 500000 requests
Completed 1000000 requests
Finished 1309292 requests
Server Software: nginx/1.15.5
Server Hostname: 127.0.0.1
Server Port: 80
Document Path: /tester
Document Length: 3 bytes
Concurrency Level: 200
Time taken for tests: 10.001 seconds
Complete requests: 1309292
Failed requests: 0
Keep-Alive requests: 1296294
Total transferred: 226442694 bytes
HTML transferred: 3927879 bytes
Requests per second: 130911.29 [#/sec] (mean)
Time per request: 1.528 [ms] (mean)
Time per request: 0.008 [ms] (mean, across all concurrent requests)
Transfer rate: 22110.52 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 0 7
Processing: 0 2 0.8 1 54
Waiting: 0 1 0.7 1 53
Total: 0 2 0.8 1 54
WARNING: The median and mean for the processing time are not within a normal deviation
These results are probably not that reliable.
WARNING: The median and mean for the total time are not within a normal deviation
These results are probably not that reliable.
Percentage of the requests served within a certain time (ms)
50% 1
66% 1
75% 1
80% 1
90% 2
95% 3
98% 3
99% 4
100% 54 (longest request)
$ cd ../wrk && ./wrk -t 8 -c 200 -d 10 http://127.0.0.1:80/tester
Running 10s test @ http://127.0.0.1:80/tester
8 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.59ms 2.44ms 28.41ms 86.88%
Req/Sec 37.69k 6.43k 64.67k 68.25%
3004180 requests in 10.03s, 495.50MB read
Requests/sec: 299614.17
Transfer/sec: 49.42MB
$ cd ../perfer && ./bin/perfer 127.0.0.1:80 --path tester -t 8 -c 20 -k -d 10
*-*-* error sending request: Broken pipe
*-*-* error sending request: Broken pipe
*-*-* error sending request: Broken pipe
*-*-* error sending request: Broken pipe
*-*-* error sending request: Broken pipe
... spam of the above! Had to Ctrl+c a minute later when it didn't stop
I tried perfer lots of different ways with lots of different options and I could never get it to work. Even in the most basic case:
$ cd ../perfer && ./bin/perfer 127.0.0.1:80 --path tester -t 1 -c 1 -k -d 1
127.0.0.1:80 did not respond to 15 requests.
Benchmarks for:
URL: 127.0.0.1:80/tester
Threads: 1
Connections/thread: 1
Duration: 1.0 seconds
Keep-Alive: true
Results:
Throughput: 100 requests/second
Latency: 0.313 +/-0.056 msecs (and stdev)
The only way I could get meaningful information was by disabling its keep-alive:
$ cd ../perfer && ./bin/perfer 127.0.0.1:80 --path tester -t 8 -c 200 -d 10
Benchmarks for:
URL: 127.0.0.1:80/tester
Threads: 8
Connections/thread: 200
Duration: 10.6 seconds
Keep-Alive: false
Results:
Throughput: 74493 requests/second
Latency: 10.862 +/-92.472 msecs (and stdev)
I did a number of other tests. It looks like at lower thread counts apachebench outperforms wrk in throughput (makes sense since ab is single-threaded) but wrk was better on latency, and at higher thread counts wrk blows apachebench away in throughput, over double that of ab at 15+ threads, with only 1.71ms (wrk) compared to 1.50ms (ab).
And since benchmarking should be done on servers running in full multi-threaded multi-process mode, I'd still vote for wrk well over ab. As for perfer, I'm unsure what is wrong with it... I'd like to give it a proper shakedown. Everything from ephemeral ports to timeouts to kernel TCP memory should be tuned well enough here to test significantly more connections than what I currently tested with (I've tested with significantly more with both ab and wrk just now with no issue).
Hmm, I have an old 6-core server on a gigabit network with the 16-core one (the network is a bit loaded down, so it won't reach a full gigabit). It's not entirely idle, but it should at least serve for a quick test against the 16-core box. git clone'ing, building, etc., and running the tests again:
╰─➤ time curl http://192.168.1.89/tester -s>/dev/null
curl http://192.168.1.89/tester -s > /dev/null 0.00s user 0.00s system 60% cpu 0.013 total
╰─➤ cd ../wrk && ./wrk -t 6 -c 200 -d 10 http://192.168.1.89:80/tester
Running 10s test @ http://192.168.1.89:80/tester
6 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 8.61ms 2.87ms 220.97ms 87.14%
Req/Sec 3.80k 444.87 12.44k 95.67%
227629 requests in 10.09s, 37.55MB read
Requests/sec: 22564.65
Transfer/sec: 3.72MB
╰─➤ ab -c 200 -t 10 -k -n 5000000 http://192.168.1.89:80/tester
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 192.168.1.89 (be patient)
Finished 197286 requests
Server Software: nginx/1.15.5
Server Hostname: 192.168.1.89
Server Port: 80
Document Path: /tester
Document Length: 3 bytes
Concurrency Level: 200
Time taken for tests: 10.005 seconds
Complete requests: 197286
Failed requests: 0
Keep-Alive requests: 195455
Total transferred: 34121323 bytes
HTML transferred: 591858 bytes
Requests per second: 19719.47 [#/sec] (mean)
Time per request: 10.142 [ms] (mean)
Time per request: 0.051 [ms] (mean, across all concurrent requests)
Transfer rate: 3330.62 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 6.9 0 1010
Processing: 2 10 3.4 10 229
Waiting: 2 10 3.4 10 228
Total: 2 10 7.7 10 1021
Percentage of the requests served within a certain time (ms)
50% 10
66% 11
75% 11
80% 12
90% 13
95% 14
98% 15
99% 17
100% 1021 (longest request)
╰─➤ cd ../perfer && ./bin/perfer 192.168.1.89:80 --path tester -t 15 -c 200 -d 10 -k
*-*-* error sending request: Broken pipe
*-*-* error sending request: Broken pipe
*-*-* error sending request: Broken pipe
*-*-* error sending request: Broken pipe
*-*-* error sending request: Broken pipe
... etc...
And this is the server I used to do my heavy testing on, so I know it can handle it. Well, let's try it on localhost on this thing (I've managed to hit almost 2 million concurrent connections on this 6-core box, fun enough, though not this time obviously, as the connections here are less concurrent) with nginx and all:
╰─➤ time curl http://127.0.0.1:8080/tester -s>/dev/null
curl http://127.0.0.1:8080/tester -s > /dev/null 0.00s user 0.00s system 65% cpu 0.012 total
╰─➤ cd ../wrk && ./wrk -t 6 -c 200 -d 10 http://127.0.0.1:8080/tester
Running 10s test @ http://127.0.0.1:8080/tester
6 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.41ms 3.04ms 87.36ms 96.37%
Req/Sec 28.03k 3.49k 40.47k 74.67%
1678229 requests in 10.08s, 276.80MB read
Requests/sec: 166456.82
Transfer/sec: 27.46MB
╰─➤ ab -c 200 -t 10 -k -n 5000000 http://127.0.0.1:8080/tester
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 500000 requests
Finished 667517 requests
Server Software: nginx/1.10.3
Server Hostname: 127.0.0.1
Server Port: 8080
Document Path: /tester
Document Length: 3 bytes
Concurrency Level: 200
Time taken for tests: 10.003 seconds
Complete requests: 667517
Failed requests: 0
Keep-Alive requests: 660940
Total transferred: 115447724 bytes
HTML transferred: 2002554 bytes
Requests per second: 66734.28 [#/sec] (mean)
Time per request: 2.997 [ms] (mean)
Time per request: 0.015 [ms] (mean, across all concurrent requests)
Transfer rate: 11271.25 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.5 0 11
Processing: 0 3 1.8 3 133
Waiting: 0 3 1.7 3 133
Total: 0 3 1.9 3 135
Percentage of the requests served within a certain time (ms)
50% 3
66% 3
75% 3
80% 3
90% 3
95% 4
98% 6
99% 8
100% 135 (longest request)
╰─➤ cd ../perfer && ./bin/perfer 127.0.0.1:8080 --path tester -t 8 -c 200 -d 10 -k
*-*-* error sending request: Broken pipe
*-*-* error sending request: Broken pipe
*-*-* error sending request: Broken pipe
*-*-* error sending request: Broken pipe
*-*-* error sending request: Broken pipe
*-*-* error sending request: Broken pipe
*-*-* error sending request: Broken pipe
*-*-* error sending request: Broken pipe
... etc...
So it doesn't seem to be a configuration issue on the other server then (wrk did better than ab in every way on this server as well, interesting...)...
Since I have Ruby installed on this 6-core computer, let's try that benchmark-http thing too:
╰─➤ benchmark-http concurrency http://127.0.0.1:8080/tester 1 ↵
I am going to benchmark http://127.0.0.1:8080/tester...
I am running 1 asynchronous tasks that will each make sequential requests...
0.12s: <Async::Task:0x80caf0 failed>
| NoMethodError: undefined method `sum' for [0.0003895489498972893, 0.00043974118307232857]:Array
| → /var/lib/gems/2.3.0/gems/benchmark-http-0.5.0/lib/benchmark/http/statistics.rb:75 in `average'
| /var/lib/gems/2.3.0/gems/benchmark-http-0.5.0/lib/benchmark/http/statistics.rb:86 in `variance'
| /var/lib/gems/2.3.0/gems/benchmark-http-0.5.0/lib/benchmark/http/statistics.rb:94 in `standard_deviation'
| /var/lib/gems/2.3.0/gems/benchmark-http-0.5.0/lib/benchmark/http/statistics.rb:100 in `standard_error'
| /var/lib/gems/2.3.0/gems/benchmark-http-0.5.0/lib/benchmark/http/statistics.rb:159 in `confident?'
| /var/lib/gems/2.3.0/gems/benchmark-http-0.5.0/lib/benchmark/http/statistics.rb:144 in `sample'
| /var/lib/gems/2.3.0/gems/benchmark-http-0.5.0/lib/benchmark/http/command/concurrency.rb:57 in `block (2 levels) in measure_performance'
| /var/lib/gems/2.3.0/gems/async-1.15.1/lib/async/task.rb:199 in `block in make_fiber'
/var/lib/gems/2.3.0/gems/benchmark-http-0.5.0/lib/benchmark/http/statistics.rb:75:in `average': undefined method `sum' for [0.0003895489498972893, 0.00043974118307232857]:Array (NoMethodError)
from /var/lib/gems/2.3.0/gems/benchmark-http-0.5.0/lib/benchmark/http/statistics.rb:86:in `variance'
from /var/lib/gems/2.3.0/gems/benchmark-http-0.5.0/lib/benchmark/http/statistics.rb:94:in `standard_deviation'
from /var/lib/gems/2.3.0/gems/benchmark-http-0.5.0/lib/benchmark/http/statistics.rb:100:in `standard_error'
from /var/lib/gems/2.3.0/gems/benchmark-http-0.5.0/lib/benchmark/http/statistics.rb:159:in `confident?'
from /var/lib/gems/2.3.0/gems/benchmark-http-0.5.0/lib/benchmark/http/statistics.rb:144:in `sample'
from /var/lib/gems/2.3.0/gems/benchmark-http-0.5.0/lib/benchmark/http/command/concurrency.rb:57:in `block (2 levels) in measure_performance'
from /var/lib/gems/2.3.0/gems/async-1.15.1/lib/async/task.rb:199:in `block in make_fiber'
Well, what fresh hell is this horror? o.O
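For what it's worth, the failing call is Array#sum, which only exists from Ruby 2.4 onward; a version-tolerant average (a hypothetical patch sketch, not the gem's actual code) can fall back to inject on older Rubies:

```ruby
# Array#sum arrived in Ruby 2.4; on 2.3 it raises NoMethodError, which
# is exactly the failure in the trace above. Falling back to inject
# keeps the statistics working on both old and new interpreters.
def average(samples)
  return nil if samples.empty?
  total = samples.respond_to?(:sum) ? samples.sum : samples.inject(0.0, :+)
  total / samples.length
end
```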
╰─➤ ruby --version
ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu]
Maybe this version of Ruby is too old and the program doesn't give a decent error message saying so. Let's update with asdf; now it's:
╰─➤ ruby --version
ruby 2.6.0p0 (2018-12-25 revision 66547) [x86_64-linux]
Released a month ago; seems recent enough. Trying again:
╰─➤ benchmark-http concurrency http://127.0.0.1:8080/tester
I am going to benchmark http://127.0.0.1:8080/tester...
I am running 1 asynchronous tasks that will each make sequential requests...
I made 3037 requests in 1.9s. The per-request latency was 625.026µs. That's 1599.933452566841 asynchronous requests/second.
Variance: 0.119µs
Standard Deviation: 344.367µs
Standard Error: 6.248831942257431e-06
I am running 2 asynchronous tasks that will each make sequential requests...
I made 3632 requests in 1.2s. The per-request latency was 672.841µs. That's 1486.834535169551 asynchronous requests/second.
Variance: 0.164µs
Standard Deviation: 405.391µs
Standard Error: 6.726688103003247e-06
I am running 4 asynchronous tasks that will each make sequential requests...
I made 5913 requests in 2.2s. The per-request latency was 1.49ms. That's 1474.910966905804 asynchronous requests/second.
Variance: 1.316µs
Standard Deviation: 1.15ms
Standard Error: 1.4917858976199e-05
I am running 3 asynchronous tasks that will each make sequential requests...
I made 5893 requests in 2.6s. The per-request latency was 1.33ms. That's 1371.0889319941182 asynchronous requests/second.
Variance: 1.046µs
Standard Deviation: 1.02ms
Standard Error: 1.3321248721848672e-05
Your server can handle 2 concurrent requests.
At this level of concurrency, requests have ~1.08x higher latency.
Well... that conclusion is significantly false... Only 5893 requests in 2.6s, whereas wrk gets far more than this with both 2 and 20 connections on just two threads, and, just for good measure, with only 1 connection on 1 thread:
╰─➤ cd ../wrk && ./wrk -t 2 -c 2 -d 10 http://127.0.0.1:8080/tester
Running 10s test @ http://127.0.0.1:8080/tester
2 threads and 2 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 59.60us 224.36us 13.42ms 99.13%
Req/Sec 20.15k 2.49k 26.72k 62.38%
404824 requests in 10.10s, 66.77MB read
Requests/sec: 40082.74
Transfer/sec: 6.61MB
╰─➤ cd ../wrk && ./wrk -t 2 -c 20 -d 10 http://127.0.0.1:8080/tester
Running 10s test @ http://127.0.0.1:8080/tester
2 threads and 20 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 404.29us 1.39ms 35.53ms 94.97%
Req/Sec 77.51k 12.19k 87.50k 91.04%
1549508 requests in 10.10s, 255.57MB read
Requests/sec: 153415.13
Transfer/sec: 25.30MB
╰─➤ cd ../wrk && ./wrk -t 1 -c 1 -d 10 http://127.0.0.1:8080/tester
Running 10s test @ http://127.0.0.1:8080/tester
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 59.74us 81.40us 4.13ms 97.16%
Req/Sec 17.66k 1.91k 22.29k 65.35%
177349 requests in 10.10s, 29.25MB read
Requests/sec: 17559.47
Transfer/sec: 2.90MB
And of course nginx still isn't the fastest thing out there, but the fact that it can be hit with ~17,559 requests a second on a single connection on a single thread while benchmark-http can't even hit 2.5k in a single second with multiple connections is questionable... o.O
Wow, nice writeup. Looks like I have some digging to do to figure out what happened with perfer.
While I'm at it, let's grab and test k6 too. A quick test with it (20 concurrent connections like at the end just above, keep-alive enabled, etc.). Like with ab, I don't see a way to control threads (ab is single-threaded anyway, but I think Go scales across cores, so this should already be maximally multi-threaded?):
╰─➤ k6 run k6.js --duration 10s --vus 20
/\ |‾‾| /‾‾/ /‾/
/\ / \ | |_/ / / /
/ \/ \ | | / ‾‾\
/ \ | |‾\ \ | (_) |
/ __________ \ |__| \__\ \___/ .io
execution: local
output: -
script: k6.js
duration: 10s, iterations: -
vus: 20, max: 20
done [==========================================================] 10s / 10s
data_received..............: 45 MB 4.5 MB/s
data_sent..................: 22 MB 2.2 MB/s
http_req_blocked...........: avg=5.38µs min=1.11µs med=1.93µs max=9.43ms p(90)=2.81µs p(95)=3.31µs
http_req_connecting........: avg=2.02µs min=0s med=0s max=9.23ms p(90)=0s p(95)=0s
http_req_duration..........: avg=222.05µs min=47.39µs med=146.05µs max=19.85ms p(90)=406.82µs p(95)=592.62µs
http_req_receiving.........: avg=18.93µs min=6.47µs med=12.11µs max=14.26ms p(90)=21.54µs p(95)=28.56µs
http_req_sending...........: avg=16.87µs min=6.25µs med=9.2µs max=19.77ms p(90)=18.66µs p(95)=24.37µs
http_req_tls_handshaking...: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...........: avg=186.24µs min=28.53µs med=115.43µs max=12.1ms p(90)=362.62µs p(95)=523.89µs
http_reqs..................: 258409 25840.643325/s
iteration_duration.........: avg=319.61µs min=98.05µs med=225.71µs max=19.91ms p(90)=540.27µs p(95)=794.08µs
iterations.................: 258405 25840.243329/s
vus........................: 20 min=20 max=20
vus_max....................: 20 min=20 max=20
So initially I see it is hitting 25840 req/s where wrk was hitting 153415.13 req/s with the same 20 concurrent connections... Let's try higher:
╰─➤ k6 run k6.js --duration 10s --vus 200
execution: local
output: -
script: k6.js
duration: 10s, iterations: -
vus: 200, max: 200
done [==========================================================] 10s / 10s
data_received..............: 40 MB 4.0 MB/s
data_sent..................: 20 MB 2.0 MB/s
http_req_blocked...........: avg=15.29µs min=1.2µs med=2.31µs max=15.6ms p(90)=3.3µs p(95)=3.85µs
http_req_connecting........: avg=11.31µs min=0s med=0s max=15.1ms p(90)=0s p(95)=0s
http_req_duration..........: avg=1.38ms min=55.93µs med=771.24µs max=34.31ms p(90)=3.21ms p(95)=4.55ms
http_req_receiving.........: avg=23.27µs min=6.81µs med=11.25µs max=31.83ms p(90)=21.86µs p(95)=31.97µs
http_req_sending...........: avg=40.19µs min=6.74µs med=10.36µs max=32.8ms p(90)=20.91µs p(95)=29.24µs
http_req_tls_handshaking...: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...........: avg=1.31ms min=34.66µs med=725.4µs max=33.25ms p(90)=3.1ms p(95)=4.42ms
http_reqs..................: 233824 23378.624065/s
iteration_duration.........: avg=1.5ms min=112.08µs med=875.68µs max=40.49ms p(90)=3.4ms p(95)=4.8ms
iterations.................: 233821 23378.324113/s
vus........................: 200 min=200 max=200
vus_max....................: 200 min=200 max=200
Wow, that took a long time just to load up the engines; guessing spawning new JavaScript interpreters is not too fast... >.> Still only about 23378 req/s again.
So far I'm still leaning towards wrk?
Let's try wrk2 while I'm at it as well:
╰─➤ cd ../wrk2 && ./wrk -t 6 -c 20 -d 20 -R 999999 http://127.0.0.1:8080/tester 1 ↵
Running 20s test @ http://127.0.0.1:8080/tester
6 threads and 20 connections
Thread calibration: mean lat.: 4472.116ms, rate sampling interval: 15925ms
Thread calibration: mean lat.: 4414.885ms, rate sampling interval: 15802ms
Thread calibration: mean lat.: 4450.223ms, rate sampling interval: 15810ms
Thread calibration: mean lat.: 4418.796ms, rate sampling interval: 15843ms
Thread calibration: mean lat.: 4394.733ms, rate sampling interval: 15654ms
Thread calibration: mean lat.: 4511.955ms, rate sampling interval: 15908ms
Thread Stats Avg Stdev Max +/- Stdev
Latency 13.27s 2.49s 17.38s 58.47%
Req/Sec -nan -nan 0.00 0.00%
2670362 requests in 20.00s, 440.44MB read
Requests/sec: 133524.10
Transfer/sec: 22.02MB
And normal wrk with the same settings:
╰─➤ cd ../wrk && ./wrk -t 6 -c 20 -d 20 http://127.0.0.1:8080/tester
Running 20s test @ http://127.0.0.1:8080/tester
6 threads and 20 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 406.73us 1.10ms 30.97ms 92.55%
Req/Sec 27.15k 4.04k 38.84k 68.08%
3243485 requests in 20.02s, 534.97MB read
Requests/sec: 162018.91
Transfer/sec: 26.72MB
I'm thinking wrk2 is a bit broken... How can it report a multi-second average latency with 133524.10 req/s... o.O
Tested vegeta as well, managed to get it up to 39250.47 req/sec before it crumbled under the load (the process froze for multiple minutes; had to kill it).
Tested welle too:
╰─➤ ./target/release/welle -c 20 -n 500000 http://127.0.0.1:8080/tester
Total Requests: 500000
Concurrency Count: 20
Total Completed Requests: 500000
Total Errored Requests: 0
Total 5XX Requests: 0
Total Time Taken: 21.564045318s
Avg Time Taken: 43.128µs
Total Time In Flight: 248.762968408s
Avg Time In Flight: 497.525µs
Percentage of the requests served within a certain time:
50%: 672.936µs
66%: 730.705µs
75%: 779.781µs
80%: 820.002µs
90%: 977.444µs
95%: 1.168457ms
99%: 2.119692ms
100%: 47.040756ms
So it averaged ~23186.75 req/sec, which is still well, well below the ~133506.72 req/sec from wrk with the same settings (or ~82639.49 req/sec with only a single thread with wrk, or ~66755.43 req/sec with apachebench)...
@OvermindDL1 wrk seems to be accurate for req/s, but what about latency?
@OvermindDL1 wrk seems to be accurate for req/s, but what for latency ?
Seems fairly accurate from the tests I did. Other tools give more useful information at lower concurrency amounts, but nothing I tested today was able to get anywhere near wrk at higher concurrency counts without either dying off or exploding to very high times.
Some samples I just ran again here from the 6-core to the 16-core (to add additional latency for testing):
Wrk:
╰─➤ cd ../wrk && ./wrk -t 1 -c 20 -d 20 http://192.168.1.89:80/tester
Running 20s test @ http://192.168.1.89:80/tester
1 threads and 20 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 0.85ms 242.26us 5.04ms 74.15%
Req/Sec 22.70k 460.65 23.94k 71.64%
453954 requests in 20.10s, 74.87MB read
Requests/sec: 22584.78
Transfer/sec: 3.73MB
K6:
╰─➤ k6 run k6.js --duration 10s --vus 200
execution: local
output: -
script: k6.js
duration: 10s, iterations: -
vus: 200, max: 200
done [==========================================================] 10s / 10s
data_received..............: 32 MB 3.2 MB/s
data_sent..................: 16 MB 1.6 MB/s
http_req_blocked...........: avg=140.12µs min=1.21µs med=2.37µs max=1s p(90)=3.37µs p(95)=4.16µs
http_req_connecting........: avg=136.24µs min=0s med=0s max=1s p(90)=0s p(95)=0s
http_req_duration..........: avg=7.29ms min=225.33µs med=7.45ms max=231.34ms p(90)=10.48ms p(95)=11.38ms
http_req_receiving.........: avg=23.92µs min=8.22µs med=14.23µs max=18.19ms p(90)=27.55µs p(95)=42.9µs
http_req_sending...........: avg=29.6µs min=6.95µs med=11.77µs max=43.43ms p(90)=22.12µs p(95)=29.79µs
http_req_tls_handshaking...: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...........: avg=7.23ms min=169.64µs med=7.41ms max=231.27ms p(90)=10.43ms p(95)=11.32ms
http_reqs..................: 185679 18567.762706/s
iteration_duration.........: avg=7.54ms min=343.66µs med=7.57ms max=1.01s p(90)=10.72ms p(95)=11.69ms
iterations.................: 185679 18567.762706/s
vus........................: 200 min=200 max=200
vus_max....................: 200 min=200 max=200
ApacheBench:
╰─➤ ab -c 200 -t 20 -k -n 5000000 http://192.168.1.89:80/tester
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 192.168.1.89 (be patient)
Finished 392775 requests
Server Software: nginx/1.15.5
Server Hostname: 192.168.1.89
Server Port: 80
Document Path: /tester
Document Length: 3 bytes
Concurrency Level: 200
Time taken for tests: 20.000 seconds
Complete requests: 392775
Failed requests: 0
Keep-Alive requests: 389001
Total transferred: 67931373 bytes
HTML transferred: 1178328 bytes
Requests per second: 19638.35 [#/sec] (mean)
Time per request: 10.184 [ms] (mean)
Time per request: 0.051 [ms] (mean, across all concurrent requests)
Transfer rate: 3316.89 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 10.3 0 1013
Processing: 1 10 4.0 10 231
Waiting: 1 10 4.0 10 231
Total: 2 10 11.1 10 1024
Percentage of the requests served within a certain time (ms)
50% 10
66% 11
75% 11
80% 12
90% 13
95% 14
98% 15
99% 17
100% 1024 (longest request)
Welle:
╰─➤ ./target/release/welle -c 20 -n 500000 http://192.168.1.89:80/tester
Total Requests: 500000
Concurrency Count: 20
Total Completed Requests: 500000
Total Errored Requests: 0
Total 5XX Requests: 0
Total Time Taken: 28.261249948s
Avg Time Taken: 56.522µs
Total Time In Flight: 442.984907521s
Avg Time In Flight: 885.969µs
Percentage of the requests served within a certain time:
50%: 1.2383ms
66%: 1.388187ms
75%: 1.495586ms
80%: 1.570883ms
90%: 1.806927ms
95%: 2.068901ms
99%: 2.874676ms
100%: 60.555964ms
Base ping:
╰─➤ ping 192.168.1.89 -c 2
PING 192.168.1.89 (192.168.1.89) 56(84) bytes of data.
64 bytes from 192.168.1.89: icmp_seq=1 ttl=64 time=0.496 ms
64 bytes from 192.168.1.89: icmp_seq=2 ttl=64 time=0.492 ms
--- 192.168.1.89 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.492/0.494/0.496/0.002 ms
Overall wrk was the most consistent and fastest, giving 0.85ms avg / 242.26us stdev / 5.04ms max / 74.15% within one stdev, which is an average of ~0.45ms above base ping. k6 had an average of 7.54ms, with just the HTTP waiting part at 7.29ms on average, significantly higher than what both wrk and ab show. ab reported 10.184ms per request (mean) and 0.051ms "across all concurrent requests"; multiplying the latter by the concurrency level gives 0.051 * 200 = 10.2ms, i.e. the two figures are the same measurement scaled by -c, so the concurrent figure is really just wall-clock time divided by total requests. Welle had 1.2383ms for its 50th percentile, which is about double the ping time.
ab might be the most accurate at that level, depending on what is actually being calculated, though wrk directly reported the lowest latency as well as pumping through the most requests.
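For what it's worth, ab's two "Time per request" figures are linked by the concurrency level: the "across all concurrent requests" value is just wall-clock time divided by completed requests, and the "mean" is that value times -c. A quick check against the ab run above (my arithmetic, following ab's documented formulas):

```python
# Reconstructing ab's two "Time per request" figures from the run above.
concurrency = 200
time_taken = 20.000        # seconds ("Time taken for tests")
complete_requests = 392775

# "across all concurrent requests": wall-clock time per completed request.
across_all = time_taken * 1000 / complete_requests   # ms
# "mean": the same number scaled by the concurrency level.
per_request = across_all * concurrency               # ms

print(round(across_all, 3))    # 0.051, matching ab's output
print(round(per_request, 3))   # 10.184, matching ab's output
```

So neither figure already includes the ping; the concurrent figure is just aggregate throughput inverted.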
Thanks for the detailed write-up. There is a bug plus some unfortunate upstream changes affecting benchmark-http - I'll see if I can sort it out.
Okay, found the issue. But I have a school meeting with my kids. I will fix it later.
Awesome! Tell me when updated and what commands I need to run to update it locally (I'm not a ruby user so I'm unfamiliar with its ecosystem) and I'll test the 2 servers again. :-)
Okay, I released benchmark-http 0.6.0
and I also set minimum ruby version (2.4). However, for good performance, run it on Linux.
I also need to do some more perf comparisons with wrk
to see how it can be better. On my laptop wrk
was getting about 8000 req/s but benchmark-http
was around 5000.
@ioquatix a laptop generally does not have the same CPU as a server (servers mostly run Xeons)
Even on a laptop that is abysmal, and probably not wrk's fault; rather, what is the server being tested?
If you are testing, say, a Ruby server, then it will be abysmal in general, as that's just the nature of Ruby, regardless of the tool you use. You need to test a fast server. Even nginx is not the fastest, but it is sufficient here to see what can saturate it, and it will be significantly faster than anything else 99.99%+ of people will ever use. So, testing... :-)
╰─➤ benchmark-http concurrency http://127.0.0.1:8080/tester
I am going to benchmark http://127.0.0.1:8080/tester...
I am running 1 asynchronous tasks that will each make sequential requests...
I made 3305 requests in 865.48ms. The per-request latency was 261.869µs. That's 3818.7074071129277 asynchronous requests/second.
Variance: 0.023µs
Standard Deviation: 150.522µs
Standard Error: 2.618263603678451e-06
I am running 2 asynchronous tasks that will each make sequential requests...
I made 7 requests in 828.371µs. The per-request latency was 236.677µs. That's 7266.325628609022 asynchronous requests/second.
Variance: 0.000µs
Standard Deviation: 5.398µs
Standard Error: 2.04042306253398e-06
I am running 4 asynchronous tasks that will each make sequential requests...
I made 1033 requests in 189.02ms. The per-request latency was 731.914µs. That's 4916.925140612694 asynchronous requests/second.
Variance: 0.055µs
Standard Deviation: 234.489µs
Standard Error: 7.295796636094857e-06
I am running 3 asynchronous tasks that will each make sequential requests...
I made 1328 requests in 255.88ms. The per-request latency was 578.041µs. That's 4423.081937596428 asynchronous requests/second.
Variance: 0.044µs
Standard Deviation: 210.252µs
Standard Error: 5.769536262398735e-06
Your server can handle 2 concurrent requests.
At this level of concurrency, requests have ~0.9x higher latency.
That was after multiple attempts; I chose the best result. It's not consistent, i.e. it gives wildly differing results each time. For example, here was the worst result:
╰─➤ benchmark-http concurrency http://127.0.0.1:8080/tester
I am going to benchmark http://127.0.0.1:8080/tester...
I am running 1 asynchronous tasks that will each make sequential requests...
I made 4006 requests in 1.1s. The per-request latency was 268.569µs. That's 3723.4377404288166 asynchronous requests/second.
Variance: 0.029µs
Standard Deviation: 169.958µs
Standard Error: 2.6852608427339045e-06
I am running 2 asynchronous tasks that will each make sequential requests...
I made 1596 requests in 297.92ms. The per-request latency was 373.329µs. That's 4091.541043575783 asynchronous requests/second.
Variance: 0.022µs
Standard Deviation: 149.025µs
Standard Error: 3.7302991364257682e-06
Your server can handle 1 concurrent requests.
At this level of concurrency, requests have ~1.0x higher latency.
And its values are about on par with what I'm used to seeing in Ruby apps. Here's wrk again just now, with just a single connection, to use as a comparison baseline:
╰─➤ cd ../wrk && ./wrk -t 1 -c 1 -d 4 http://127.0.0.1:8080/tester
Running 4s test @ http://127.0.0.1:8080/tester
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 54.69us 54.96us 2.34ms 96.47%
Req/Sec 18.59k 2.26k 23.27k 68.29%
75708 requests in 4.10s, 12.49MB read
Requests/sec: 18468.65
Transfer/sec: 3.05MB
And here's a fully saturating run:
╰─➤ cd ../wrk && ./wrk -t 5 -c 60 -d 4 http://127.0.0.1:8080/tester
Running 4s test @ http://127.0.0.1:8080/tester
5 threads and 60 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.30ms 5.31ms 83.51ms 96.82%
Req/Sec 32.34k 5.87k 47.96k 84.50%
644077 requests in 4.03s, 106.23MB read
Requests/sec: 159790.94
Transfer/sec: 26.36MB
And here's apachebench (a single-threaded app):
╰─➤ ab -c 60 -t 4 -k -n 5000000 http://127.0.0.1:8080/tester
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Finished 269704 requests
Server Software: nginx/1.10.3
Server Hostname: 127.0.0.1
Server Port: 8080
Document Path: /tester
Document Length: 3 bytes
Concurrency Level: 60
Time taken for tests: 4.000 seconds
Complete requests: 269704
Failed requests: 0
Keep-Alive requests: 267038
Total transferred: 46645798 bytes
HTML transferred: 809118 bytes
Requests per second: 67423.07 [#/sec] (mean)
Time per request: 0.890 [ms] (mean)
Time per request: 0.015 [ms] (mean, across all concurrent requests)
Transfer rate: 11387.64 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 4
Processing: 0 1 1.0 1 67
Waiting: 0 1 1.0 1 67
Total: 0 1 1.0 1 67
Percentage of the requests served within a certain time (ms)
50% 1
66% 1
75% 1
80% 1
90% 1
95% 2
98% 2
99% 3
100% 67 (longest request)
I just don't think that Ruby is capable of such testing, nor is any other interpreted language. You'd need to write benchmark-http in C, C++, Go, or Rust, or maybe Java or .NET Core (but then you have to take JIT warmup time into account), for it to be accurate at all. And don't forget to use keep-alive, as most HTTP/1 browsers (and all HTTP/2 clients) will be using it (though many, but not all, API client services oddly don't); otherwise you are mostly testing the OS TCP networking stack's connection-setup time. As it stands, I think benchmark-http is mostly measuring Ruby's own throughput right now.
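To illustrate the keep-alive point, here's a rough sketch in Python (not the thread's tooling; the local server and /tester path are stand-ins) timing a persistent connection versus a fresh connection per request:

```python
# Sketch: why keep-alive matters in load testing. Without it, every
# request pays TCP connection setup, so you largely measure the OS
# networking stack. The local server and /tester path are stand-ins.
import http.client
import http.server
import threading
import time

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables keep-alive on the server side

    def do_GET(self):
        body = b"ok!"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
N = 200

# One persistent connection reused for N requests (keep-alive).
t0 = time.perf_counter()
conn = http.client.HTTPConnection("127.0.0.1", port)
for _ in range(N):
    conn.request("GET", "/tester")
    conn.getresponse().read()
conn.close()
keepalive = time.perf_counter() - t0

# A brand-new connection per request (no keep-alive).
t0 = time.perf_counter()
for _ in range(N):
    c = http.client.HTTPConnection("127.0.0.1", port)
    c.request("GET", "/tester")
    c.getresponse().read()
    c.close()
fresh = time.perf_counter() - t0

server.shutdown()
print(f"keep-alive: {keepalive:.3f}s, fresh connections: {fresh:.3f}s")
```

On loopback the difference is modest; over a real network the per-connection setup cost dominates far more.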
Thanks for all the details.
I agree, it should be rewritten in a compiled language.
If you run it against a real web server over the network, I think you'll generally find you get better consistency.
That being said, I agree with your conclusions.
@ioquatix you could use crystal
:stuck_out_tongue:
@waghanza Crystal is also sadly not parallel, and has a GC that will corrupt the results with occasional variance. ^.^; Though concurrency is a lot easier in it, that doesn't help with parallel work or give the reliability of having no GC.
That being said, if you run it against a real web server over the network, I think you'll generally find you get better consistency.
The results are much worse when I do it that way... ^.^
╰─➤ benchmark-http concurrency http://192.168.1.89/tester 1 ↵
I am going to benchmark http://192.168.1.89/tester...
I am running 1 asynchronous tasks that will each make sequential requests...
I made 2006 requests in 1.3s. The per-request latency was 623.476µs. That's 1603.9119558469351 asynchronous requests/second.
Variance: 0.078µs
Standard Deviation: 279.171µs
Standard Error: 6.233101459918445e-06
I am running 2 asynchronous tasks that will each make sequential requests...
I made 1785 requests in 428.33ms. The per-request latency was 479.919µs. That's 3218.6768911803174 asynchronous requests/second.
Variance: 0.041µs
Standard Deviation: 202.625µs
Standard Error: 4.7959525300267085e-06
I am running 4 asynchronous tasks that will each make sequential requests...
I made 1251 requests in 249.68ms. The per-request latency was 798.341µs. That's 4470.735167561054 asynchronous requests/second.
Variance: 0.079µs
Standard Deviation: 281.468µs
Standard Error: 7.957940519694209e-06
I am running 3 asynchronous tasks that will each make sequential requests...
I made 1751 requests in 433.90ms. The per-request latency was 743.398µs. That's 3392.178350844571 asynchronous requests/second.
Variance: 0.097µs
Standard Deviation: 310.650µs
Standard Error: 7.423843534806914e-06
Your server can handle 3 concurrent requests.
At this level of concurrency, requests have ~1.19x higher latency.
So it reports 3392 req/s, and yet:
╰─➤ cd ../wrk && ./wrk -t 5 -c 20 -d 4 http://192.168.1.89/tester
Running 4s test @ http://192.168.1.89/tester
5 threads and 20 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 0.88ms 284.67us 7.75ms 78.91%
Req/Sec 4.52k 139.72 4.87k 70.73%
92185 requests in 4.10s, 15.20MB read
Requests/sec: 22483.60
Transfer/sec: 3.71MB
22483.60 req/s here. It also reports a significantly higher latency than wrk does (there is no way a remote server with <0.6ms ping is responding as slowly as the 433.90ms benchmark-http is reporting; wrk reports a 0.88ms average response).
In fact, just for comparison, let me whip up a quick HTTP-call loop in Elixir (another interpreted language, well known for its poor math performance though decent IO, and a purely immutable functional language, i.e. lots of memory allocation and other inefficiencies). Done, and here's a result from the REPL:
iex(5)> BenchmarkHttp.CLI.main(["concurrency", "-d", "5", "-c", "20", "http://192.168.1.89/tester"])
Request Results
Server: ["nginx/1.15.5 (Ubuntu)"]
Date: ["Tue, 29 Jan 2019 20:47:14 GMT"]
Content-Length: ["3"]
Content-Type: ["application/octet-stream"]
Result:
Success: 61628
Failed: 0
Time: 5s
Req/s: 12325.6
%{concurrency: 20, duration: 5, failed: 0, succeeded: 61628}
It's reporting (12325 req/s) about half of what wrk does and many times what benchmark-http does, and it's nothing but a trivial loop of HTTP requests across as many work units as specified by -c, one that I know I could significantly optimize if I really tried (the client library is even doing things like parsing and verifying headers and parsing the body based on the content type, and I could use more efficient OS time calls since I don't need microsecond resolution just for counting, etc., all in-language and not handed off to native plugins, i.e. it is not even remotely efficient or optimized).
Hmm, let's also test one of my cheap VM servers that's in another country from here with high latency, i.e. over specifically laggy Internet instead of just across an intranet:
╰─➤ benchmark-http concurrency http://my-remote-server/
I am going to benchmark http://my-remote-server...
I am running 1 asynchronous tasks that will each make sequential requests...
I made 8 requests in 522.86ms. The per-request latency was 65.36ms. That's 15.300401576435712 asynchronous requests/second.
Variance: 3.263µs
Standard Deviation: 1.81ms
Standard Error: 0.000638640071178552
I am running 2 asynchronous tasks that will each make sequential requests...
I made 16 requests in 524.51ms. The per-request latency was 65.56ms. That's 29.34524830772133 asynchronous requests/second.
Variance: 6.009µs
Standard Deviation: 2.45ms
Standard Error: 0.0006128254314687405
I am running 4 asynchronous tasks that will each make sequential requests...
I made 24 requests in 389.90ms. The per-request latency was 64.98ms. That's 53.49286001831672 asynchronous requests/second.
Variance: 7.789µs
Standard Deviation: 2.79ms
Standard Error: 0.0005696812234706188
I am running 8 asynchronous tasks that will each make sequential requests...
I made 18 requests in 151.20ms. The per-request latency was 67.20ms. That's 88.97398184850658 asynchronous requests/second.
Variance: 5.040µs
Standard Deviation: 2.24ms
Standard Error: 0.0005291371145748797
I am running 16 asynchronous tasks that will each make sequential requests...
I made 39 requests in 167.54ms. The per-request latency was 68.74ms. That's 181.81855304384172 asynchronous requests/second.
Variance: 9.581µs
Standard Deviation: 3.10ms
Standard Error: 0.00049565301224643
I am running 32 asynchronous tasks that will each make sequential requests...
I made 64 requests in 136.97ms. The per-request latency was 68.48ms. That's 331.36589476486455 asynchronous requests/second.
Variance: 3.857µs
Standard Deviation: 1.96ms
Standard Error: 0.00024549164702755406
I am running 64 asynchronous tasks that will each make sequential requests...
I made 128 requests in 152.12ms. The per-request latency was 76.06ms. That's 582.0211726381192 asynchronous requests/second.
Variance: 31.918µs
Standard Deviation: 5.65ms
Standard Error: 0.0004993577118221741
I am running 128 asynchronous tasks that will each make sequential requests...
I made 256 requests in 146.04ms. The per-request latency was 73.02ms. That's 690.6692888613667 asynchronous requests/second.
Variance: 24.645µs
Standard Deviation: 4.96ms
Standard Error: 0.0003102709973866521
I am running 256 asynchronous tasks that will each make sequential requests...
I made 512 requests in 149.57ms. The per-request latency was 74.78ms. That's 1065.4000351261438 asynchronous requests/second.
Variance: 44.591µs
Standard Deviation: 6.68ms
Standard Error: 0.0002951131485336915
I am running 512 asynchronous tasks that will each make sequential requests...
I made 22683 requests in 42.1s. The per-request latency was 950.95ms. That's 336.34093466912407 asynchronous requests/second.
Variance: 1.9s
Standard Deviation: 1.4s
Standard Error: 0.009160851729160535
I am running 384 asynchronous tasks that will each make sequential requests...
I made 5208 requests in 1.8s. The per-request latency was 132.90ms. That's 1673.9411117341976 asynchronous requests/second.
Variance: 8.30ms
Standard Deviation: 91.11ms
Standard Error: 0.0012625107156406822
I am running 320 asynchronous tasks that will each make sequential requests...
I made 1203 requests in 283.42ms. The per-request latency was 75.39ms. That's 1096.3431941223046 asynchronous requests/second.
Variance: 579.268µs
Standard Deviation: 24.07ms
Standard Error: 0.0006939162140022423
I am running 352 asynchronous tasks that will each make sequential requests...
I made 3517 requests in 841.37ms. The per-request latency was 84.21ms. That's 1520.76700494723 asynchronous requests/second.
Variance: 2.36ms
Standard Deviation: 48.54ms
Standard Error: 0.0008184125161427597
I am running 336 asynchronous tasks that will each make sequential requests...
I made 2033 requests in 537.54ms. The per-request latency was 88.84ms. That's 2050.186508796951 asynchronous requests/second.
Variance: 1.43ms
Standard Deviation: 37.88ms
Standard Error: 0.0008400212237622703
I am running 328 asynchronous tasks that will each make sequential requests...
I made 44480 requests in 173.1s. The per-request latency was 1.3s. That's 236.91531854386741 asynchronous requests/second.
Variance: 7.1s
Standard Deviation: 2.7s
Standard Error: 0.012616618260588743
I am running 324 asynchronous tasks that will each make sequential requests...
I made 1596 requests in 402.33ms. The per-request latency was 81.68ms. That's 1676.7053903668725 asynchronous requests/second.
Variance: 974.471µs
Standard Deviation: 31.22ms
Standard Error: 0.0007813901938619705
I am running 322 asynchronous tasks that will each make sequential requests...
I made 1001 requests in 227.08ms. The per-request latency was 73.05ms. That's 448.50127484693235 asynchronous requests/second.
Variance: 520.052µs
Standard Deviation: 22.80ms
Standard Error: 0.0007207860050025015
I am running 323 asynchronous tasks that will each make sequential requests...
I made 646 requests in 153.12ms. The per-request latency was 76.56ms. That's 696.2517050748319 asynchronous requests/second.
Variance: 261.429µs
Standard Deviation: 16.17ms
Standard Error: 0.0006361518703546547
Your server can handle 323 concurrent requests.
At this level of concurrency, requests have ~1.17x higher latency.
That took almost half an hour to run!!! o.O! So it says at 323 it can do 696 req/s, yet one step up at 324 it got 1676. You can definitely see where its GC is adding a lot of variance!
With wrk with 324 concurrency too:
╰─➤ cd ../wrk && ./wrk -t 5 -c 324 -d 4 http://my-remote-server/
Running 4s test @ http://my-remote-server/
5 threads and 324 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 76.23ms 28.27ms 521.40ms 97.82%
Req/Sec 816.07 168.01 1.09k 75.76%
16143 requests in 4.07s, 6.53MB read
Requests/sec: 3964.72
Transfer/sec: 1.60MB
wrk does 3964.72 req/s here (I'm sure my poor upload to the server is contributing to these issues...). And via my HTTP get loop with 324 concurrency as well:
iex(8)> BenchmarkHttp.CLI.main(["concurrency", "-d", "5", "-c", "324", "http://my-remote-server/"])
Request Results
Server: ["nginx/1.14.0 (Ubuntu)"]
Date: ["Tue, 29 Jan 2019 21:16:33 GMT"]
Content-Length: ["196"]
Content-Type: ["text/html"]
Result:
Success: 18034
Failed: 0
Time: 5s
Req/s: 3606.8
%{concurrency: 324, duration: 5, failed: 0, succeeded: 18034}
Almost as fast as wrk. Of course, the usual Internet issues blow reliable testing completely away, so this is expected variance.
Just for fun let's bump my trivial loop to, oh, 2000 concurrent loops for 5 seconds:
iex(9)> BenchmarkHttp.CLI.main(["concurrency", "-d", "5", "-c", "2000", "-s", "308", "http://my-remote-server/tester"])
Request Results
Server: ["nginx/1.14.0 (Ubuntu)"]
Date: ["Tue, 29 Jan 2019 21:17:22 GMT"]
Content-Length: ["196"]
Content-Type: ["text/html"]
Result:
Success: 24667
Failed: 0
Time: 5s
Req/s: 4933.4
%{concurrency: 2000, duration: 5, failed: 0, succeeded: 24667}
Still even higher, much more so than wrk at 324, so benchmark-http did not find the optimal concurrency level. wrk at 2000 as well:
╰─➤ cd ../wrk && ./wrk -t 5 -c 2000 -d 5 http://my-remote-server/
Running 5s test @ http://my-remote-server/
5 threads and 2000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 140.73ms 190.73ms 1.94s 86.09%
Req/Sec 0.99k 313.39 1.63k 65.85%
24461 requests in 5.08s, 9.89MB read
Socket errors: connect 0, read 0, write 0, timeout 44
Requests/sec: 4818.76
Transfer/sec: 1.95MB
Also much higher, about on par with my simple request loop, but still well within the bounds of random Internet variance (my loop should be slower than wrk by a good margin).
Let's also try ab at 324 and 2000 as well just for more data points:
╰─➤ ab -c 324 -t 5 -n 5000000 http://my-remote-server/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking my-remote-server (be patient)
Finished 3214 requests
Server Software: nginx/1.14.0
Server Hostname: my-remote-server
Server Port: 80
Document Path: /
Document Length: 196 bytes
Concurrency Level: 324
Time taken for tests: 5.293 seconds
Complete requests: 3214
Failed requests: 0
Non-2xx responses: 3215
Total transferred: 1347085 bytes
HTML transferred: 630140 bytes
Requests per second: 607.22 [#/sec] (mean)
Time per request: 533.581 [ms] (mean)
Time per request: 1.647 [ms] (mean, across all concurrent requests)
Transfer rate: 248.54 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 63 201 424.0 75 3085
Processing: 65 154 248.2 78 2430
Waiting: 65 136 218.9 78 2430
Total: 131 355 523.6 155 3657
Percentage of the requests served within a certain time (ms)
50% 155
66% 161
75% 169
80% 383
90% 1141
95% 1285
98% 2161
99% 3156
100% 3657 (longest request)
╰─➤ ab -c 2000 -t 5 -n 5000000 http://my-remote-server
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking my-remote-server (be patient)
Finished 679 requests
Server Software: nginx/1.14.0
Server Hostname: my-remote-server
Server Port: 80
Document Path: /
Document Length: 196 bytes
Concurrency Level: 2000
Time taken for tests: 5.172 seconds
Complete requests: 679
Failed requests: 0
Non-2xx responses: 679
Total transferred: 284501 bytes
HTML transferred: 133084 bytes
Requests per second: 131.30 [#/sec] (mean)
Time per request: 15232.742 [ms] (mean)
Time per request: 7.616 [ms] (mean, across all concurrent requests)
Transfer rate: 53.72 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 64 950 1253.0 91 3113
Processing: 65 145 376.2 73 3231
Waiting: 65 137 373.1 73 3231
Total: 131 1095 1310.0 175 3807
Percentage of the requests served within a certain time (ms)
50% 175
66% 1153
75% 3135
80% 3146
90% 3176
95% 3286
98% 3795
99% 3803
100% 3807 (longest request)
You can really see how badly ab's single-core model hurts it here, both in throughput and latency! O.o!
First, let me explain what this means:
I am running 323 asynchronous tasks that will each make sequential requests...
I made 646 requests in 153.12ms. The per-request latency was 76.56ms. That's 696.2517050748319 asynchronous requests/second.
Variance: 261.429µs
Standard Deviation: 16.17ms
Standard Error: 0.0006361518703546547
Your server can handle 323 concurrent requests.
At this level of concurrency, requests have ~1.17x higher latency.
It's actually about the same (similar average latency, similar standard deviation):
╰─➤ cd ../wrk && ./wrk -t 5 -c 324 -d 4 http://my-remote-server/
Running 4s test @ http://my-remote-server/
5 threads and 324 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 76.23ms 28.27ms 521.40ms 97.82%
Req/Sec 816.07 168.01 1.09k 75.76%
16143 requests in 4.07s, 6.53MB read
Requests/sec: 3964.72
Transfer/sec: 1.60MB
I made 646 requests in 153.12ms
646 / 0.15312 = ~4200 req/s across 323 connections.
That's 696.2517050748319 asynchronous requests/second.
This is total number of requests / wall-clock time (which isn't given explicitly unfortunately).
This takes into account all overheads, connection setup, etc. It's trying to state more realistically what you'd get.
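The arithmetic above is just Little's law: sustained throughput is roughly concurrency divided by per-request latency. Checking it against the quoted figures (my arithmetic, not benchmark-http output):

```python
# Little's law check: throughput ~= concurrency / per-request latency.
tasks = 323
requests = 646
latency = 0.07656                  # 76.56 ms per request, from above

print(round(tasks / latency))      # 4219, i.e. the ~4200 req/s above

# Equivalently: each task made requests/tasks sequential requests, so
# 646 requests / (2 * 76.56 ms) = 646 / 0.15312 gives the same figure.
print(round(requests / ((requests / tasks) * latency)))  # 4219
```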
... [regarding higher concurrency]
When you increased the level of concurrency, your latency went through the roof:
324 concurrent connections:
Latency 76.23ms 28.27ms 521.40ms 97.82%
2000 concurrent connections:
Latency 140.73ms 190.73ms 1.94s 86.09%
The point of benchmark-http
is to find the point at which increasing connections increases latency. You can control how tightly it tries to find this metric, and yes it can take a long time because it tries to account for variance in the network connection during the test by constraining standard error during benchmarking. If your network has issues, it will take much longer for the results to settle.
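The search it performs could be sketched as an exponential ramp followed by bisection. This is a hypothetical reconstruction, not benchmark-http's actual code; measure() stands in for a real latency probe, and the toy latency model below is invented to mimic the 323-task result above:

```python
# Sketch of the concurrency search described above: double the task count
# until latency exceeds threshold * base latency, then bisect to find the
# highest acceptable level. measure(c) is a stand-in for a real probe.
def find_max_concurrency(measure, threshold=1.2, limit=4096):
    base = measure(1)
    lo, hi = 1, 2
    # Exponential ramp-up until latency blows past the threshold.
    while hi <= limit and measure(hi) <= threshold * base:
        lo, hi = hi, hi * 2
    # Bisect between the last good and first bad level.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if measure(mid) <= threshold * base:
            lo = mid
        else:
            hi = mid
    return lo

# Toy latency model: flat until 323 tasks, then climbing steeply.
def fake_latency(c):
    return 0.075 if c <= 323 else 0.075 * (1 + (c - 323))

print(find_max_concurrency(fake_latency))  # 323
```

A real probe is noisy, which is why the tool constrains standard error before trusting each measurement, and why a flaky network makes the search slow.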
benchmark-http [--verbose | --quiet] [-h/--help] [-v/--version] <command>
An asynchronous HTTP server benchmark.
[--verbose | --quiet] Verbosity of output for debugging.
[-h/--help] Print out help information.
[-v/--version] Print out the application version.
<command> One of: concurrency, spider.
concurrency [-t/--threshold <factor>] [-c/--confidence <factor>] <hosts...>
Determine the optimal level of concurrency.
[-t/--threshold <factor>] The acceptable latency penalty when making concurrent requests Default: 1.2
[-c/--confidence <factor>] The confidence required when computing latency (lower is less reliable but faster) Default: 0.99
<hosts...> One or more hosts to benchmark
You can make it run more quickly by specifying --confidence 0.9
or even lower. But the results won't be as stable.
You can loosen the bounds of the search by specifying --threshold 1.5
, that would allow latency to get worse by up to 50% when increasing concurrency.
In your example with -c 2000, your latency was at least 2x worse. That's the point: you can increase the number of connections, but you sacrifice latency; in other words, the server is still responding, but much more slowly, because it's bottlenecked.
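As a quick sanity check on that penalty, using the wrk numbers quoted just above (my arithmetic): the average latency roughly doubles and the worst case nearly quadruples:

```python
# Latency penalty going from 324 to 2000 connections (wrk runs above).
avg_324, max_324 = 76.23e-3, 521.40e-3    # seconds
avg_2000, max_2000 = 140.73e-3, 1.94

print(round(avg_2000 / avg_324, 2))   # 1.85x average latency
print(round(max_2000 / max_324, 2))   # 3.72x worst-case latency
```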
The wall clock time is pretty confusing, so I'm going to rework the output a bit to make this clearer.
Hi,
This PR aims to implement (or check) whole-CPU usage; it closes #69. This feature COULD be:
- SO_REUSEPORT
- via spawn / fork
- a custom implementation
- ...
[x] nim
[x] elixir
[x] go
[ ] kotlin
[ ] swift
[x] php
[ ] scala
[x] rust
[ ] objc
[x] python
[ ] crystal
[ ] csharp
[x] node
[x] java
[x] cpp
[x] ruby
[ ] c
Regards,