Probably the reason for this is some of the overhead from the way Ruby's I/O works in 3.0, and we hope to improve this for 3.1. However, if you want to confirm it, can you please benchmark it using perf diff?
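If it helps, a minimal sketch of how that comparison might be run (the output file names, durations, and server script name are arbitrary):

> perf record -o before.data -p $(pgrep -f server.rb) -- sleep 10
> perf record -o after.data -p $(pgrep -f server.rb) -- sleep 10
> perf diff before.data after.data

Profile each implementation while wrk is generating load, then perf diff reports which symbols gained or lost samples between the two runs.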
It is specific to your implementation; the libev implementation is not affected:

libev-scheduler.rb — latency (Avg / Stdev / Max / +/- Stdev): 0.07ms 39.43us 0.89ms 88.08%
Okay, I will take a look, thanks for the details.
Thanks for creating this benchmark, it's quite a useful one.
I have to normalise the numbers against the simple.c implementation.
On my desktop, simple.c gives me:
> wrk -t1 -c1 http://localhost:9090
Running 10s test @ http://localhost:9090
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 15.07us 28.08us 1.96ms 98.81%
Req/Sec 23.27k 779.55 24.63k 66.00%
231547 requests in 10.00s, 10.16MB read
Requests/sec: 23154.44
Transfer/sec: 1.02MB
Currently, the new implementation of async gives me:
> wrk -t1 -c1 http://localhost:9090
Running 10s test @ http://localhost:9090
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 50.32us 93.07us 2.32ms 97.60%
Req/Sec 13.31k 415.33 13.93k 74.26%
133767 requests in 10.10s, 5.87MB read
Requests/sec: 13245.11
Transfer/sec: 595.00KB
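Normalised against simple.c, that is 13,245 / 23,154 ≈ 57% of the C implementation's throughput.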
However, this is using the IO wrappers. I want to try with native IO.
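For context, a minimal sketch of what the wrapped version might look like, assuming the async-io gem's Endpoint API (the actual benchmark script may differ):

#!/usr/bin/env ruby
require 'async'
require 'async/io'

# Wrapped IO: the endpoint yields Async::IO wrappers instead of native sockets.
endpoint = Async::IO::Endpoint.tcp('localhost', 9090)

Async do |task|
  # Endpoint#accept handles each accepted connection in its own task.
  endpoint.accept do |client|
    client.read(1024)
    client.write("HTTP/1.1 204 No Content\r\nConnection: close\r\n\r\n")
    client.close
  end
end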
I need to check this more, but here is the initial result using native IO within Async:
> wrk -t1 -c1 http://localhost:9090
Running 10s test @ http://localhost:9090
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 34.12us 119.76us 5.81ms 97.67%
Req/Sec 17.73k 654.55 18.59k 82.18%
178210 requests in 10.10s, 7.82MB read
Requests/sec: 17645.03
Transfer/sec: 792.65KB
That's roughly 76% of the C implementation's throughput (17,645 vs 23,154 requests/sec).
#!/usr/bin/env ruby
require_relative 'lib/async'
require 'socket'

Async do |task|
  # Native Ruby socket; the fiber scheduler makes blocking calls yield to other fibers.
  server = TCPServer.new('localhost', 9090)

  loop do
    client = server.accept

    # Handle each connection in its own task so the accept loop can continue.
    task.async do
      client.recv(1024)
      client.send("HTTP/1.1 204 No Content\r\nConnection: close\r\n\r\n", 0)
      client.close
    end
  end
end
Increasing the concurrency improves throughput.
Here is simple.c:
> wrk -t3 -c6 http://localhost:9090
Running 10s test @ http://localhost:9090
3 threads and 6 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 78.90us 197.55us 5.73ms 97.13%
Req/Sec 19.79k 1.28k 22.89k 69.31%
596464 requests in 10.10s, 26.17MB read
Requests/sec: 59055.53
Transfer/sec: 2.59MB
versus async:
> wrk -t3 -c6 http://localhost:9090
Running 10s test @ http://localhost:9090
3 threads and 6 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 189.14us 138.12us 3.59ms 93.74%
Req/Sec 9.04k 602.45 10.55k 68.65%
272489 requests in 10.10s, 11.95MB read
Requests/sec: 26979.59
Transfer/sec: 1.18MB
I think we need some more work on this; I'll check what perf says. However, even being 2x slower than C is pretty good.
These numbers seem much better than what I measured. Good work. I recommend taking a look at libev_scheduler; with sub-millisecond max latency it is quite impressive. But 3-5ms is a sane number, and I think most applications can work with that.
For this small benchmark, max latency should be < 1ms.
Actually, our implementation of the event scheduler should be slightly more efficient than libev. However, my computer is quite old (Intel 4770).
Also, I have some issues with how wrk measures latency. I think we need to see a histogram to understand latency better.
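For a coarse latency distribution, wrk's --latency flag prints percentiles (50/75/90/99) alongside the summary stats:

> wrk -t1 -c1 --latency http://localhost:9090

For proper histograms with coordinated-omission correction, the wrk2 fork might be worth a look.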
By the way, the above numbers were generated by the new io_uring implementation. It currently uses synchronous operation; I think we can find some improvements with better management of the SQ (submission queue), e.g. batching submissions to reduce syscalls.
Okay, I have taken simple.c and made a direct comparison with our new event handling library:
https://github.com/socketry/event/runs/2538704259?check_suite_focus=true#step:6:23
We could consider adding the others too. This runs as part of GitHub Actions, so we can keep track of it over time.
I hope that going forward we will get close to the performance of the C implementation.
I tested some Ruby TCP server solutions and benchmarked the results:
https://github.com/jsaak/ruby3-tcp-server-mini-benchmark
Using Async, the latency was below 1ms, which is good; however, when using the async scheduler it became much slower: