The cost of creating threads and the use of `rb_nogvl` creates inefficiencies.

`rb_nogvl` should really only be used when the amount of work to be done is greater than some scheduling quantum. In practice, that's hard to achieve, so we also want to minimise the overhead of `blocking_operation_wait` (abbreviated BOW below).

I've been using `async-cable` as a benchmark as it has a good mixture of IO and `rb_nogvl` (inflate/deflate) operations. Those operations are typically extremely small, so the offloading overhead is clearly exposed. Another benchmark is the recently introduced `IO::Buffer#copy`, which uses `rb_nogvl` on large buffers. It is the opposite: highly CPU-bound (memory-bound, in fact) work with little IO.
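For context, the simplest offloading strategy (the "Thread BOW" configuration measured below) can be sketched in plain Ruby. The method name here is illustrative, not the actual scheduler hook:

```ruby
# Hypothetical sketch of the "Thread BOW" strategy: every blocking
# operation is handed to a freshly created thread, and the caller
# blocks until the result is available. Thread creation and GVL
# re-acquisition are exactly the overheads this benchmark exposes.
def thread_blocking_operation_wait(work)
  Thread.new { work.call }.value
end

# Offloading a trivially small piece of work, far below any
# reasonable scheduling quantum:
thread_blocking_operation_wait(-> { 1 + 1 }) # => 2
```

When the work is this small, the fixed cost of spawning a thread dwarfs the work itself, which is why the quantum matters.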
Semantics remain unchanged.
## `Async::Cable` Benchmarks

This benchmark is mostly network bound, and there are a lot of small calls to inflate/deflate which use `rb_nogvl`:
| Configuration | Connection Time | Message Time |
|---------------|-----------------|--------------|
| No BOW        | 0.67ms          | 0.024ms      |
| Thread BOW    | 2.2ms           | 0.96ms       |
| Work Pool BOW | 0.83ms          | 0.045ms      |
Overall, we can see a net loss in performance by offloading `rb_nogvl` with a pure Ruby implementation. I believe we can attribute this to the offloading thread having to re-acquire the GVL, which creates unnecessary contention. This is fixable, but requires a native code path (probably in the `IO::Event` scheduler implementation).
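For comparison, the "Work Pool BOW" configuration amortises thread creation over many operations by keeping a fixed pool of threads fed from a shared queue. A minimal pure-Ruby sketch (illustrative only, not the actual implementation):

```ruby
# Hypothetical sketch of the "Work Pool BOW" strategy: a fixed set of
# threads consumes blocking operations from a shared queue, avoiding
# per-operation thread creation (but still contending for the GVL
# when delivering results back to Ruby).
class WorkPool
  def initialize(size = 4)
    @queue = Thread::Queue.new
    @threads = size.times.map do
      Thread.new do
        while (job = @queue.pop)
          work, result = job
          result << work.call
        end
      end
    end
  end

  # Submit a callable and block until its result is available:
  def call(work)
    result = Thread::Queue.new
    @queue << [work, result]
    result.pop
  end

  def close
    @queue.close
    @threads.each(&:join)
  end
end
```

This avoids the per-operation thread spawn, which is consistent with the Work Pool numbers above sitting much closer to the "No BOW" baseline than the Thread BOW ones.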
## `IO::Buffer` Benchmarks
This benchmark is more memory bound and there is essentially zero blocking IO:
| Configuration | Task Count | Buffer Size | Duration | Throughput |
|---------------|------------|-------------|----------|------------|
| No BOW        | 1          | 100MiB      | 6.46ms   | 15GB/s     |
| No BOW        | 8          | 100MiB      | 28.93ms  | 27GB/s     |
| No BOW        | 16         | 100MiB      | 56.81ms  | 28GB/s     |
| Thread BOW    | 1          | 100MiB      | 6.99ms   | 14GB/s     |
| Thread BOW    | 8          | 100MiB      | 24.52ms  | 32GB/s     |
| Thread BOW    | 16         | 100MiB      | 44.33ms  | 36GB/s     |
| Work Pool BOW | 1          | 100MiB      | 7.12ms   | 14GB/s     |
| Work Pool BOW | 8          | 100MiB      | 20.41ms  | 39GB/s     |
| Work Pool BOW | 16         | 100MiB      | 43.53ms  | 36GB/s     |
Overall, the thread and work pool configurations perform similarly. I believe we see GVL contention even on the background threads, as I'd expect the numbers to scale more linearly, although it's also true that memory bandwidth isn't unlimited.
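The inner operation of this benchmark is roughly the following (buffer size reduced here for illustration; the benchmark uses 100MiB buffers across multiple tasks):

```ruby
# Rough sketch of the IO::Buffer benchmark's core operation: copying
# one large buffer into another. On large buffers, IO::Buffer#copy
# releases the GVL, which is what blocking_operation_wait can offload.
size = 1024 * 1024 # 1MiB here; the benchmark uses 100MiB.
source = IO::Buffer.new(size)
source.set_string("Hello, World!")

destination = IO::Buffer.new(size)
destination.copy(source)

destination.get_string(0, 13) # => "Hello, World!"
```

Since the copy is memory-bandwidth bound rather than IO bound, offloading it lets other fibers make progress, which is why the multi-task configurations benefit most.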
See https://github.com/socketry/async/pull/352 for context.