tmds closed this issue 4 years ago
Regarding io_uring:
I added a package IoUring.Concurrent to the usual MyGet feed. It primarily features the ConcurrentRing class (as opposed to the non-thread-safe Ring in the IoUring package).
ConcurrentRing allows you to prepare SQEs and consume CQEs without external locking. Internally, a Volatile.Read + Interlocked.CompareExchange loop is used to ensure correctness. SubmitAndWait is internally synchronized using a lock.
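To illustrate the Volatile.Read + Interlocked.CompareExchange pattern mentioned above, here is a minimal sketch. It is not the IoUring.Concurrent source: the SlotReserver class and its members are hypothetical names, and the sketch only models reserving a slot index in a bounded ring. It ignores counter overflow and the kernel side entirely.

```csharp
using System.Threading;

// Hypothetical model of lock-free slot reservation; not taken from IoUring.Concurrent.
public sealed class SlotReserver
{
    private readonly int _capacity;
    private int _head; // advanced by the consumer (the kernel, in a real ring)
    private int _tail; // advanced by producers via compare-and-swap

    public SlotReserver(int capacity) => _capacity = capacity;

    // Tries to reserve one slot. Returns false when the ring is full.
    public bool TryReserve(out int index)
    {
        while (true)
        {
            int tail = Volatile.Read(ref _tail);
            int head = Volatile.Read(ref _head);
            if (tail - head >= _capacity)
            {
                index = -1;
                return false;
            }

            // Publish tail + 1 only if no other thread moved the tail in the meantime.
            if (Interlocked.CompareExchange(ref _tail, tail + 1, tail) == tail)
            {
                index = tail % _capacity;
                return true;
            }
            // Another thread won the race; re-read and retry.
        }
    }

    // Marks 'count' slots as consumed, making room for new reservations.
    public void Consume(int count) => Interlocked.Add(ref _head, count);
}
```

A failed CompareExchange means another producer reserved the slot first, so the loop re-reads both counters and retries instead of blocking.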
The API of ConcurrentRing is slightly more involved, as the following example demonstrates:
// using Ring
var r = new Ring(8);
r.PrepareNop();
// using ConcurrentRing
var cr = new ConcurrentRing(8);
if (!cr.TryAcquireSubmission(out Submission submission)) throw null;
submission.PrepareNop();
cr.Release(submission);
After a successful TryAcquireSubmission, the Submission must be immediately "prepared" and "released", as all Submissions past a non-released Submission will not be submitted to the kernel during the next SubmitAndWait.
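The ordering constraint described above can be modeled with a small single-threaded sketch. This is an assumption about the observable behavior only, not the actual ConcurrentRing implementation. ReleaseOrderModel and its members are hypothetical, and wrap-around and thread-safety are left out on purpose.

```csharp
// Hypothetical single-threaded model of "submit stops at the first non-released entry".
public sealed class ReleaseOrderModel
{
    private enum State { Free, Acquired, Released }

    private readonly State[] _slots;
    private int _next;      // next slot to hand out
    private int _submitted; // slots before this index were already submitted

    public ReleaseOrderModel(int size) => _slots = new State[size];

    // Hands out the next slot and marks it as acquired (no wrap-around in this model).
    public int Acquire()
    {
        _slots[_next] = State.Acquired;
        return _next++;
    }

    public void Release(int index) => _slots[index] = State.Released;

    // Returns how many entries would be handed to the kernel: the contiguous
    // run of released slots starting at the first not-yet-submitted slot.
    public int Submit()
    {
        int count = 0;
        while (_submitted < _next && _slots[_submitted] == State.Released)
        {
            _submitted++;
            count++;
        }
        return count;
    }
}
```

In this model, acquiring three slots and releasing only the second and third makes Submit return 0. Once the first slot is released as well, the next Submit returns 3.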
To prepare multiple linked SQEs, the method TryAcquireSubmissions(Span<Submission>) is used to acquire a Span<Submission> of adjacent SQEs.
I assume this will help you to further reduce locking in your code. Let me know if you run into issues or need help with experiments.
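There are two ways to implement this: (1) let threads submit directly to the submission queue, or (2) pool submissions in a thread-safe collection for the io_uring thread to submit.
IoUring.Concurrent does the first. The implementation is more challenging, but maybe it pays off.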
We can try both and measure.
@tkp1n can you make the implementation on top of IoUring.Concurrent?
In the case where SocketAsyncEventArgs completes on the io_uring thread (by setting RunContinuationsAsynchronously=false), no synchronization is currently needed to submit new operations. Adding synchronization will cause a performance regression in that case. We can do comparative benchmarks by looking at the RunContinuationsAsynchronously=true, PreferSynchronousCompletion=false case.
I'll try to find the time to come up with a PR for an IoUring.Concurrent-based approach.
FYI: I already added a small micro-benchmark to IoUring to measure the overhead of using ConcurrentRing over Ring in situations where synchronization is not needed. Let's see if the overhead pays off in a more real-world benchmark here.
There are two ways to implement this: (1) let threads submit directly to the submission queue, or (2) pool submissions in a thread-safe collection for the io_uring thread to submit.
@antonfirsov, @tkp1n is doing the first approach. Do you want to take a shot at the second?
You can focus on io_uring and ignore epoll/AIO.
You need to make AsyncExecutionQueue thread-safe for adding operations, and ensure we are able to unblock the io_uring_enter call when an action is scheduled to run on the io_uring thread. To do that we will need to increase coupling between IOUringThread and IOUringExecutionQueue.
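A rough sketch of the second approach (pooling operations in a thread-safe collection and waking the ring thread) could look like the following. Everything here is hypothetical: PooledSubmissionQueue is not the AsyncExecutionQueue from this repository, and the AutoResetEvent only stands in for whatever mechanism is used to unblock io_uring_enter.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch: any thread may enqueue, only the ring thread drains.
public sealed class PooledSubmissionQueue
{
    private readonly ConcurrentQueue<Action> _pending = new ConcurrentQueue<Action>();
    // Stand-in for unblocking io_uring_enter (an assumption, not the real mechanism).
    private readonly AutoResetEvent _wakeup = new AutoResetEvent(false);
    private int _wakeupRequested; // 0 = ring thread not signalled, 1 = signalled
    private int _wakeupCount;

    // Number of times the ring thread was signalled (for illustration only).
    public int WakeupCount => Volatile.Read(ref _wakeupCount);

    // Called from any thread.
    public void Enqueue(Action operation)
    {
        _pending.Enqueue(operation);
        // Signal only on the 0 -> 1 transition to avoid redundant wake-ups.
        if (Interlocked.Exchange(ref _wakeupRequested, 1) == 0)
        {
            Interlocked.Increment(ref _wakeupCount);
            _wakeup.Set();
        }
    }

    // Called only from the ring thread. Returns the number of operations run.
    public int WaitAndDrain(int timeoutMilliseconds)
    {
        _wakeup.WaitOne(timeoutMilliseconds);
        // Reset the flag (with a full fence) before draining, so that an
        // enqueue racing with the drain signals again instead of being missed.
        Interlocked.Exchange(ref _wakeupRequested, 0);
        int executed = 0;
        while (_pending.TryDequeue(out var operation))
        {
            operation();
            executed++;
        }
        return executed;
    }
}
```

The flag keeps producers from signalling on every enqueue, which matters if each wake-up costs a system call. The trade-off is an occasional spurious wake-up when an item enqueued during a drain is processed in that same pass.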
@tmds do we expect a significant difference between the two approaches?
@tkp1n is it possible to achieve ConcurrentRing behavior using the original liburing?
> is it possible to achieve ConcurrentRing behavior using the original liburing?
liburing is not thread-safe AFAIK.
When writing up this issue I was not considering the ConcurrentRing approach because the implementation is more challenging and I'm not sure about the gains. Since we have a ConcurrentRing implementation available, we can use it and measure.
I'm going to close this because I don't plan to investigate this further. If someone wants to look into it, we'll re-open the issue.
The AIO and io_uring AsyncExecutionQueues are not thread-safe. Operations are only added from the epoll/io_uring thread.
For epoll, these are the operations that are indicated ready by epoll.
And for epoll/io_uring these are also operations that are newly started from the thread when RunContinuationsAsynchronously=false (t=false).
Other operations can be forced onto the epoll/io_uring thread by setting PreferSynchronousCompletion=false (s=true, r=true). This is implemented by scheduling an action to the epoll/io_uring thread, which then adds an operation to the batch: https://github.com/tmds/Tmds.LinuxAsync/blob/601c88372d14ef5b784d574ec741d3fb5df6d666/src/Tmds.LinuxAsync/IOUringAsyncEngine.Queue.cs#L157-L161
An alternative implementation would be to make the AsyncExecutionQueues thread-safe, and remove the intermediate action that runs on the epoll/io_uring thread.
cc @antonfirsov @adamsitnik