tmds / Tmds.LinuxAsync

Queue async operations directly from non epoll/io_uring thread #51

Closed tmds closed 4 years ago

tmds commented 4 years ago

The AIO and io_uring AsyncExecutionQueues are not thread-safe. Operations are only ever added to them from the epoll/io_uring thread.

For epoll, these are the operations that are indicated ready by epoll.

And for epoll/io_uring these are also operations that are newly started from the thread when RunContinuationsAsynchronously=false (t=false).

Other operations can be forced onto the epoll/io_uring thread by setting PreferSynchronousCompletion=false (s=true, r=true). This is implemented by scheduling an action to the epoll/io_uring thread, which then adds an operation to the batch:

https://github.com/tmds/Tmds.LinuxAsync/blob/601c88372d14ef5b784d574ec741d3fb5df6d666/src/Tmds.LinuxAsync/IOUringAsyncEngine.Queue.cs#L157-L161

An alternative implementation would be to make the AsyncExecutionQueues thread-safe, and remove the intermediate action that runs on the epoll/io_uring thread.
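To make the current scheme concrete, here is a minimal sketch of "schedule an action on the owning thread, which then adds the operation to the batch". It is not the code from IOUringAsyncEngine.Queue.cs; the type and member names (SingleThreadedBatch, EngineThreadSketch, QueueOperationFromAnyThread, RunScheduledActions) are hypothetical and operations are reduced to strings.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for an execution queue that may only be touched
// by the thread that owns it (the epoll/io_uring thread in this issue).
sealed class SingleThreadedBatch
{
    private readonly List<string> _operations = new List<string>();
    public void Add(string operation) => _operations.Add(operation);
    public int Count => _operations.Count;
}

sealed class EngineThreadSketch
{
    // The only thread-safe piece: a queue of actions for the owning thread.
    private readonly ConcurrentQueue<Action> _scheduledActions = new ConcurrentQueue<Action>();
    private readonly SingleThreadedBatch _batch = new SingleThreadedBatch();

    // Callable from any thread. It never touches _batch directly; it only
    // enqueues an action that will touch it later on the owning thread.
    public void QueueOperationFromAnyThread(string operation)
        => _scheduledActions.Enqueue(() => _batch.Add(operation));

    // Called only by the owning thread, once per loop iteration.
    // Returns the number of operations currently in the batch.
    public int RunScheduledActions()
    {
        while (_scheduledActions.TryDequeue(out var action))
        {
            action();
        }
        return _batch.Count;
    }
}
```

The cost of this design is the extra hop: every operation started off-thread pays for an action allocation and a wakeup of the owning thread before it even reaches the batch. That hop is what a thread-safe AsyncExecutionQueue would remove.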

cc @antonfirsov @adamsitnik

tkp1n commented 4 years ago

Regarding io_uring:

I added a package IoUring.Concurrent to the usual MyGet feed. It primarily features the ConcurrentRing class (as opposed to the non-thread-safe Ring in the IoUring package).

ConcurrentRing allows you to prepare SQEs and consume CQEs without external locking. Internally a Volatile.Read + Interlocked.CompareExchange loop is used to ensure correctness. SubmitAndWait is internally synchronized using a lock.
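As an illustration of the Volatile.Read + Interlocked.CompareExchange pattern mentioned above, the following sketch claims slots in a bounded ring without a lock. It is not the ConcurrentRing implementation; SlotAllocatorSketch and its members are hypothetical, and counter overflow is ignored for brevity.

```csharp
using System.Threading;

sealed class SlotAllocatorSketch
{
    private readonly int _capacity;
    private int _tail; // number of slots handed out so far
    private int _head; // number of slots consumed so far

    public SlotAllocatorSketch(int capacity) => _capacity = capacity;

    // Callable from any thread. Returns false when the ring is full.
    public bool TryAcquire(out int slot)
    {
        while (true)
        {
            int tail = Volatile.Read(ref _tail);
            int head = Volatile.Read(ref _head);
            if (tail - head >= _capacity)
            {
                slot = -1;
                return false; // no free slot
            }

            // Advance the tail only if no other thread advanced it first.
            if (Interlocked.CompareExchange(ref _tail, tail + 1, tail) == tail)
            {
                slot = tail % _capacity;
                return true;
            }
            // Another thread won the race; re-read and retry.
        }
    }

    // Marks `count` slots as consumed, freeing them for reuse.
    public void MarkConsumed(int count) => Interlocked.Add(ref _head, count);
}
```

Each successful caller gets a distinct slot index, and a failed CompareExchange only costs a retry rather than blocking on a lock.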

The API of ConcurrentRing is slightly more involved as the following example demonstrates:

// using Ring
var r = new Ring(8);
r.PrepareNop();

// using ConcurrentRing
var cr = new ConcurrentRing(8);
if (!cr.TryAcquireSubmission(out Submission submission)) throw null;
submission.PrepareNop();
cr.Release(submission);

After a successful TryAcquireSubmission the Submission must be immediately "prepared" and "released", as all Submissions past a non-released Submission will not be submitted to the kernel during the next SubmitAndWait.
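One way to picture this rule: if entries are handed to the kernel in acquisition order, only the contiguous run of released Submissions at the front can be submitted. The sketch below models just that counting rule; it is an assumption about the behaviour described above, not code from IoUring.Concurrent, and ReleasePrefixSketch is a hypothetical name.

```csharp
static class ReleasePrefixSketch
{
    // released[i] is true once the i-th acquired Submission has been released.
    // Returns how many Submissions could be handed to the kernel: the length
    // of the contiguous released prefix.
    public static int CountSubmittable(bool[] released)
    {
        int count = 0;
        while (count < released.Length && released[count])
        {
            count++;
        }
        return count;
    }
}
```

With three acquired Submissions where only the second is still unreleased, CountSubmittable(new[] { true, false, true }) yields 1: the third is ready but stuck behind the second.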

To prepare multiple linked SQEs, the method TryAcquireSubmissions(Span<Submission>) is used to acquire a Span<Submission> of adjacent SQEs.

I assume this will help you to further reduce locking in your code. Let me know if you run into issues or need help with experiments.

tmds commented 4 years ago

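There are two ways to implement this:

1. Let threads submit directly to the submission queue.
2. Pool submissions in a thread-safe collection for the io_uring thread to submit.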

IoUring.Concurrent does the first. The implementation is more challenging, but maybe it pays off.

We can try both and measure.

@tkp1n can you make the implementation on top of IoUring.Concurrent?

In the case where SocketAsyncEventArgs completes on the io_uring thread (by setting RunContinuationsAsynchronously=false), no synchronization is currently needed to submit new operations. Adding synchronization will cause a performance regression there. We can do comparative benchmarks by looking at the RunContinuationsAsynchronously=true, PreferSynchronousCompletion=false case.

tkp1n commented 4 years ago

I'll try to find the time to come up with a PR for an IoUring.Concurrent-based approach.

FYI: I already added a small micro-benchmark to IoUring to identify the overhead of using ConcurrentRing over Ring in situations where synchronization is not needed. Let's see if the overhead pays off in a more real world benchmark here.

tmds commented 4 years ago

There are two ways to implement this:

1. Let threads submit directly to the submission queue.
2. Pool submissions in a thread-safe collection for the io_uring thread to submit.

@antonfirsov, @tkp1n is doing the first approach. Do you want to take a shot at the second?

You can focus on io_uring and ignore epoll/AIO.

You need to make AsyncExecutionQueue thread-safe for adding operations, and ensure we are able to unblock io_uring_enter when an action is scheduled to run on the io_uring thread. To do that we will need to increase coupling between IOUringThread and IOUringExecutionQueue.
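A rough sketch of this second approach: any thread adds to a thread-safe collection and wakes the submitting thread, which drains everything into one batch. All names here (PooledSubmissionsSketch, Enqueue, WaitAndDrain) are hypothetical, operations are reduced to strings, and a SemaphoreSlim stands in for whatever mechanism would actually unblock io_uring_enter, which the issue leaves open.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

sealed class PooledSubmissionsSketch
{
    private readonly ConcurrentQueue<string> _pending = new ConcurrentQueue<string>();
    // Stand-in for the real wakeup of a thread blocked in io_uring_enter.
    private readonly SemaphoreSlim _wakeup = new SemaphoreSlim(0);
    private int _wakeupRequested; // 0 = no wakeup pending, 1 = already signaled

    // Callable from any thread.
    public void Enqueue(string operation)
    {
        _pending.Enqueue(operation);
        // Signal at most once per drain to avoid redundant wakeups.
        if (Interlocked.Exchange(ref _wakeupRequested, 1) == 0)
        {
            _wakeup.Release();
        }
    }

    // Called only by the submitting thread. Blocks until woken, then returns
    // everything queued so far. The batch can be empty after a spurious wakeup.
    public List<string> WaitAndDrain()
    {
        _wakeup.Wait();
        // Reset the flag before draining, so that an Enqueue racing with the
        // drain either lands in this batch or triggers a fresh wakeup.
        Interlocked.Exchange(ref _wakeupRequested, 0);

        var batch = new List<string>();
        while (_pending.TryDequeue(out var operation))
        {
            batch.Add(operation);
        }
        return batch;
    }
}
```

Compared with letting threads write to the submission queue directly, the ring itself stays single-threaded here; the synchronization cost moves into the collection and the wakeup.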

antonfirsov commented 4 years ago

@tmds do we expect a significant difference between the two approaches?

@tkp1n is it possible to achieve ConcurrentRing behavior using the original liburing?

tkp1n commented 4 years ago

> is it possible to achieve ConcurrentRing behavior using the original liburing?

liburing is not thread-safe AFAIK

tmds commented 4 years ago

When writing up this issue I was not considering the ConcurrentRing approach because the implementation is more challenging and I'm not sure about the gains. Since we have a ConcurrentRing implementation available, we can use it and measure.

tmds commented 4 years ago

I'm going to close this because I don't plan to investigate this further. If someone wants to look into it, we'll re-open the issue.