tokio-rs / mio

Metal I/O library for Rust.
MIT License
6.27k stars 720 forks

Support for io_uring #1591

Open Noah-Kennedy opened 2 years ago

Noah-Kennedy commented 2 years ago

Adding io_uring support here would make it significantly easier to get io_uring support in tokio.

io_uring supports both readiness-based and completion-based APIs. Readiness-based APIs should be relatively simple. Completion-based support is more complicated.

Noah-Kennedy commented 2 years ago

@Thomasdezeeuw I think that this is potentially something we need to figure out before we lock in our APIs for 1.0.

Thomasdezeeuw commented 2 years ago

@Noah-Kennedy I was thinking Mio v1 should remain poll based, i.e. the current implementation. Mio v2 could then be completion based, seeing how that is now supported by both Linux and Windows. Hopefully macOS and the BSDs will support something similar (or, even better, the same API).

notgull commented 2 years ago

I have to wonder if tokio could use alternate runtimes on a feature flag. As in, by default it uses mio, but if the io_uring feature flag is enabled it replaces it with an io_uring-based runtime?
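A rough sketch of what compile-time selection behind a feature flag could look like. Everything here is hypothetical: `io_uring` is an assumed Cargo feature name, and `Driver`, `EpollDriver`, `UringDriver`, and `default_driver` are illustrative names, not Tokio or Mio APIs.

```rust
// Hypothetical sketch: none of these names exist in Tokio or Mio.

/// Minimal interface that both I/O drivers would share.
pub trait Driver {
    fn name(&self) -> &'static str;
}

/// Stand-in for the existing mio/epoll-backed driver.
pub struct EpollDriver;

impl Driver for EpollDriver {
    fn name(&self) -> &'static str {
        "epoll (via mio)"
    }
}

/// Stand-in for an io_uring-backed driver, compiled only when the
/// assumed `io_uring` feature is enabled.
#[cfg(feature = "io_uring")]
pub struct UringDriver;

#[cfg(feature = "io_uring")]
impl Driver for UringDriver {
    fn name(&self) -> &'static str {
        "io_uring"
    }
}

/// Default build: keep using the mio-based driver.
#[cfg(not(feature = "io_uring"))]
pub fn default_driver() -> Box<dyn Driver> {
    Box::new(EpollDriver)
}

/// Build with `--features io_uring`: swap in the io_uring driver.
#[cfg(feature = "io_uring")]
pub fn default_driver() -> Box<dyn Driver> {
    Box::new(UringDriver)
}
```

Without the feature enabled, `default_driver().name()` reports the mio-based driver; the rest of the runtime would only ever talk to the `Driver` trait.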

Noah-Kennedy commented 2 years ago

One of the key things to bring up around io_uring is that it can do both readiness-based and completion-based IO, and there are benefits to how io_uring does readiness-based IO over how epoll does it. For this reason I don't think we need to drop support for poll-based IO in order to support completion-based IO. We can instead implement completion-based IO as a sort of extension API, which is what I'm trying to work out how to do.

Noah-Kennedy commented 2 years ago

@notgull I've thought about that and discussed it with others, and it is an option, but I would much rather bake io_uring support into mio in order to make this whole process somewhat easier to manage.

Thomasdezeeuw commented 2 years ago

> One of the key things to bring up around io_uring is that it can do both readiness-based and completion-based IO, and there are benefits to how io_uring does readiness-based IO over how epoll does it. For this reason I don't think we need to drop support for poll-based IO in order to support completion-based IO. We can instead implement completion-based IO as a sort of extension API, which is what I'm trying to work out how to do.

I'm a little hesitant to support io_uring in v1 because I don't really fancy supporting both epoll and io_uring implementations at the same time, but dropping epoll is not an option (due to backwards compatibility). Furthermore, I don't think io_uring supports all fd types that epoll supports, at least not in earlier versions (maybe it has caught up now, I don't know). This means we need to either make io_uring optional (a feature) in v1, or detect errors and fall back to epoll.
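The "detect errors and fall back" option could be sketched as below. This is a minimal illustration, not Mio code: `Backend` and `select_backend` are hypothetical names, and the `try_uring` closure stands in for whatever io_uring setup call a real implementation would make. It only covers a setup-time fallback, not a per-fd fallback at registration time.

```rust
use std::io;

/// Which OS interface the (hypothetical) selector ends up using.
#[derive(Debug, PartialEq)]
pub enum Backend {
    Uring,
    Epoll,
}

/// Try to set up io_uring via `try_uring`; if that fails for any reason
/// (old kernel, seccomp filter, resource limits), fall back to epoll.
/// The error is returned alongside so the caller can log why.
pub fn select_backend<F>(try_uring: F) -> (Backend, Option<io::Error>)
where
    F: FnOnce() -> io::Result<()>,
{
    match try_uring() {
        Ok(()) => (Backend::Uring, None),
        Err(err) => (Backend::Epoll, Some(err)),
    }
}
```

A caller would pass the real setup routine as the closure; a failing probe such as `|| Err(io::ErrorKind::Unsupported.into())` yields `Backend::Epoll` together with the reason.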

Darksonn commented 2 years ago

One concern that I have been wondering about is that completion-based APIs behave noticeably differently when it comes to errors. Any AsyncRead/AsyncWrite implementation would need to immediately report that the write has succeeded, and then return the error on a future write if it failed. For this reason it seems to me that we would need to continue using a readiness-based API indefinitely for Tokio net types such as TcpStream.
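To make the deferred-error behavior concrete, here is a small synchronous sketch. `DeferredWriter`, `complete`, and `Broken` are hypothetical names invented for illustration; the inner `Write` stands in for the kernel-side operation, and `complete` stands in for reaping a completion.

```rust
use std::io::{self, Write};

/// Hypothetical adapter: copies caller data into a buffer it owns, reports
/// success immediately, and surfaces a failed operation on a later call.
pub struct DeferredWriter<W: Write> {
    inner: W,                    // stands in for the in-kernel operation
    pending: Vec<u8>,            // owned copy of data not yet written
    deferred: Option<io::Error>, // error from an earlier "completed" write
}

impl<W: Write> DeferredWriter<W> {
    pub fn new(inner: W) -> Self {
        DeferredWriter { inner, pending: Vec::new(), deferred: None }
    }

    /// Stand-in for processing a completion: perform the pending write
    /// and remember its error, if any, for the next caller.
    pub fn complete(&mut self) {
        if self.pending.is_empty() {
            return;
        }
        let data = std::mem::take(&mut self.pending);
        if let Err(err) = self.inner.write_all(&data) {
            self.deferred = Some(err);
        }
    }
}

impl<W: Write> Write for DeferredWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Finish the previous operation and report its failure first.
        self.complete();
        if let Some(err) = self.deferred.take() {
            return Err(err);
        }
        // Accept the new data and claim success before it is written.
        self.pending.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.complete();
        match self.deferred.take() {
            Some(err) => Err(err),
            None => self.inner.flush(),
        }
    }
}

/// Test double whose writes always fail.
pub struct Broken;

impl Write for Broken {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::BrokenPipe, "peer closed"))
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

With `Broken` as the inner writer, the first `write` returns `Ok` and the failure only shows up on the second `write` (or on `flush`), which is the mismatch with today's TcpStream behavior described above.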

I would like to add here that we also expose explicitly readiness-based APIs such as AsyncFd or the TcpStream::{readable,writable,try_read,try_write,try_io} methods. Unless it becomes possible to submit "readiness operations" to io_uring, it appears to me that these APIs must necessarily continue to use epoll.

None of this is a concern for Tokio file types as they already have completion-based behavior today.

Thomasdezeeuw commented 2 years ago

> One concern that I have been wondering about is that completion-based APIs behave noticeably differently when it comes to errors. Any AsyncRead/AsyncWrite implementation would need to immediately report that the write has succeeded, and then return the error on a future write if it failed. For this reason it seems to me that we would need to continue using a readiness-based API indefinitely for Tokio net types such as TcpStream.

I don't think this is really a concern, mainly because AsyncRead/AsyncWrite won't work at all for a completion-based design. E.g. in a read call, how do we ensure that the buffer (&mut [u8]) stays alive long enough for the OS to write into it, how do we deal with early drops of the Future, etc.? I think we'll need a completely new set of traits for completion-based I/O.
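One possible shape for such a trait, sketched synchronously to keep it self-contained: the buffer is moved into the operation and handed back with the result, so the caller cannot free it while the OS might still write into it. `ReadOwned` and `read_owned` are hypothetical names, and the blanket impl over `std::io::Read` is only a blocking stand-in to show the ownership flow, not a real completion-based backend.

```rust
use std::io::{self, Read};

/// Hypothetical completion-style read: ownership of `buf` passes to the
/// operation and comes back together with the result.
pub trait ReadOwned {
    fn read_owned(&mut self, buf: Vec<u8>) -> (io::Result<usize>, Vec<u8>);
}

/// Blocking stand-in built on `Read`. A real implementation would submit
/// the buffer to the kernel and return it once the completion arrives.
impl<R: Read> ReadOwned for R {
    fn read_owned(&mut self, mut buf: Vec<u8>) -> (io::Result<usize>, Vec<u8>) {
        let result = self.read(&mut buf);
        (result, buf)
    }
}
```

Because the caller gives up the `Vec<u8>` for the duration of the call, dropping the operation early can leave the buffer owned by the I/O layer instead of leaving a dangling `&mut [u8]`.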

> I would like to add here that we also expose explicitly readiness-based APIs such as AsyncFd or the TcpStream::{readable,writable,try_read,try_write,try_io} methods. Unless it becomes possible to submit "readiness operations" to io_uring, it appears to me that these APIs must necessarily continue to use epoll.
>
> None of this is a concern for Tokio file types as they already have completion-based behavior today.

Noah-Kennedy commented 2 years ago

I'm in agreement with @Thomasdezeeuw regarding the traits.

Noah-Kennedy commented 2 years ago

@Thomasdezeeuw uring, like epoll, supports any pollable file descriptors for polling with IORING_OP_POLL_ADD.

Darksonn commented 2 years ago

> uring, like epoll, supports any pollable file descriptors for polling with IORING_OP_POLL_ADD.

I was not aware of this. In that case, I imagine that you could implement AsyncRead/AsyncWrite by using that to wait for readiness, then perform the actual read with the same non-blocking syscall as we do today. However, this does not seem like it would be an improvement over just continuing to use epoll.
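The "wait for readiness, then do the non-blocking call" pattern can be sketched independently of the notification mechanism. In the sketch below, `read_when_ready` and `Flaky` are hypothetical names, and `wait_readable` is a placeholder hook that would block on epoll or on an io_uring poll operation in a real implementation. It tries the read first and only waits on `WouldBlock`.

```rust
use std::io::{self, Read};

/// Retry a non-blocking read until the source stops reporting `WouldBlock`.
/// `wait_readable` is a hypothetical hook that returns once the source is
/// believed to be readable again.
pub fn read_when_ready<R, W>(
    src: &mut R,
    buf: &mut [u8],
    mut wait_readable: W,
) -> io::Result<usize>
where
    R: Read,
    W: FnMut() -> io::Result<()>,
{
    loop {
        match src.read(buf) {
            // Not ready yet: wait for a readiness notification, then retry.
            Err(err) if err.kind() == io::ErrorKind::WouldBlock => wait_readable()?,
            // Data, EOF, or a real error: hand it to the caller.
            other => return other,
        }
    }
}

/// Test double: reports `WouldBlock` for `not_ready` calls, then yields data.
pub struct Flaky {
    pub not_ready: u32,
    pub data: &'static [u8],
}

impl Read for Flaky {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.not_ready > 0 {
            self.not_ready -= 1;
            return Err(io::ErrorKind::WouldBlock.into());
        }
        self.data.read(buf)
    }
}
```

The read path is the same whichever mechanism backs `wait_readable`, which is why swapping epoll for io_uring polling here would not by itself change how AsyncRead/AsyncWrite behave.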

> I don't think this is really a concern, mainly because AsyncRead/AsyncWrite won't work at all for a completion-based design.

I mean, Tokio uses mio to implement types that implement those traits, so regardless of what mio uses, there needs to be some way to implement the traits using it.

I note that you can use the traits with io_uring if you copy the data into a buffer owned by the IO resource. This wouldn't be good for Tokio's TcpStream, but it would be an improvement to implement Tokio files in that manner.

Noah-Kennedy commented 2 years ago

@Darksonn my thought with the polling support is that it could be used to have uring replace epoll when a feature or runtime flag is set. For the readiness-based APIs, we could use the polling APIs within uring. For completion-based APIs, we would use the normal, completion-based features of io_uring.

Thomasdezeeuw commented 1 year ago

I've been working on io_uring in a different repo: https://github.com/Thomasdezeeuw/a10. Maybe it can become a Mio v2, or maybe it should stay separate, as it doesn't support anything other than Linux at the moment.