python-trio / trio

Trio – a friendly Python library for async concurrency and I/O
https://trio.readthedocs.io

Consider io_uring as an alternative to epoll #932

Open thedrow opened 5 years ago

thedrow commented 5 years ago

io_submit() is not async, but it allows batching of syscalls, which may or may not be better for a user's use case.

Should we introduce an IOManager which utilizes it?

Read more here.

pquentin commented 5 years ago

Relevant discussion in chat: https://gitter.im/python-trio/general?at=5c2ff4d65ec8fe5a850242ac:

I am skeptical that micro-optimizations to reduce syscalls or copying are a measurable win in Python. But I could well be wrong :-) And the trick of using IOCB_CMD_POLL is super clever (and actually equivalent to what we're going to do on Windows).

njsmith commented 5 years ago

There's also the upcoming io_uring: https://lwn.net/Articles/776703/

Some practical challenges: Python ships with a wrapper for epoll, but not for io_submit. Currently we don't use cffi or compiled modules on Linux, but we could reconsider that if it makes sense. Also... we don't currently have any good benchmarks, which are sort of important for figuring out whether this is a win or not :-)

As a general rule, Trio is much less keen to offer configurable/pluggable IOManagers than some of the more well-known Python event loop libraries. But if this turns out to be a worthwhile change, then I don't see why it wouldn't be worthwhile for everyone!

It's not something I'll personally be looking at in the foreseeable future, because, well, see this list :-). But if you do any experiments then I will be super curious to see how it goes!

thedrow commented 5 years ago

I won't be looking into this soon. Maybe after we finish rewriting Celery to use trio.

njsmith commented 5 years ago

Maybe after we finish rewriting Celery to use trio.

Is the discussion around this happening in public somewhere? Anything we can help with?

thedrow commented 5 years ago

@njsmith It will soon. Right after we finish with releasing 4.3.

njsmith commented 5 years ago

I asked Jens Axboe about io_uring on twitter, and got some more details (click on each message in turn to see the subthreads).

The most interesting part is that he sounds serious about providing a solid async interface to disk I/O. So it will be interesting to see how that shapes up...

njsmith commented 4 years ago

Another advantage of being an early adopter: you get to flush out problems while there's still a chance to fix them. I did another pass looking at io_uring yesterday, realized that there was a potential problem with the overflow handling for network I/O use cases, emailed @axboe about it, and today we have v2 of a proposed patch to fix that: https://lore.kernel.org/io-uring/20191106235307.32196-1-axboe@kernel.dk/T/

This strikes me as a somewhat compelling argument for hurrying up and implementing this for real, even if it will be a while before it becomes widely available, because implementing it for real is the best way to shake out any other issues that might be lurking.

njsmith commented 4 years ago

Renamed the issue, since I'm pretty confident now that if we're adding another Linux backend, it'll be io_uring, not io_submit :-). io_submit's advantages over epoll are very minor: you can do some "real" async disk I/O but only under extremely specific conditions that make it unsuitable for almost all applications, and you get some syscall batching. And io_submit has a serious limitation: at the beginning of your program, when you call io_setup, you have to declare the maximum number of I/O operations you'll ever want to have in-flight at once. This is super annoying for a generic I/O library like Trio – there are some heuristics we could use if we had to, like maybe setting it to some multiple of getrlimit(RLIMIT_NOFILE)? But it's not really worth it.
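The RLIMIT_NOFILE heuristic mentioned above could be sketched roughly as follows. This is only an illustration: `guess_max_inflight`, the multiplier, and the fallback cap are hypothetical and not part of Trio or any io_submit wrapper.

```python
import resource


def guess_max_inflight(multiplier=2, fallback=65536):
    """Guess an upper bound on in-flight operations to pass to io_setup().

    Hypothetical heuristic: some multiple of the soft RLIMIT_NOFILE limit.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # No fd limit to base the guess on, so fall back to an arbitrary cap.
        return fallback
    return soft * multiplier


max_inflight = guess_max_inflight()
```

Any value chosen this way is still a guess, which is why the comment above concludes that it is not really worth it.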

By contrast, io_uring gives you the same syscall batching, but it also provides genuine new superpowers like async disk I/O that's useful to regular programs (or at least, that's the plan... I'm not 100% sure how much of the plan is implemented yet). And in the initial releases, it has the same limitation as io_submit WRT the maximum number of in-flight operations, but it looks like that will be solved in the v5.5 kernel release (with a feature flag so userspace can detect the new/better behavior, tentatively named IORING_FEAT_NODROP).

So some next steps would be:

YoSTEALTH commented 4 years ago

https://pypi.org/project/liburing/ https://github.com/YoSTEALTH/liburing

thedrow commented 4 years ago

https://pypi.org/project/liburing/ https://github.com/YoSTEALTH/liburing

Nice work but it's not a CFFI binding which will make this harder to run efficiently on PyPy. Any chance you'll rewrite it?

njsmith commented 4 years ago

IMO CFFI would also be easier to maintain, but if you're the one maintaining it then obviously that's up to your taste :-)

A good Python wrapper for liburing is definitely needed to use io_uring, and it makes a lot of sense to have it as a standalone library that trio can depend on.

I think we'd be very hesitant to add a dependency on something under the "Unlicense", though; see: https://softwareengineering.stackexchange.com/questions/147111/what-is-wrong-with-the-unlicense If you want something minimal, then maybe consider using a more boring/conventional version like MIT or 2-clause BSD? Or liburing itself is LGPL, so you might make the wrapper LGPL to match.


In other news: it turns out there's a pretty major bug in io_uring, where the epoll-like support can't handle ttys or pipes. There's a proposed patch to fix this here: https://lore.kernel.org/io-uring/20200210205650.14361-1-axboe@kernel.dk/T/#u

So this will hopefully get sorted out for 5.6.0, and hopefully backported to at least 5.5.x, but we'll have to see. This is a blocker for Trio relying on io_uring though, so we'll want to keep track, and eventually figure out how to detect this issue in our "does io_uring work or should we use epoll?" logic.

YoSTEALTH commented 4 years ago

It is using CFFI https://github.com/YoSTEALTH/Liburing/blob/master/builder.py

I went with Unlicense because SQLite also uses it https://www.sqlite.org/copyright.html and it's in almost everything out there. I am sure SQLite is used in Germany and Australia as well.

The software is in the public domain; there are no limits whatsoever.

There are too many things going on with liburing/io_uring to keep up. For now I am looking forward to 0.4.0 being released at the end of this week, and I hope to have my Liburing (Python) release out ASAP.

I kept everything as close to liburing (C) as I could, so anyone should be able to adapt it into their software.

njsmith commented 4 years ago

I went with Unlicense because SQLite also uses it https://www.sqlite.org/copyright.html and it's in almost everything out there. I am sure SQLite is used in Germany and Australia as well.

Sqlite doesn't use the Unlicense; they use a different public domain dedication. And they get away with it because they're one of the most important pieces of software in the world, and because they also sell licenses to companies in places like Germany or Australia who need more assurances.

Basically this is a totally practical concern – there are lots of companies out there that absolutely do audit the licenses on their dependencies, and for unusual licenses they either can't use them or have to call the lawyers to do some manual review. If Trio's dependency chain includes projects with "weird" licenses like that, then it makes it hard for folks at companies like that to use Trio. I want it to be easy to use Trio, so we can't use libraries with "weird" licenses.

If you really want public domain, then I'd suggest the CC0 dedication/license; it's basically the same as the unlicense but well-known and written by actual lawyers.

The software is in the public domain; there are no limits whatsoever.

BTW, this is a bit of a misconception – a public domain dedication drops your copyright (in jurisdictions where that's possible), but it doesn't do anything about other intellectual property rights like patents and trademarks. The main reason Trio includes Apache2 as one of its licensing options is because I don't like patents and wanted to explicitly reject the option of patenting anything. With a public domain dedication then in principle you're still keeping the right to file for patents and then sue your users.

YoSTEALTH commented 4 years ago

If I change it to CC0 for the next release, are you happy with that? https://choosealicense.com/licenses/cc0-1.0/

njsmith commented 4 years ago

@YoSTEALTH sure, that'd be fine.

The other thing that would be really helpful would be to have a wheel on pypi :-). Since you only care about linux, and cffi can target the stable ABI, I think you only need 1 wheel to cover all users. Or, well, maybe 3 I guess, if you want to be ambitious and cover x86-64 + both 32- and 64-bit ARM.

YoSTEALTH commented 4 years ago

Sure, I can look into it. I am not sure if io_uring is designed to work on 32-bit; I will have to find out.

YoSTEALTH commented 4 years ago

Does Liburing work on 32bit?

"Yes, the ABI is designed as such that there are no differences between 32 and 64-bit." Jens Axboe

njsmith commented 4 years ago

Note: Ubuntu 20.04 has shipped, and unfortunately they ended up with kernel v5.4. Our io_uring support will definitely need features that were added in v5.5+. In particular, we need the IORING_FEAT_NODROP that was added in v5.5, which is mandatory to let io_uring handle large socket sets. (We might turn out to need other features too, but I'm not sure yet.)
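A feature check like the one described above could be sketched as follows. This is an assumption-laden illustration, not Trio's detection logic: it calls the raw io_uring_setup syscall through ctypes, assumes syscall number 425 (x86-64 and other architectures using the unified syscall table), and assumes the `struct io_uring_params` layout from the kernel's uapi header (120 bytes, with `features` as the sixth 32-bit field). The helper names are hypothetical.

```python
import ctypes
import os
import struct
import sys

# Value from the kernel's <linux/io_uring.h> uapi header.
IORING_FEAT_NODROP = 1 << 1
# Assumed syscall number for io_uring_setup; verify for your architecture.
SYS_IO_URING_SETUP = 425
# Assumed sizeof(struct io_uring_params) and byte offset of its `features` field.
PARAMS_SIZE = 120
FEATURES_OFFSET = 20


def probe_io_uring_features():
    """Return the kernel's io_uring feature bitmask, or None if unavailable."""
    if not sys.platform.startswith("linux"):
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    params = ctypes.create_string_buffer(PARAMS_SIZE)  # zero-initialized
    fd = libc.syscall(
        ctypes.c_long(SYS_IO_URING_SETUP), ctypes.c_uint(1), ctypes.byref(params)
    )
    if fd < 0:
        # For example ENOSYS on old kernels or EPERM inside some sandboxes.
        return None
    os.close(fd)  # we only wanted the params the kernel filled in
    (features,) = struct.unpack_from("I", params.raw, FEATURES_OFFSET)
    return features


def has_nodrop(features):
    """True if the probed feature mask advertises IORING_FEAT_NODROP."""
    return features is not None and bool(features & IORING_FEAT_NODROP)


def should_use_io_uring():
    # Hypothetical policy: fall back to epoll unless NODROP is present.
    return has_nodrop(probe_io_uring_features())
```

Checking the feature mask rather than the kernel version number means a distro kernel with backported io_uring changes would still be detected correctly.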

Unfortunately, this means that it'll be at least 6 months until there's an official Ubuntu release with usable io_uring, and 2 years before there's an LTS :-/.

njsmith commented 4 years ago

(On kernel v5.4 it might be possible to use a hybrid event loop, where io_uring is used for disk I/O and epoll is used for network I/O. But that would be a lot of complexity to work around a temporary issue, so I'm inclined to only offer "all-epoll" and "all-io_uring" options.)

YoSTEALTH commented 4 years ago

I am looking at setting the current minimum required Linux version to 5.6, just because there are so many features missing... For example, IORING_OP_SEND|RECV was added in 5.6, as was IORING_OP_EPOLL_CTL. IORING_OP_ACCEPT was added in 5.5.

more on this here https://lwn.net/Articles/810414/

njsmith commented 4 years ago

@YoSTEALTH We probably won't use any of those though :-). EPOLL_CTL is only useful for hybrid event loops, and for network operations it's much simpler to keep using our current strategy of waiting for readiness and then issuing a non-blocking send/recv/etc., and almost as efficient.
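The "wait for readiness, then issue a non-blocking operation" strategy can be illustrated with the standard library alone. This is a simplified sketch using `selectors` (epoll-backed on Linux), not Trio's actual implementation; `recv_when_ready` is a hypothetical helper.

```python
import selectors
import socket


def recv_when_ready(sock, nbytes, timeout=1.0):
    """Wait until `sock` is readable, then do a single non-blocking recv."""
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        if not sel.select(timeout):
            raise TimeoutError("socket did not become readable")
    # Readiness is only a hint: a real event loop would catch
    # BlockingIOError here and go back to waiting.
    return sock.recv(nbytes)


a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)
b.send(b"ping")
data = recv_when_ready(a, 1024)
a.close()
b.close()
```

With io_uring, the readiness wait would be submitted as a poll operation on the ring instead of going through epoll, while the recv itself stays an ordinary non-blocking syscall.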

Also note that the built-in network operations like SEND/RECV/ACCEPT are actually unusable until 5.7 (!!). Before that, they get implemented as blocking operations in a thread-pool, which is gratuitously expensive and can deadlock if you have too many simultaneous operations. In 5.7, they're being reworked so that io_uring automatically compiles them down into a wait-for-readiness + non-blocking operation.

At least, I'm 99% sure that's the change referred to here:

  • Re-work of how pollable async IO is handled, we no longer require thread offload to handle that. Instead we rely using poll to drive this, with task_work execution.

axboe commented 4 years ago

It's a pretty easy backport, but you have to convince them to backport... That part may be harder, and probably will have more luck on the RH side. Internally at Facebook, our predominant kernel is 5.2, but it's up-to-date with io_uring up to current 5.7-rc kernels. Very few external dependencies to keep an eye out for, so would be easy to make 5.4 current. Only bits missing in the 5.2-fb kernel that I skipped was OPENAT2. But even that would be trivial to add, just didn't have an internal need for it yet.

njsmith commented 4 years ago

This looks like a nice reference on io_uring: https://unixism.net/loti/index.html

pquentin commented 4 years ago

For what it's worth, we're now testing the 5.4 (Ubuntu 20.04) and 5.6 (Fedora 32) Linux kernels in our continuous integration, see #1510. Which is probably the reason why @njsmith is looking for documentation. :-)

YoSTEALTH commented 4 years ago

This looks like a nice reference on io_uring: https://unixism.net/loti/index.html

It's nice to finally have documentation; it should help clear up a lot of confusion.

njsmith commented 4 years ago

Also note that as of kernel 5.7, io_uring's OP_POLL features can't handle all pollable resources: https://github.com/axboe/liburing/issues/117

It's unclear when exactly this will be fixed. Maybe hybrid epoll/io_uring event loops will be required for the foreseeable future?


I also spent a few minutes today messing around trying to build liburing statically against musl. The reason being, right now, Trio never needs a C compiler to install. It would be nice if we could keep that when we add io_uring support. The obvious solution is for the Python wrapper to publish a manylinux wheel, and that would be pretty straightforward and work on all glibc-based distros. But! You would still need a compiler on Alpine.

Maybe that's fine. Alpine users are used to that kind of problem. But still, in theory I feel like it ought to be possible to do better... liburing barely interacts with libc or the rest of the system at all. So if we built liburing against musl, we could potentially get a shared library that could be loaded and work on any Linux system, regardless of which libc is being used.

I think in theory this would work, but the reality was kind of messy, because liburing #include's a bunch of kernel header files, and kernel header files are a very confusing place: there are the header files the kernel uses internally, the "uapi" versions that the kernel suggests userspace use, the differently munged version of the uapi files that ship with glibc and with musl, the libc header files that export their own types and constants that overlap with the kernel's types...

Anyway, the end result is that it took a lot of faffing around with include paths, and error messages like unknown type loff_t, but I did manage to get a build of liburing.so itself that appears to be entirely self-contained and is only 66 KiB.

Unfortunately, that's not enough on its own – important parts of liburing are only exposed as static inline functions, so we can't just use an FFI to dlopen("liburing.so") and start using it from Python. We need to build some kind of C wrapper library that exports those inline functions.

@YoSTEALTH's liburing-python bindings do that, using CFFI to generate a .c wrapper that re-exports the stuff we need. But I could not manage to get this wrapper library to build with musl! This is surely some trivial issue with plumbing through the right includes to the right place, but for some reason the same include paths were giving me different results.

It would probably be more useful to ask Jens what is even supposed to be happening with the liburing include paths and whether he supports musl, rather than just continuing to thrash about with trial and error :-)

YoSTEALTH commented 4 years ago

@njsmith Have you installed liburing (Python) directly from GitHub? It's newer than the one on PyPI. I haven't used musl, so I can't comment on it atm.

For me io_uring_prep_recv doesn't work, even though I can use io_uring_prep_read. Also, the only place I am using io_uring_prep_poll_add is with io_uring_prep_accept. I am going with an all-io_uring interface, no select.poll or epoll.

thedrow commented 4 years ago

I just recalled that you have a command to run a kernel in userspace using user mode linux. I don't know how hard it is to set it up on Travis, but you should definitely consider it over using a VM since it might be faster to boot and run.

YoSTEALTH commented 4 years ago

Python's socket class doesn't work nicely with io_uring; there is some kind of internal conflict with the socket.setblocking method. Also, some of io_uring's non-blocking features are disabled by the class as well.

njsmith commented 4 years ago

@YoSTEALTH I can't think of any way that io_uring would even know that you're using Python's socket class, or vice-versa. Are you sure you aren't misinterpreting something?

YoSTEALTH commented 4 years ago

@njsmith I narrowed it down to https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.8/io_uring and it's fixed there; I still need to test it. So the Python socket module is safe for now, even though I did write a wrapper for OS sockets as well, since most of the stuff is handled by liburing, e.g. io_uring_prep_accept, ...

njsmith commented 4 years ago

Ah, nice diagnosis. A weird interaction between IORING_OP_SEND and O_NONBLOCK makes sense; there isn't much reason to use those together, so you're probably the first person to test it :-).

It's also nice to see that the for-5.8/io_uring tag there has the fix to make IORING_OP_POLL_ADD handle all the fd types that epoll can handle: https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.8/io_uring&id=18bceab101adde8f38de76016bc77f3f25cf22f4

So hopefully 5.8 will be the version where it becomes viable for a generic event loop to use io_uring exclusively without epoll.

YoSTEALTH commented 4 years ago

In python we use setblocking(False) to enable non-blocking socket which sets O_NONBLOCK flag. In io_uring it was using MSG_DONTWAIT flag for IORING_OP_RECV,... Since they are two different ways of setting non-blocking it wasn't compatible.

I am just doing my part, really though the work @axboe is doing and amount of patience he has dealing with people is amazing... lots of improvements coming to io_uring like https://lore.kernel.org/io-uring/20200523185755.8494-1-axboe@kernel.dk/T/#u this is huge!

njsmith commented 4 years ago

In python we use setblocking(False) to enable non-blocking socket which sets O_NONBLOCK flag. In io_uring it was using MSG_DONTWAIT flag for IORING_OP_RECV,... Since they are two different ways of setting non-blocking it wasn't compatible.

I think IORING_OP_RECV is intended to be used on blocking sockets, though? I.e. you want the operation to "block" inside io_uring until it actually completes, and then give you a notification. Using it on non-blocking sockets seems like more complexity and slower for little gain.

lots of improvements coming to io_uring like https://lore.kernel.org/io-uring/20200523185755.8494-1-axboe@kernel.dk/T/#u this is huge!

That does look pretty sweet...

YoSTEALTH commented 4 years ago

As far as I can tell, blocking anything would be bad for io_uring. If a socket is blocking, it is then made non-blocking; you can see what it does here: https://git.kernel.dk/cgit/linux-block/tree/fs/io_uring.c?h=for-5.8/io_uring&id=9f5fa2cc5a867428961c84f7b6e4dfc40899f502#n3925

YoSTEALTH commented 4 years ago

@njsmith Looks like the O_NONBLOCK patch was reverted; apparently it was causing other issues, like not being able to do a "blocking read". Python's setblocking(False) will not work; you will have to pass the MSG_DONTWAIT flag manually for send and receive.
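The two ways of requesting non-blocking behavior discussed above can be seen with plain sockets, without io_uring involved. This is a small illustration only; it shows the ordinary syscall path on Linux and says nothing about how any particular kernel version treats these flags inside io_uring.

```python
import fcntl
import os
import socket

a, b = socket.socketpair()

# 1) File-status flag: setblocking(False) sets O_NONBLOCK on the socket,
#    which affects every later operation on it.
a.setblocking(False)
a_has_nonblock = bool(fcntl.fcntl(a.fileno(), fcntl.F_GETFL) & os.O_NONBLOCK)

# 2) Per-call flag: `b` stays a blocking socket; only this one recv is
#    non-blocking because MSG_DONTWAIT is passed explicitly.
b_has_nonblock = bool(fcntl.fcntl(b.fileno(), fcntl.F_GETFL) & os.O_NONBLOCK)
try:
    b.recv(1024, socket.MSG_DONTWAIT)
    would_block = False
except BlockingIOError:
    # Nothing has been sent yet, so the non-blocking recv reports EAGAIN.
    would_block = True

a.close()
b.close()
```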

ghost commented 3 years ago

Scratch that, see comment below. See edit history for what this originally was.

shachaf commented 3 years ago

No, you can poll file descriptors for readability/writeability with IORING_OP_POLL_ADD (I was originally mentioned in this thread because I'd tested it and it didn't work in some cases, like ttys, but I mentioned it to axboe and I think it's fixed now).

You can also either poll on a uring fd from epoll, or (I think) on an epoll fd from a uring.

ghost commented 3 years ago

Oops! I didn't see that initially. Disregard, then.

graingert commented 2 years ago

@fantix has an io_uring asyncio port in the works: https://github.com/fantix/kloop/

It also supports kTLS!

stalkerg commented 1 year ago

@graingert Unfortunately, development has stopped, and it was hosted on a Chinese service, outside GitHub/GitLab... The license is also strange, and I'm not sure about its compatibility with Python ecosystem licenses. The idea is cool, but it won't survive without community involvement; currently, it's a local Chinese project.

theLastOfCats commented 5 months ago

@graingert Unfortunately, development has stopped, and it was hosted on a Chinese service, outside GitHub/GitLab... The license is also strange, and I'm not sure about its compatibility with Python ecosystem licenses. The idea is cool, but it won't survive without community involvement; currently, it's a local Chinese project.

Looks like it's approved by OSI

http://lists.opensource.org/pipermail/license-review_lists.opensource.org/2020-February/004695.html