python-trio / trio

Trio – a friendly Python library for async concurrency and I/O
https://trio.readthedocs.io

Design: do we need batched accept / accept_nowait? #14

Open njsmith opened 7 years ago

njsmith commented 7 years ago

https://bugs.python.org/issue27906 suggests that it's important to be able to accept multiple connections in a single event loop tick.

It's trivial to implement, too -- just

def accept_nowait(self):
    try:
        return self._sock.accept()
    except BlockingIOError:
        raise _core.WouldBlock from None
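
For concreteness, here's a minimal sketch of how a batched accept could use that method, assuming a trio socket that has grown the accept_nowait() above; the accept_batch name and the max_batch cap of 100 are just illustrative, not trio API:

import trio

async def accept_batch(sock, max_batch=100):
    # Block until at least one connection is ready, then drain whatever
    # else is already queued, all within a single scheduler pass.
    batch = [await sock.accept()]
    for _ in range(max_batch - 1):
        try:
            batch.append(sock.accept_nowait())
        except trio.WouldBlock:
            break
    return batch

An accept loop could then start one handler task per entry in the batch before checkpointing again.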

But...

I am... not currently able to understand how/why this can help. Consider a simple model where after accepting each connection, we do a fixed bolus of CPU-bound work taking T seconds and then immediately send the response, so each connection is handled in a single loop step. If we accept only 1 connection per loop step, then each loop step takes 1 * T seconds and we handle 1 connection/T seconds on average. If we accept 100 connections per loop step, then each loop step takes 100 * T seconds and we handle 1 connection/T seconds on average.

Did the folks in the bug report above really just need an increased backlog parameter to absorb bursts? Batching accept() certainly increases the effective listen queue depth (basically making it unlimited), but "make your queues unbounded" is not a generically helpful strategy.
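
For reference, the backlog in question is the argument to listen(); a plain stdlib sketch (the address and the 1024 figure are arbitrary):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 0))
# The backlog bounds how many not-yet-accepted connections the kernel will
# queue for us between accept() calls; a larger value absorbs bigger bursts.
sock.listen(1024)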

The above analysis is simplified in that (a) it ignores other work going on in the system and (b) it assumes each connection triggers a fixed amount of synchronous work. If it's wrong, it's probably because one of those factors matters somehow. The "other work" part obviously could matter if that work is throttlable at the event-loop level, in the sense that when loop steps take longer they actually do less work. It's not clear that applies here (the whole idea of batching accept is to make accept handling not have this property, so if this is how we design all our components then the argument doesn't work...).

I guess one obvious piece of "other work" that scales with the number of passes through the loop is just, loop overhead. One would hope this is not too high, but it is not nothing.

If we do want this, then #13 will want to use it.

The same question probably applies to recvfrom / sendto -- right now a task can only send or receive 1 UDP packet per event loop tick.
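
By analogy, a hypothetical recvfrom_nowait for UDP sockets would look almost identical to the snippet above (sketch only; not an existing trio method, and the bufsize default is just illustrative):

def recvfrom_nowait(self, bufsize=65536):
    try:
        return self._sock.recvfrom(bufsize)
    except BlockingIOError:
        raise _core.WouldBlock from None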

njsmith commented 7 years ago

Here's another case where accept_nowait would be useful: with unix domain sockets, it's possible to do no-downtime upgrades by binding the new listening socket to a temporary name, then atomically renaming it over the old listening socket, and then having the old process drain and handle any pending connection attempts.

Draining a listening socket means handling all currently pending connection attempts once you know that no new ones will be arriving. That's a pretty natural fit for accept_nowait, and not really doable with blocking accept alone.

This is pretty obscure, but if you need it, you really need it. And who knows, maybe someday some OS will make it possible to do something similar for AF_INET{,6} listening sockets, which would be really handy.
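
To make the trick concrete, here's a rough sketch with plain stdlib sockets (the names and the backlog value are illustrative, and the real thing would of course go through trio's socket layer):

import os
import socket

def take_over_listener(path, backlog=128):
    # New process: bind under a temporary name, then atomically rename it
    # over the old listening socket's path.
    tmp_path = path + ".new"
    new_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    new_sock.bind(tmp_path)
    new_sock.listen(backlog)
    os.rename(tmp_path, path)
    return new_sock

def drain(old_sock, handle):
    # Old process: after the rename no new connections can arrive here, so
    # accept until the queue is empty, then close.
    old_sock.setblocking(False)
    while True:
        try:
            conn, _addr = old_sock.accept()
        except BlockingIOError:
            break
        handle(conn)
    old_sock.close()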

njsmith commented 7 years ago

Ah-hah, I thought of a case where batched accept makes a difference:

Say we have a mixed workload, involving a CPU-bound "background task" that regularly sleep(0)s to let other stuff run, plus a server task that accepts new connections. Suppose the CPU-bound task sleep(0)s every 20 ms, and that each incoming connection can be completely serviced in 0.1 ms. Then in principle our server can handle up to 10,000 connections per second (albeit at that point the background task doesn't get to run at all), but if we yield after every accept then we only get one accept per ~20 ms pass through the loop, i.e. roughly 50 connections per second.
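
To spell the scenario out, here's a sketch of the two tasks (handle and work_chunk are hypothetical stand-ins for ~0.1 ms of per-connection work and ~20 ms of CPU work respectively, and the listener is assumed to be a trio SocketListener):

import trio

async def main(listener, handle, work_chunk):
    async with trio.open_nursery() as nursery:

        async def cpu_bound():
            while True:
                work_chunk()          # ~20 ms of CPU-bound work
                await trio.sleep(0)   # checkpoint: let other tasks run

        async def accept_loop():
            while True:
                stream = await listener.accept()   # one accept per pass
                nursery.start_soon(handle, stream)

        nursery.start_soon(cpu_bound)
        nursery.start_soon(accept_loop)

Every await listener.accept() is a checkpoint, so under round-robin the accept loop gets exactly one connection per ~20 ms turn of the background task.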

I think this situation is kind of problematic all around... it would be difficult to service a connection in 0.1 ms in Python, and only yielding every 20 ms is not so great, and if the goal is that the heavy background task shouldn't interfere with the light connection tasks, then wouldn't the Right solution be to make a better scheduling algorithm (#32) or just assign a low static priority to the background task? But I'm sure this kind of thing happens, at least at times.

I guess that from this point of view, the point of a batched accept is that an accept loop is normally a very light task (it does very little work between calls to accept), so standard round-robin scheduling tends to implicitly assign it a very low priority: each of its individual time slices ends up being tiny compared to other tasks', while round-robin insists that everyone gets the same number of time slices. So batching is a heuristic way of compensating for a particular failure mode of the not-very-clever round-robin scheduler.

njsmith commented 6 years ago

It looks like #636 will solve the API parts of this issue, by moving the Listener abstraction layer higher. So that will mean that SocketListener can do batched accept or not, whichever makes sense, without having to change the user-visible interface.

njsmith commented 4 years ago

Something I just realized: at some point we may make cancel_shielded_checkpoint() skip the checkpoint if the task hasn't been running for very long. We've looked at this mostly for efficiency on microbenchmarks (running the scheduler is expensive compared to not running the scheduler :-)), but it would also automatically solve some of these fairness issues with tasks that only do a little bit of work before hitting a checkpoint. E.g., if we made it so that each task with work to do gets a minimum of 10 ms (this is what the Go runtime currently uses for its preemption threshold; see forcePreemptNS in the Go runtime source), then that would be enough time to do a ton of accepts on each tick.
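
A sketch of that idea (this is not how trio is implemented; MIN_TIMESLICE and the bookkeeping are purely illustrative):

import time
import trio

MIN_TIMESLICE = 0.010  # 10 ms, matching Go's forcePreemptNS threshold

async def maybe_checkpoint(slice_start):
    # Only pay for a real scheduler pass once this task has been running
    # for at least MIN_TIMESLICE; otherwise keep going.
    now = time.monotonic()
    if now - slice_start >= MIN_TIMESLICE:
        await trio.lowlevel.cancel_shielded_checkpoint()
        return time.monotonic()   # start a fresh timeslice
    return slice_start

With something like this in the accept path, a batched accept loop would get to do many accepts within each 10 ms slice before the scheduler runs again.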

Fuyukai commented 1 year ago

Note that as of 2022 we can push batched accept down into the kernel with io_uring's multishot accept (IORING_ACCEPT_MULTISHOT), which keeps delivering accepted connections onto the completion queue without a separate accept() call for each one.