jupyterhub / kubespawner

Kubernetes spawner for JupyterHub
https://jupyterhub-kubespawner.readthedocs.io
BSD 3-Clause "New" or "Revised" License
543 stars 304 forks source link

make sure futures can be cancelled #797

Closed minrk closed 11 months ago

minrk commented 11 months ago

In order to cancel tasks, we need to have a handle on the Future, not just the Coroutine object.

The code we have implements TaskGroup-style cancellation of pending tasks, since gather doesn't cancel unfinished tasks unless gather itself is cancelled, but TaskGroups are new in Python 3.11.

See the note in the docs for asyncio.gather:

TaskGroup provides stronger safety guarantees than gather for scheduling a nesting of subtasks: if a task (or a subtask, a task scheduled by a task) raises an exception, TaskGroup will, while gather will not, cancel the remaining scheduled tasks).

i.e. TaskGroup has the behavior "if anything here fails, stop everything still running and then propagate the error", while gather has the behavior "if anything here fails, propagate the error, but leave any other tasks running for the user to clean up or not." And since we don't yet require Python 3.11, we use gather and then implement the cancellation ourselves.

closes #796

minrk commented 11 months ago

It may well make sense to actually change the wait to return_when=ALL_COMPLETED so there's nothing to cancel, especially in the stop_all. But that's not the most important, really, since it's a cleanup utility purely for use in tests and always means a serious problem in the test suite if there's any error anyway.

Pro waiting for everything instead of cancelling:

Con:

minrk commented 11 months ago

The third option is not cancelling at all. But then we're in the situation of a task still running with nothing waiting for it (e.g. stop could still be outstanding when the next start is called, etc.). Again, this wouldn't be the biggest deal because there are likely serious kubernetes communication problems whenever this comes up, but leaking tasks nobody's waiting for is generally something to avoid, which I think is why asyncio added trio-style TaskGroups so folks can write some code that does "start some async things; but no matter what happens, when I exit this context, none of those tasks are running".