Open alexhartl opened 2 years ago
Flagging this issue for discussion at the core dev sprint unless there is a champion before the sprint.
I have created a draft PR at https://github.com/python/cpython/pull/121264. In this PR, I have not made this feature optional. I'm open to adding the task to `_pending_tasks` only if an optional keyword argument is set.
Whenever a future's state transitions out of the `_PENDING` state (because it finished, was cancelled, or raised an exception), `_finish_execution` is triggered and the task is removed from `_pending_tasks`. I've implemented the `_pending_tasks` set as an attribute of the event loop to ensure that no memory leaks are possible when asyncio is deinitialized. That is, if you drop all references to the loop while a task is still pending, you will still get the "Task was destroyed but it is pending!" error. I think this is much more predictable and robust than the current behavior.
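To illustrate the idea from user code, here is a minimal sketch that approximates it with a task factory. This is not the code from the PR: `make_tracking_factory` and the `pending` set are hypothetical names, and the set is reachable from the loop only through the factory closure rather than through a real loop attribute.

```python
import asyncio


def make_tracking_factory(pending):
    """Return a task factory that keeps a strong reference to each unfinished task.

    `pending` is a plain set supplied by the caller; it is a hypothetical
    stand-in for the loop-level `_pending_tasks` set.
    """
    def factory(loop, coro, **kwargs):
        # Extra keyword arguments (for example `context`) are forwarded unchanged.
        task = asyncio.Task(coro, loop=loop, **kwargs)
        pending.add(task)
        # Called once the task is done: result, exception, or cancellation.
        task.add_done_callback(pending.discard)
        return task
    return factory


async def demo():
    pending = set()
    loop = asyncio.get_running_loop()
    loop.set_task_factory(make_tracking_factory(pending))
    results = []

    async def worker():
        await asyncio.sleep(0)
        results.append("done")

    loop.create_task(worker())  # return value deliberately dropped
    tracked = len(pending)      # the factory holds the only user-level reference
    while pending:              # wait until the done callback has emptied the set
        await asyncio.sleep(0)
    return tracked, results
```

Because the factory is stored on the loop, the set lives exactly as long as the loop does, which is roughly the lifetime behaviour described above.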
_unregister_task
On calling `_unregister_task`, asyncio currently removes the task from `_scheduled_tasks`. asyncio's documentation for `_unregister_task` says "The function should be called when a task is about to finish." Together, this is inconsistent with asyncio's main implementation, which does not remove tasks from `_scheduled_tasks` when they're finishing, but only when they're deleted.
I've changed `_unregister_task` to remove the task only from `_pending_tasks` but not from `_scheduled_tasks`, which makes it consistent with the documentation. This might, of course, break old code that relies on the old behavior. The only code I could find online that uses the `_unregister_task` interface is Tornado. Tornado is consistent with the documented behavior, i.e. the new implementation.
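The intended split between the two sets can be modelled without touching asyncio internals. The following is only a toy model: `TaskRegistry` and `FakeTask` are hypothetical names and do not exist in asyncio, and the real sets are module- or loop-level state rather than attributes of one object.

```python
import gc
import weakref


class FakeTask:
    """Hypothetical placeholder for a task; instances are hashable and weak-referenceable."""


class TaskRegistry:
    """Toy model of the two sets discussed above (not asyncio's real code)."""

    def __init__(self):
        self.scheduled = weakref.WeakSet()  # weak: an entry vanishes when the task is deleted
        self.pending = set()                # strong: keeps unfinished tasks alive

    def register(self, task):
        self.scheduled.add(task)
        self.pending.add(task)

    def unregister(self, task):
        # Proposed semantics: called when the task is about to finish,
        # so only the strong reference is dropped here.
        self.pending.discard(task)


def demo():
    registry = TaskRegistry()
    task = FakeTask()
    registry.register(task)
    registry.unregister(task)
    still_scheduled = task in registry.scheduled  # finished but not yet deleted
    del task
    gc.collect()
    return still_scheduled, len(registry.scheduled), len(registry.pending)
```

In this model a finished task stays visible in the weak set until it is actually deleted, which matches the behaviour of the main implementation described above.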
As far as I can see, uvloop uses asyncio's `Task`. Therefore, tasks will be registered and unregistered correctly in `_pending_tasks` within `Task.__init__` and `Task._finish_execution`.
In #88831 @vincentbernat pointed out that CPython only keeps weak references in `_all_tasks`, so a reference to a `Task` returned by `loop.create_task` has to be kept to be sure the task will not be killed with a "Task was destroyed but it is pending!" at some random point in time.

When shielding a task from cancellation with `await shield(something())`, `something` continues to run when the containing coroutine is cancelled. As soon as that happens, `something()` is free-flying, i.e. there's no reference from user code anymore. `shield` itself has a bunch of circular strong references, but these shouldn't keep CPython from garbage-collecting the task. Hence, here the same problem occurs and the task might be killed unpredictably. Additionally, when running coroutines in parallel with `gather` and `return_exceptions=False`, an exception in one of the coroutines will leave remaining tasks free-flying. Also in this case, the remaining tasks might be killed unpredictably.

Hence, a warning in the documentation for `create_task` unfortunately does not suffice to solve the problem. Additionally, it has been brought up in #88831 that an API for fire-and-forget tasks (i.e. when the user doesn't want to keep a reference) would be nice.

As a solution, I suggest one of the following:
(1) introduce a further `_pending_tasks` set to keep strong references to all pending tasks. This would be the simplest solution also with respect to the API. In fact, a lot of discussions on Stack Overflow (e.g., here, here, here) already rely on this behavior (throwing away the reference returned by `create_task`), although it's wrong currently. Since the behaviour for free-flying tasks is unpredictable currently, it should not introduce any compatibility issues when making it predictable by preventing them from being garbage-collected.

(2) make sure there's always a chain of strong references from the most basic futures to the running tasks awaiting something. A quick `grep` resulted in potential problems, e.g., here and here. This does not seem like a very robust approach, though.

(3) introduce the concept of background tasks, i.e., tasks the user does not want to hold references to. The interface could look like suggested in https://github.com/python/cpython/issues/88831#issuecomment-1105619239 . Tasks from `shield` and `gather` could be automatically converted to such background tasks. Clearly, it would add complexity to the API, but the distinction between normal tasks and background tasks might potentially be beneficial also for other purposes. E.g., one might add an API call that waits for all background tasks to be completed.

My preferred solution would be (1).