python / cpython

The Python programming language
https://www.python.org
Other
63.14k stars 30.23k forks source link

asyncio - add shield scope context manager for improved control flow in sensitive code sections #99714

Open whatamithinking opened 1 year ago

whatamithinking commented 1 year ago

Feature or enhancement

Add a shield_scope context manager which can shield a block of code from cancellation and block waiting until it completes. A pending cancellation for a task would either be raised during the next non-shielded await or at the end of the task. Shields are task-local and are not be copied into child tasks, to prevent breaking the behavior of child tasks.

await cancellable_work()
with shield_scope():
    state = 2
    await work_to_complete_after_state_change()
    await other_important_work()
print('sync code, cannot be cancelled')
await maybe_cancelled_work()

Pitch

The current shielding mechanism, asyncio.shield works by wrapping one future in another so the cancellation applies to the outer future while the inner future continues to run. While this solution is effective in protecting the awaitable you pass in, control flow becomes muddled (the exception bubbles up while this future is now running detached in the background) and the code the shield covers often has to grow to encompass other resources to avoid control flow issues, limiting the value of the cancellation features of asyncio.

The simplest current workaround to maintaining control flow using asyncio.shield seems to be to wrap up the coroutine in a task and wait on that after the cancellation occurs. This solution adds a lot of overhead (Task object + loop cycle until task can start) to shielded functions, which can add up quickly in hot code paths. If shielded functions are nested, where to re-raise the cancellation also becomes a challenge requiring contextual information to know if we are still inside a coroutine which is shielded.

import asyncio

async def work():
    print("task started")
    task = asyncio.create_task(asyncio.sleep(5))
    try:
        await asyncio.shield(task)
    except asyncio.CancelledError:
        print("task cancellation caught")
        await task
        print("task ran to completion")
        raise

async def main():
    task = asyncio.create_task(work())
    await asyncio.sleep(1)
    task.cancel()
    await task

asyncio.run(main())

The following example hopefully provides a more concrete example of how a shield scope could more efficiently solve this shielding / control flow need and be easier to work with.

The following shows how a shield_scope might be used. If the same code were run with asyncio.shield wrapping the publish coroutines, the resource_lock acquired by create_resource would be released before publish completed and the delete_resource coroutine would finish publishing before the create_resource coroutine.

import asyncio

# lets say we have a collection of resources and a pub/sub system 
# which we want to send all signals related to those resources to
# lets also say that we want all messages (created, deleted, updated, etc.)
# to be sequential for each resource. to do that, we use a lock for each
# resource. there may be some contention around each lock, as
# each operation for a resource may take awhile, so we want to maintain
# the ability to cancel waiting on the lock at any time (such as a client of a web server might do)

async def publish(msg, delay):
    print(f"starting publish: {msg}")
    await asyncio.sleep(delay)
    print(f"ending publish: {msg}")

async def create_resource(resource_lock):
    print("create - wait lock")
    await resource_lock.acquire()
    print("create - acquire lock")
    try:
        print("create - resource")
        # once we have created the resource, we have to finish publishing
        # or the other parts of the system wont know about it
        # we have to hold the lock or publishing of the created message
        # may interleave with publishing of the deleted message
        with shield_scope():
            await publish("create - resource", delay=3)
    finally:
        resource_lock.release()
        print("create - release lock")

async def delete_resource(resource_lock):
    # the lock is outside the shield so we can still cancel waiting on it
    # if there is contention with the lock. maybe we come back later.
    print("delete - wait lock")
    await resource_lock.acquire()
    print("delete - acquire lock")
    try:
        print("delete - resource")
        with shield_scope():
            await publish("delete - resource", delay=2)
    finally:
        resource_lock.release()
        print("delete - release lock")

async def main():
    resource_lock = asyncio.Lock()
    creator = asyncio.create_task(create_resource(resource_lock))
    await asyncio.sleep(0)
    deleter = asyncio.create_task(delete_resource(resource_lock))
    await asyncio.sleep(0)
    creator.cancel()
    await asyncio.sleep(5)

asyncio.run(main())

Previous discussion

Linked PRs

whatamithinking commented 1 year ago

Here is a python PoC. All the code for asyncio PyTask had to be copied in order to override __step and references to it.

gvanrossum commented 1 year ago

Can you please show the PoC code in the form of a draft PR (presumably modifying the Task class)? That would be easier to follow.

whatamithinking commented 1 year ago

Is there anything I can do to move this along or do I just need to wait for someone to review it?

agronholm commented 1 year ago

I was hoping that we could add real cancel scopes to asyncio. They are vastly more versatile, and should cover this use case very well. My concern here is that adding this will further complicate their eventual introduction.

gvanrossum commented 1 year ago

I have no opinion on the proposed feature here (and probably never will, there are too many things), but @agronholm your mention of "real cancel scopes" sounds provocative. Can you perhaps open an issue (or a Discourse thread) explaining what you want there? Without pointing to a 16-page blog post by NJS please, I am unable to concentrate long enough to read those. :-)

agronholm commented 1 year ago

Sorry, I didn't mean to sound provocative. I intended to get around to this in the coming months when I had more time, but if this is going to be worked on in the near future, then I would like to get involved rather than seeing the work proceed in a direction I see as unfavorable. I'll make a discourse post and link it here.

rmorshea commented 11 months ago

For anyone coming here looking for how to do this with asyncio currently, here's an implementation I came up with. It might not work 100% of the time and may have unintended consequences, but in most cases it seems to work as expected:

import asyncio
from contextlib import asynccontextmanager

@asynccontextmanager
async def cancel_shield():
    task = asyncio.current_task()
    cancel_messages: list[str | None] = []
    task.cancel = lambda msg: cancel_messages.append(msg)
    task.uncancel = lambda: cancel_messages.pop()
    try:
        yield
    finally:
        del task.cancel
        del task.uncancel
        for msg in cancel_messages:
            task.cancel(msg)

Example usage:

async def work():
    async with cancel_shield():
        await asyncio.sleep(5)
        print("work done")

async def main():
    task = asyncio.create_task(work())
    await asyncio.sleep(1)
    task.cancel()
    await task

asyncio.run(main())
agronholm commented 5 months ago

I apologize for taking such a long time to address this. I had a long talk with @gvanrossum at PyCon US two days ago on this very subject. I pitched the idea of adding Trio-style cancel scopes with opt-in level cancellation, but he was understandably hesitant to add new features/paradigms to asyncio given how it's already hard to maintain.

As a compromise, we came to an agreement that I could design some new low-level machinery to asyncio that would make it less painful for third party libraries (like AnyIO) to implement cancel scopes. AnyIO already has a cancel scope implementation on top of asyncio, but it employs horrible hacks to achieve that, and there are outstanding bugs in that implementation. I don't yet know what that low-level machinery will look like, but once I have some proof-of-concept code, I will notify you here.

Meanwhile, would you mind taking a look at AnyIO if you haven't already, and letting me know if it helps your use case, or if not, what I could do to get you the rest of the way there?