python-trio / trio-asyncio

a re-implementation of the asyncio mainloop on top of Trio
Other
187 stars 37 forks source link

trio_asyncio shielding cancellation causes errors to be silenced. #115

Closed elektito closed 7 months ago

elektito commented 2 years ago

I was having a hard time figuring out why my program suddenly stopped working without any errors and without keyboard interrupts working. After a lot of digging, I found that there was a simple AttributeError but it wasn't being raised. And it only happened with trio-asyncio running. I managed to come up with this minimal example:

import trio
import trio_asyncio

async def foo():
    async with trio_asyncio.open_loop():
        await trio.sleep(0)

async def main():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(foo)
        raise RuntimeError('error!')

if __name__ == '__main__':
    trio.run(main)

Digging further, I found this bit in the _base.py module is to blame:

with trio.CancelScope(shield=True):
    while not self._stopped.is_set():
        await self._main_loop_one()

I'm not familiar with trio_asyncio internals, so I don't know why the shield=True is necessary, but it can clearly cause issues like the one I was experiencing.

oremanj commented 1 year ago

The problem here is actually not enough shielding, not too much. The RuntimeError causes open_loop() to be cancelled before it starts the asyncio event loop, which causes a deadlock in the cleanup task that waits for the asyncio event loop to stop. The fix is to add with trio.CancelScope(shield=True): around await loop_nursery.start(loop._main_loop). I'll put up a PR for this once #121 lands.

oremanj commented 7 months ago

My described fix does solve this, but not for the reason I thought; TaskStatus.started() doesn't complete Nursery.start() if the call to start() is cancelled, and since _main_loop does all its work inside a shield, the result is that start() never returns at all (and thus we can never reach the _main_loop_exit cleanup). I filed https://github.com/python-trio/trio/issues/2895 to track this in mainline Trio, since I think the thing trio-asyncio is doing currently ought to work.