python / cpython

The Python programming language
https://www.python.org

asyncio crashes when tearing down the proactor event loop #83413

Closed bc2cd49a-2dcc-436a-86bb-6178c73b723b closed 2 years ago

bc2cd49a-2dcc-436a-86bb-6178c73b723b commented 4 years ago
BPO 39232
Nosy @asvetlov, @cjrh, @1st1, @zooba, @cmeyer, @mikeshardmind, @jack1142, @pepoluan
Files
  • example.py: toy code demonstrating the difference.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['3.10', '3.8', '3.9', 'type-crash', 'expert-asyncio']
    title = 'asyncio crashes when tearing down the proactor event loop'
    updated_at =
    user = 'https://github.com/mikeshardmind'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'machine.gw'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['asyncio']
    creation =
    creator = 'mikeshardmind'
    dependencies = []
    files = ['48829']
    hgrepos = []
    issue_num = 39232
    keywords = []
    message_count = 7.0
    messages = ['359448', '359515', '362174', '367932', '392261', '396339', '400132']
    nosy_count = 11.0
    nosy_names = ['asvetlov', 'cjrh', 'yselivanov', 'steve.dower', 'cmeyer', 'mikeshardmind', 'jack1142', 'rmawatson', 'pepoluan', 'lawsonjl.ornl', 'machine.gw']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue39232'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']
    ```


    When using asyncio.run for an asynchronous application that uses ssl, on Windows with the proactor event loop the application crashes when the loop is closed, completely skipping a finally block in the process.

    This appears to be due to a __del__ method on the transports used.

    Manually handling the event loop close, including a brief sleep before closing, appears to work as intended.
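    A minimal sketch of the manual handling described above (the application body and the 0.25 s sleep duration are illustrative, not taken from the attached example.py):

```python
import asyncio

async def main():
    # application body; transports created here may otherwise rely on
    # __del__ for cleanup
    await asyncio.sleep(0)

# Manual replacement for asyncio.run(): the extra sleep gives transports
# a chance to run their close callbacks before the loop itself is closed.
loop = asyncio.new_event_loop()
try:
    loop.run_until_complete(main())
    loop.run_until_complete(asyncio.sleep(0.25))  # brief sleep before close
finally:
    loop.close()
```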

    Both versions work fine with the selector event loop on Linux.

    This appears to be a somewhat known issue already, as it has been reported to aiohttp; however, both the traceback and the differing behavior indicate this is an issue with the proactor event loop.

    (On Linux this still emits a ResourceWarning without the sleep.)

    While I don't mind handling the loop cleanup myself, it seems like this case should emit a ResourceWarning rather than crash.

    Once it's decided how this should be handled, I'm willing to contribute to or help test whatever direction the resolution takes.

    The traceback is included below; a toy version of the problem is attached as code.

    Exception ignored in: <function _ProactorBasePipeTransport.__del__ at 0x0000026463039820>
    Traceback (most recent call last):
      File "C:\Users\Michael\AppData\Local\Programs\Python\Python38\lib\asyncio\proactor_events.py", line 116, in __del__
        self.close()
      File "C:\Users\Michael\AppData\Local\Programs\Python\Python38\lib\asyncio\proactor_events.py", line 108, in close
        self._loop.call_soon(self._call_connection_lost, None)
      File "C:\Users\Michael\AppData\Local\Programs\Python\Python38\lib\asyncio\base_events.py", line 715, in call_soon
        self._check_closed()
      File "C:\Users\Michael\AppData\Local\Programs\Python\Python38\lib\asyncio\base_events.py", line 508, in _check_closed       
        raise RuntimeError('Event loop is closed')
    RuntimeError: Event loop is closed
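    The mechanism behind this traceback can be illustrated without any network code; FakeTransport below is a hypothetical stand-in for _ProactorBasePipeTransport, not the real implementation:

```python
import asyncio

class FakeTransport:
    """Hypothetical stand-in: like _ProactorBasePipeTransport.__del__,
    it schedules a callback on its loop during finalization."""
    def __init__(self, loop):
        self._loop = loop

    def __del__(self):
        # This is the call that blows up once the loop is closed.
        self._loop.call_soon(lambda: None)

loop = asyncio.new_event_loop()
transport = FakeTransport(loop)
loop.close()

try:
    # Called directly here for clarity; during real garbage collection
    # the same failure surfaces as "Exception ignored in: ...".
    transport.__del__()
    raised = False
except RuntimeError as exc:
    raised = True
    print(exc)  # Event loop is closed
```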

    bc2cd49a-2dcc-436a-86bb-6178c73b723b commented 4 years ago

    I don't know if it would be feasible to add this to asyncio, but having a way to mark a resource as needing deterministic cleanup at loop close could solve this. It would also address the underlying reason the transports lean on __del__ behavior, which is an implementation detail (it differs on, for example, PyPy), and it would probably improve the overall usefulness of asyncio.run as well.

    An addition like that probably requires more discussion than fixing this crash, though.

    bc2cd49a-2dcc-436a-86bb-6178c73b723b commented 4 years ago

    Linking out to a real-world example where this still manages to happen after running the event loop for an entire 2 seconds waiting for transports to close themselves after finishing everything else:

    https://github.com/Cog-Creators/Red-DiscordBot/issues/3560

    As well as what we're currently looking at for a temporary solution for this at this point:

    https://github.com/Cog-Creators/Red-DiscordBot/pull/3566

    I looked into what would need to change to handle this in CPython, but after doing so I am not confident in my ability to make such a PR, at least not without more discussion.

    The best solution I considered involves making the only public way to create transports tie them to an event loop that hasn't been closed yet, and having the event loop keep a reference to each of them so it can deterministically close them at loop finalization. A search of GitHub alone showed that this would break far too many things.

    If this can't be fully fixed, a solution which at least ensures this can't cause an uncatchable exception would be appreciated.
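    The idea above can still be sketched; nothing below exists in asyncio, and TrackingLoop / track_transport are hypothetical names:

```python
import asyncio
import weakref

class TrackingLoop(asyncio.SelectorEventLoop):
    """Hypothetical loop that remembers its transports and closes the
    survivors deterministically before closing itself."""
    def __init__(self):
        super().__init__()
        self._tracked = weakref.WeakSet()  # don't keep transports alive

    def track_transport(self, transport):
        self._tracked.add(transport)
        return transport

    def close(self):
        # Deterministic finalization: close anything still open before
        # the loop itself goes away, so __del__ has nothing to schedule.
        for t in list(self._tracked):
            if not t.is_closing():
                t.close()
        super().close()
```

    Anything registered via track_transport would be closed at loop finalization instead of waiting for __del__; as noted above, retrofitting this onto the existing public transport constructors would break too much existing code.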

    5aa086a9-e355-4e25-880e-559fe8d921b7 commented 4 years ago

    This is also happening in libraries other than aiohttp, specifically httpx on Windows. There is an open issue at https://github.com/encode/httpx/issues/914.

    zooba commented 3 years ago

    I'm also seeing (and annoyed by) this.

    Do we need to do anything more than suppress the error? If we're just failing to notify the event loop that we're closing because the loop is gone, does the notification even matter?
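    The "just suppress it" option could look roughly like this; _safe_schedule is a hypothetical helper, not the actual patch:

```python
import asyncio
import contextlib

def _safe_schedule(loop, callback, *args):
    # Swallow "Event loop is closed" so a late __del__ no longer crashes;
    # if the loop is already gone, the notification is simply dropped.
    with contextlib.suppress(RuntimeError):
        loop.call_soon(callback, *args)

loop = asyncio.new_event_loop()
loop.close()
_safe_schedule(loop, print, "never runs")  # no exception escapes
```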

    abe345e8-128f-4608-a8d2-e03bba22380f commented 3 years ago

    We are running into this all the time, ever since the Proactor became the default on Windows in 3.8.

    Usually it comes up when the program terminates due to an unhandled exception during a highly concurrent operation. The resulting cascade of RuntimeErrors often obscures the real reason for failure, which makes debugging more painful than it should be. But sometimes this cascade of RuntimeErrors will occur even when the program otherwise executes successfully. So it can be difficult to know if the program actually blew up or if it's just benign event loop vomit.

    Converting this particular error to a warning would be great, but eliminating the call to the event loop in __del__ would be even better. I understand that's easier said than done, or else ProactorBasePipeTransport wouldn't be leaning on __del__ in the first place.

    ad42554f-f71b-4d7c-8ee9-30ed05160e6e commented 3 years ago

    I added some code to wait for all tasks to complete before exit:

        currentTask = asyncio.current_task()
        for t in asyncio.all_tasks():
            if currentTask != t:
                await t

    and still got the exception.

    Then I suspected something created an additional thread somewhere with an event loop inside it. To dig it out I subclassed threading.Thread. I also wait for all tasks in all threads before exiting:

    import asyncio
    import sys
    import threading

    loops: list[asyncio.AbstractEventLoop] = []

    class MyThread(threading.Thread):
        def start(self):
            loops.append(asyncio.get_event_loop())
            super().start()

    # patch threading.Thread before aiohttp imports it
    threading.Thread = MyThread
    import aiohttp

    async def func1():
        async with aiohttp.ClientSession() as session:
            async with session.get('https://www.okex.com/api/v5/public/time') as resp:
                print(resp.status)
                print(await resp.json())

    async def main():
        loops.append(asyncio.get_running_loop())
        print(sys.argv)
        task = asyncio.create_task(func1())
        await task
        print('done.')

        currentTask = asyncio.current_task()
        for loop in loops:
            for t in asyncio.all_tasks(loop):
                if currentTask != t:
                    await t

        print('done2.')
        #await asyncio.sleep(1)

    if __name__ == '__main__':
        asyncio.run(main())

    Then I found out the thread is created inside _loop.getaddrinfo (files are from Python 3.9.6):

        File aiohttp\resolver.py, line 31, in ThreadedResolver.resolve
        File asyncio\base_events.py, line 856, in BaseEventLoop(ProactorEventLoop).getaddrinfo

    And it is strange that another thread is created when the program exits:

        File asyncio\base_events.py, line 563, in BaseEventLoop(ProactorEventLoop).shutdown_default_executor

    Sadly, it seems vscode cannot break inside a __del__ call. If I break somewhere else first, then it does not crash :(

    zooba commented 2 years ago

    eliminating the call to the event loop in __del__ would be even better. I understand that's easier said than done, or else ProactorBasePipeTransport wouldn't be leaning on __del__ in the first place.

    This seems like a great idea, but I have no idea how feasible it is.

    Do we have any active experts in asyncio who can make a judgement call on this? Suppressing the error in __del__ seems like the safer option, but it also looks like it's only called from __del__, so maybe that's not the right idea.

    arhadthedev commented 2 years ago

    @zooba I have a PR fixing this (gh-92842).

    zooba commented 2 years ago

    @arhadthedev Thanks, that definitely would seem to explain it.

    We may be a long time past when @vstinner added those destructors, but maybe he remembers whether it was intentional that the proactor and selector loops differ in calling self.close() vs self._sock.close()? Or maybe it was moved from somewhere else?

    vstinner commented 2 years ago

    maybe he can remember whether proactor and selector loops having different self.close() vs self._sock.close() calls was intentional?

    I don't recall the rationale, if there was a rationale :-) The code has a long history coming from the tulip project. It even supported Python 3.3 and older.

    When you're in __del__(), the world is burning; maybe it's not a good idea to still schedule future callbacks by calling self.close(), which calls self._loop.call_soon(self._call_connection_lost, None), while self is "being destroyed" (in fact, the GC is more complex than that; an object can "resurrect"). That's likely why, in the selector implementation, I didn't call self.close() but self._sock.close(): the end of the self.close() -> self._call_connection_lost() callback chain.

    Doing the same in proactor_events.py makes sense. For me, the only important part is to log the ResourceWarning, to make sure that people don't get into this code path but instead close their asyncio transports explicitly. Relying on the GC to close transports in order to make asynchronous code reliable is... a bad idea. In asyncio, close() is asynchronous and transports have an is_closing() method!
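    The difference described above can be sketched in isolation; TransportSketch is illustrative only, not the real CPython classes or the gh-92842 diff:

```python
import socket

class TransportSketch:
    """Illustration of a __del__ that closes the handle directly."""
    def __init__(self, sock):
        self._sock = sock

    def __del__(self):
        # Proactor before the fix: self.close() ->
        #   self._loop.call_soon(self._call_connection_lost, None),
        # which raises RuntimeError once the loop is closed.
        # Selector-style (the safer shape): close the OS handle directly
        # and schedule nothing on a possibly-closed loop.
        if self._sock is not None:
            self._sock.close()
            self._sock = None

a, b = socket.socketpair()
t = TransportSketch(a)
del t  # finalizer closes the socket without touching any event loop
b.close()
```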

    zooba commented 2 years ago

    When you're in __del__(), the world is burning, maybe it's not a good idea to still schedule future callbacks by calling self.close()

    Okay, if we both agree on that part, let's take the fix for 3.11 and see whether the world breaks.

    I think it should go to 3.10, but without anyone claiming to fully understand it, I'd rather get some real world experience in 3.11 beta before backporting to a stable release.

    mikeshardmind commented 2 years ago

    I can confirm the linked PR (gh-92842) fixes the issue as I originally described.

    I still think that relying on __del__ here is not necessarily a good thing, but in the absence of expert opinion in the discussion at the time about addressing that, I didn't bother to look much further into it and just ensured all transports in my code were closed before leaving scope (side-stepping the issue in my own code, rather than fixing it as a whole.)
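    The side-step described above amounts to closing transports explicitly and yielding once so the close callback runs while the loop is still alive; close_transport is a hypothetical helper, and _StubTransport stands in for a real transport so the sketch is runnable:

```python
import asyncio

async def close_transport(transport):
    # Close explicitly while the loop is still running, instead of
    # leaving the work to __del__ after the loop is gone.
    if not transport.is_closing():
        transport.close()
    await asyncio.sleep(0)  # yield so the close callback can run

class _StubTransport:
    """Minimal stand-in implementing the two transport methods used."""
    def __init__(self):
        self.closed = False
    def is_closing(self):
        return self.closed
    def close(self):
        self.closed = True

stub = _StubTransport()
asyncio.run(close_transport(stub))
```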

    arhadthedev commented 2 years ago

    This issue may be closed because the bug is fixed:

    spyoungtech commented 2 years ago

    I'm running into a similar (but in principle the same) issue when using asyncio.subprocess with pipes that are still open when the script exits. If this belongs in a separate issue, let me know.

    As a simple example using an IsolatedAsyncioTestCase and pytest:

    import asyncio
    from asyncio.subprocess import create_subprocess_exec
    from unittest import IsolatedAsyncioTestCase
    import subprocess
    
    class Test83413(IsolatedAsyncioTestCase):
    
        async def asyncSetUp(self) -> None:
            self.p = await create_subprocess_exec('notepad', stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            # This will cause an error because we're not cleaning up the open pipes
    
        async def test_simple(self):
            await asyncio.sleep(1)
            assert True is False

    Behaviors observed:

    Exception in 3.11.0b5 running `pytest`:

    ```python
    Exception ignored in:
    Traceback (most recent call last):
      File "C:\Program Files\Python311b5\Lib\asyncio\base_subprocess.py", line 125, in __del__
        _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
      File "C:\Program Files\Python311b5\Lib\asyncio\base_subprocess.py", line 70, in __repr__
        info.append(f'stdin={stdin.pipe}')
      File "C:\Program Files\Python311b5\Lib\asyncio\proactor_events.py", line 79, in __repr__
        info.append(f'fd={self._sock.fileno()}')
      File "C:\Program Files\Python311b5\Lib\asyncio\windows_utils.py", line 102, in fileno
        raise ValueError("I/O operation on closed pipe")
    ValueError: I/O operation on closed pipe
    ```

    In 3.9.0 running `pytest`:

    ```python
    Exception ignored in:
    Traceback (most recent call last):
      File "C:\Python39\lib\asyncio\base_subprocess.py", line 126, in __del__
        self.close()
      File "C:\Python39\lib\asyncio\base_subprocess.py", line 104, in close
        proto.pipe.close()
      File "C:\Python39\lib\asyncio\proactor_events.py", line 108, in close
        self._loop.call_soon(self._call_connection_lost, None)
      File "C:\Python39\lib\asyncio\base_events.py", line 746, in call_soon
        self._check_closed()
      File "C:\Python39\lib\asyncio\base_events.py", line 510, in _check_closed
        raise RuntimeError('Event loop is closed')
    RuntimeError: Event loop is closed
    Exception ignored in:
    Traceback (most recent call last):
      File "C:\Python39\lib\asyncio\proactor_events.py", line 115, in __del__
        _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
      File "C:\Python39\lib\asyncio\proactor_events.py", line 79, in __repr__
        info.append(f'fd={self._sock.fileno()}')
      File "C:\Python39\lib\asyncio\windows_utils.py", line 102, in fileno
        raise ValueError("I/O operation on closed pipe")
    ValueError: I/O operation on closed pipe
    ```

    (The "I/O operation on closed pipe" traceback repeats two more times.)

    Expected (my preferred) behavior is the simple warning that the process is being killed (the same as when no stdin/stdout/stderr arguments are provided):

    Close running child process: kill <_WindowsSubprocessTransport closed pid=23660 running>

    Reasonable (or more knowledgeable) people could disagree about whether that's expected, but it would make reading my test outputs much easier :)
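    One way to get that quieter behavior with the example above is an asyncTearDown that reaps the child, so no pipe transport is left for __del__ to finalize. This is a sketch, not a fix from the thread; the long-running child command is a portable stand-in for notepad:

```python
import asyncio
import subprocess
import sys
from unittest import IsolatedAsyncioTestCase

class Test83413Cleaned(IsolatedAsyncioTestCase):
    async def asyncSetUp(self) -> None:
        # Portable stand-in for the 'notepad' child in the report.
        self.p = await asyncio.create_subprocess_exec(
            sys.executable, "-c", "import time; time.sleep(30)",
            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            stderr=subprocess.PIPE)

    async def asyncTearDown(self) -> None:
        # Terminating and awaiting the child closes its pipe transports
        # while the loop is still alive, leaving nothing for __del__.
        self.p.terminate()
        await self.p.wait()

    async def test_simple(self):
        self.assertIsNotNone(self.p.pid)
```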

    arhadthedev commented 2 years ago

    If this belongs in a separate issue, let me know.

    Did you run the test on Linux or macOS?

    I cannot reproduce it on Windows 10. If you used a non-Windows platform, then we can be sure this is caused not by proactor_events.py (covered by this issue) but by something else (so a new PR would be suitable).

    Just for the record, here's my output for a clean build of 0771d71eea30316020a86b0eec071feb6d82d350 (labeled as v3.11.0b5):

    C:\Users\oleg\Documents\dev\notmine\cpython>python test83413.py
    Running Release|x64 interpreter...
    
    C:\Users\oleg\Documents\dev\notmine\cpython>python --version
    Running Release|x64 interpreter...
    Python 3.11.0b5