python-trio / trio

Trio – a friendly Python library for async concurrency and I/O
https://trio.readthedocs.io
Other
6.13k stars 335 forks source link

Strange crash: KeyError while trying to 'del runner.deadlines[...]' on a cancel scope with -inf deadline #837

Closed Zwork101 closed 5 years ago

Zwork101 commented 5 years ago

What's the issue Trio, after a long period of time, will stop working and raise an internal error.

Steps to reproduce

  1. My code can be found here, you'll need a discord bot token to run it which can be created with a discord account here
  2. Set a the environment variable 'TOKEN' to it the bot's token, and run the examples/test.py file.
  3. Wait a multitude of hours, I'd say this took around 16-18 hours.
  4. Traceback is raised

Traceback (It's a long one)

GOT: {'t': 'PRESENCE_UPDATE', 's': 25582, 'op': 0, 'd': {'us'....}
Sent: Opcodes.Heartbeat 25582
(Above result is from my program)
Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_core\_run.py", line 1323, in run
    run_impl(runner, async_fn, args)
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_core\_run.py", line 1471, in run_impl
    runner.task_exited(task, final_outcome)
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_core\_run.py", line 943, in task_exited
    task._cancel_stack[-1]._remove_task(task)
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_core\_run.py", line 202, in _remove_task
    self._tasks.remove(task)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\lib\contextlib.py", line 119, in __exit__
    next(self.gen)
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_core\_run.py", line 156, in _might_change_effective_deadline
    del runner.deadlines[old, id(self)]
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\sortedcontainers\sorteddict.py", line 259, in __delitem__
    self._dict_delitem(key)
KeyError: (-inf, 60341456)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:/Users/User/PycharmProjects/pycord/pycord/examples/test.py", line 7, in <module>
    client.run()
  File "C:\Users\User\PycharmProjects\pycord\pycord\pycord\client\client.py", line 70, in run
    self.gateway.start()
  File "C:\Users\User\PycharmProjects\pycord\pycord\pycord\gateway\gate.py", line 351, in start
    trio.run(self._start)
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_core\_run.py", line 1329, in run
    ) from exc
trio.TrioInternalError: internal error in trio - please file a bug!
Exception ignored in: <function Nursery.__del__ at 0x0306DC48>
Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_core\_run.py", line 529, in __del__
AssertionError: 
Exception ignored in: <coroutine object TrioGateway.run at 0x033E96F0>
RuntimeError: coroutine ignored GeneratorExit
Exception ignored in: <function Nursery.__del__ at 0x0306DC48>
Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_core\_run.py", line 529, in __del__
AssertionError: 
Exception ignored in: <coroutine object WebSocketConnection._reader_task at 0x039532B8>
Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio_websocket\_impl.py", line 928, in _reader_task
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_ssl.py", line 646, in receive_some
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_ssl.py", line 560, in _retry
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_sync.py", line 98, in __aexit__
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_core\_ki.py", line 165, in wrapper
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_sync.py", line 585, in release
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_core\_run.py", line 1534, in current_task
RuntimeError: must be called from async context
Exception ignored in: <function Nursery.__del__ at 0x0306DC48>
Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_core\_run.py", line 529, in __del__
AssertionError: 
Exception ignored in: <function AsyncGenerator.__del__ at 0x00EFDF18>
Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\async_generator\_impl.py", line 415, in __del__
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\async_generator\_impl.py", line 363, in step
ValueError: async generator already executing

Expected Result The program should continue running, displaying the output such as the first 2 lines provided in the traceback.

njsmith commented 5 years ago

😕 yeah that's not supposed to happen. Let's see if we can track it down.

Fortunately the traceback contains a lot of information. The relevant part is this:

  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_core\_run.py", line 1471, in run_impl
    runner.task_exited(task, final_outcome)
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_core\_run.py", line 943, in task_exited
    task._cancel_stack[-1]._remove_task(task)
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_core\_run.py", line 202, in _remove_task
    self._tasks.remove(task)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\lib\contextlib.py", line 119, in __exit__
    next(self.gen)
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\trio\_core\_run.py", line 156, in _might_change_effective_deadline
    del runner.deadlines[old, id(self)]
  File "C:\Users\User\PycharmProjects\pycord\venv\lib\site-packages\sortedcontainers\sorteddict.py", line 259, in __delitem__
    self._dict_delitem(key)
KeyError: (-inf, 60341456)

I think what it's saying is:

So, the first questions that come to mind are:

oremanj commented 5 years ago

I downloaded your code and trio_websocket and looked for red flags. The only suspicious thing I found was that your code appears to create a websocket connection in one call to trio.run() and try to close it in another call:

    async def run(self):
        async with trio_websocket.open_websocket_url(self.gateway_url()) as conn:
            self._conn = conn
            ...

    async def _start(self):
        async with trio.open_nursery() as nursery:
            nursery.start_soon(self.run)
            nursery.start_soon(self.heartbeat)
            print("Started Nursery")

    def start(self):
        trio.run(self._start)

    async def _close(self):
        if self._conn is None:
            raise GatewayError("You tried to close the gateway connection before it was established.")
        await self._conn.aclose(1001)
        self._closed = True

    def close(self):
        trio.run(self._close)

In general, different calls to trio.run() are in different universes and it won't work to share state between them. I wasn't able to reconstruct a story for how you could get this specific error by doing that, but it's a thing to look out for.

Also from your code it looked like maybe it was possible to call close() from inside the start() loop? i.e., a run() inside a run(). That won't work either, but trio.run() is supposed to have a check for that which provides a useful error. Unless there are threads (real OS threads, not trio tasks) somewhere that I'm missing -- each thread can have its own run(), and trying to use the same websocket connection from multiple threads simultaneously could explain the behavior you're seeing.

Thanks for the bug report, and sorry not to be able to provide a more definitive answer!

Zwork101 commented 5 years ago

Thanks! I might be doing a lot of things wrong with trio I don't know about. To answer some questions:

  1. I don't think I ever cancle any calls at all, and definitely not with -inf.
  2. The only nursery created is in an async function, however that's about it.
  3. I'm not using any threads at all.

I think I ran it using PyCharms's debugger. Could that have caused issues?

Fuyukai commented 5 years ago

You've pinned a very old version of trio_websocket in your requirements. Update it and see if the issue stays?

Zwork101 commented 5 years ago

Will do, didn't realize it was old.

njsmith commented 5 years ago

Oh, nice catch. But I don't see any mention of inf or yield or current_effective_deadline in the old trio-websocket 0.2.0 either, so still no smoking gun...

oremanj commented 5 years ago

Closing since I don't see how we could realistically make progress on this with the information we have, and the multiple calls to trio.run() create lots of opportunity for some unexpected and unsupported interaction. Feel free to reopen if it reoccurs.

njsmith commented 5 years ago

(To be clear in case anyone stumbles across this from a search engine years from now: Doing multiple calls to trio.run is fine and totally supported. What's not supported is creating Trio objects like network connections, events, locks, etc. inside one call to trio.run, and then re-using the same objects in a different call to trio.run.)