python / cpython

The Python programming language
https://www.python.org

Calling _run_once with an empty event loop hangs forever #112741

Closed phvalguima closed 3 months ago

phvalguima commented 10 months ago

Bug report

Bug description:

I am currently using pytest-asyncio + pytest-operator and I have noticed that some tests eventually end up hanging forever.

This happens when this part of _run_once is reached with self._ready=deque([]), the loop is not marked as self._stopping, and there are no items in self._scheduled. In that case the timeout stays None, so the line event_list = self._selector.select(timeout=None) is executed and self._selector.select blocks indefinitely.
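To make the three-way branch concrete, here is a minimal sketch of the timeout selection as a standalone function. The function name compute_select_timeout and its parameters are hypothetical and not part of asyncio; the scheduled list is assumed to hold plain deadlines (floats) rather than TimerHandle objects, and the constant mirrors what I believe asyncio.base_events defines.

```python
from collections import deque

# Assumed to match asyncio.base_events.MAXIMUM_SELECT_TIMEOUT (24 hours).
MAXIMUM_SELECT_TIMEOUT = 24 * 3600


def compute_select_timeout(ready, scheduled, stopping, now):
    """Return the timeout a loop in this state would pass to select().

    ready     -- deque of callbacks waiting to run
    scheduled -- list of deadlines, earliest first (stand-in for timer handles)
    stopping  -- True if stop() has been requested
    now       -- current loop time
    """
    if ready or stopping:
        # Work is pending (or we are shutting down): poll without blocking.
        return 0
    if scheduled:
        # Sleep until the earliest timer, capped at the maximum.
        when = scheduled[0]
        return min(max(0, when - now), MAXIMUM_SELECT_TIMEOUT)
    # Nothing ready, nothing scheduled, not stopping: block until I/O arrives.
    return None


# The state from this report: empty deque, empty timer list, not stopping.
hanging_state = compute_select_timeout(deque(), [], False, now=0.0)
```

With the state shown in this report, the function falls through to the last branch and returns None, which select() treats as "wait with no deadline".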

More specifically, my event loop looks like:

(Pdb) p self
<_UnixSelectorEventLoop running=True closed=False debug=True>
(Pdb) p self.__dict__
{'_timer_cancelled_count': 0, '_closed': False, '_stopping': False, '_ready': deque([]), '_scheduled': [], '_default_executor': None, '_internal_fds': 1, '_thread_id': 139941579980864, '_clock_resolution': 1e-09, '_exception_handler': None, '_debug': True, 'slow_callback_duration': 0.1, '_current_handle': None, '_task_factory': None, '_coroutine_origin_tracking_enabled': True, '_coroutine_origin_tracking_saved_depth': 0, '_asyncgens': set(), '_asyncgens_shutdown_called': False, '_executor_shutdown_called': False, '_selector': <selectors.EpollSelector object at 0x7f46a8c71dc0>, '_ssock': <socket.socket fd=14, family=1, type=1, proto=0>, '_csock': <socket.socket fd=15, family=1, type=1, proto=0>, '_transports': <WeakValueDictionary at 0x7f46a8c71eb0>, '_signal_handlers': {}}

Although this is primarily an issue with how pytest-operator's event loop handling interacts with pytest-asyncio's latest event loop mechanism, it was not trivial for the team to understand the problem, because each automated test hung in our CI instead of failing.
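One way to turn this kind of hang into a visible failure, sketched here under assumptions: wrap the awaited work in asyncio.wait_for, which schedules a timer on the loop, so the loop no longer has an empty schedule. The helper names never_finishes and run_with_deadline are hypothetical, and this does not address the underlying loop mismatch between the two plugins.

```python
import asyncio


async def never_finishes():
    # Stand-in for a test body that awaits something nobody will ever resolve.
    await asyncio.get_running_loop().create_future()


def run_with_deadline(coro_factory, seconds):
    """Run coro_factory() on a fresh loop, failing after `seconds`."""

    async def guarded():
        # wait_for registers a timer, so the loop always has a deadline
        # to wake up for, and raises TimeoutError when it expires.
        return await asyncio.wait_for(coro_factory(), timeout=seconds)

    return asyncio.run(guarded())
```

Calling run_with_deadline(never_finishes, 0.05) raises asyncio.TimeoutError instead of blocking, which a test runner reports as a failure rather than a stuck job.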

It feels like we are missing an exception here: timeout=None results in epoll_wait being called with -1, i.e. no deadline set, and, judging from the _run_once code, timeout=None effectively represents a brand new loop with no tasks queued yet. It should not be possible to end up in this situation. Maybe raise a RuntimeError:

        timeout = None
        if self._ready or self._stopping:
            timeout = 0
        elif self._scheduled:
            # Compute the desired timeout.
            when = self._scheduled[0]._when
            timeout = min(max(0, when - self.time()), MAXIMUM_SELECT_TIMEOUT)
+        else:
+            raise RuntimeError("Loop is empty and cannot run any tasks.")

It seems that what I am seeing is closely related to #111604 and #78340. I have opened a new bug as #111604 seems to be closer to the internals of Python in Windows. Feel free to close this issue as a duplicate otherwise.

CPython versions tested on:

3.11

Operating systems tested on:

Linux

gvanrossum commented 10 months ago

I think it's exactly like #78340, but I also think (a) the behavior is correct, and (b) _run_once() is an internal API so you have no business calling it -- and if you call it, you have no business complaining about it. :-)

I think it's correct because it's the logical edge case -- it should wait until something happens, and if there are no events in the queue, that means waiting forever.

Moreover, I think that something can still happen -- another thread can add a callback using loop.call_soon_threadsafe().
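A small sketch of that last point, using only public APIs: the loop below starts with nothing ready and nothing scheduled, so it blocks in select(), and a callback submitted from another thread via loop.call_soon_threadsafe() wakes it. The function name wake_idle_loop and the delay value are illustrative choices, not anything from asyncio.

```python
import asyncio
import threading


def wake_idle_loop(delay=0.05):
    """Run an otherwise empty loop and wake it from a second thread."""
    loop = asyncio.new_event_loop()
    events = []

    def from_other_thread():
        # Runs inside the loop's own thread once the loop has been woken.
        events.append("callback ran")
        loop.stop()

    # After `delay` seconds a helper thread hands the callback to the loop.
    timer = threading.Timer(
        delay, loop.call_soon_threadsafe, args=(from_other_thread,)
    )
    timer.start()
    try:
        # No callbacks and no timers: the loop waits with no deadline
        # until the other thread wakes it.
        loop.run_forever()
    finally:
        timer.join()
        loop.close()
    return events
```

Raising an exception for the empty state would break this pattern, since the loop would refuse to wait for work that only arrives later from another thread.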

kumaraditya303 commented 3 months ago

Closing, as this is not a bug and won't be fixed.