official-stockfish / fishtest

The Stockfish testing framework
https://tests.stockfishchess.org/tests
270 stars 126 forks source link

pserve hangs #2055

Open vondele opened 3 weeks ago

vondele commented 3 weeks ago

This is just to document a rare hang that has been observed a few times in the last few weeks with server under significant load, or possibly when that large load just disappears.

Unfortunately, there is little information:

vdbergh commented 3 weeks ago

All four waitress threads seem to be stuck on waiter.acquire(). But perhaps this is a normal condition (it depends on what kind of object waiter is).

This seems a bit scary

File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/socket_checker.py", line 66, in select
Jun 06 19:49:06 tests.stockfishchess.org pserve[16916]:     res = self._poller.poll(timeout_)

Polling gone wrong?

vondele commented 3 weeks ago

I will later try to post a trace during normal operation, these are standard as far as I can recall. The acquire seems to be on a condition variable, so that would be fine.

vondele commented 3 weeks ago

current 'healthy' sample (looks identical to my eyes):

-- Logs begin at Mon 2023-11-27 14:51:35 UTC, end at Fri 2024-06-07 13:22:35 UTC. --
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]: ===================  <_MainThread(MainThread, started 139749783471936)>  ======================
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/bin/pserve", line 8, in <module>
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     sys.exit(main())
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pyramid/scripts/pserve.py", line 30, in main
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     return command.run()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pyramid/scripts/pserve.py", line 281, in run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     server(app)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/paste/deploy/loadwsgi.py", line 180, in server_wrapper
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     return fix_call(
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/paste/deploy/util.py", line 61, in fix_call
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     val = callable(*args, **kw)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/__init__.py", line 23, in serve_paste
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     serve(app, **kw)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/__init__.py", line 19, in serve
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     server.run()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/server.py", line 325, in run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.asyncore.loop(
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/wasyncore.py", line 245, in loop
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     poll_fun(timeout, map)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/wasyncore.py", line 172, in poll
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     r, w, e = select.select(r, w, e, timeout)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/fishtest/__init__.py", line 25, in thread_stack_dump
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     traceback.print_stack(sys._current_frames()[th.ident])
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]: ===================  <Thread(pymongo_server_monitor_thread, started daemon 139749221340928)>  ======================
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._bootstrap_inner()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.run()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._target(*self._args, **self._kwargs)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/periodic_executor.py", line 141, in _run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     if not self._target():
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/monitor.py", line 62, in target
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     monitor._run()  # type:ignore[attr-defined]
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/monitor.py", line 192, in _run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._server_description = self._check_server()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/monitor.py", line 235, in _check_server
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     return self._check_once()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/monitor.py", line 282, in _check_once
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     response, round_trip_time = self._check_with_socket(conn)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/monitor.py", line 304, in _check_with_socket
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     response = Hello(conn._next_reply(), awaitable=True)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/pool.py", line 921, in _next_reply
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     reply = self.receive_message(None)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/pool.py", line 1040, in receive_message
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     return receive_message(self, request_id, self.max_message_size)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/network.py", line 317, in receive_message
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     length, _, response_to, op_code = _UNPACK_HEADER(_receive_data_on_socket(conn, 16, deadline))
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/network.py", line 394, in _receive_data_on_socket
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     wait_for_read(conn, deadline)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/network.py", line 375, in wait_for_read
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     readable = conn.socket_checker.select(sock, read=True, timeout=timeout)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/socket_checker.py", line 66, in select
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     res = self._poller.poll(timeout_)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]: ===================  <Thread(pymongo_kill_cursors_thread, started daemon 139749212948224)>  ======================
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._bootstrap_inner()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.run()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._target(*self._args, **self._kwargs)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/periodic_executor.py", line 156, in _run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     time.sleep(self._min_interval)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]: ===================  <Thread(pymongo_server_rtt_thread, started daemon 139749202384640)>  ======================
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._bootstrap_inner()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.run()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._target(*self._args, **self._kwargs)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/pymongo/periodic_executor.py", line 156, in _run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     time.sleep(self._min_interval)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]: ===================  <Thread(waitress-0, started daemon 139749188699904)>  ======================
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._bootstrap_inner()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.run()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._target(*self._args, **self._kwargs)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/task.py", line 72, in handler_thread
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.queue_cv.wait()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 355, in wait
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     waiter.acquire()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]: ===================  <Thread(waitress-1, started daemon 139748974786304)>  ======================
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._bootstrap_inner()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.run()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._target(*self._args, **self._kwargs)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/task.py", line 72, in handler_thread
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.queue_cv.wait()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 355, in wait
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     waiter.acquire()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]: ===================  <Thread(waitress-2, started daemon 139748966393600)>  ======================
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._bootstrap_inner()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.run()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._target(*self._args, **self._kwargs)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/task.py", line 72, in handler_thread
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.queue_cv.wait()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 355, in wait
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     waiter.acquire()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]: ===================  <Thread(waitress-3, started daemon 139748958000896)>  ======================
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1030, in _bootstrap
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._bootstrap_inner()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.run()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 1010, in run
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self._target(*self._args, **self._kwargs)
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/fishtest/server/env/lib/python3.12/site-packages/waitress/task.py", line 72, in handler_thread
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     self.queue_cv.wait()
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:   File "/home/fishtest/.pyenv/versions/3.12.3/lib/python3.12/threading.py", line 355, in wait
Jun 07 13:20:56 tests.stockfishchess.org pserve[26802]:     waiter.acquire()