asottile-sentry closed this 1 week ago
cc @methane since you seem to have been poking around in --enable-threads
and termination recently and maybe have an idea what's going on
Try #2654
tried that patch out but unfortunately it doesn't seem to fix it -- still getting hangs
if it's helpful -- an strace of the child processes while stuck shows:
FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0xb2bd50, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=1576, tv_nsec=257230947}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0xb2bd50, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=1576, tv_nsec=262893986}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0xb2bd50, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=1576, tv_nsec=268841818}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0xb2bd50, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=1576, tv_nsec=274623638}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
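Assuming these are CPython's timed waits on the GIL condition variable (the backtrace posted further down does point at take_gil and PyCOND_TIMEDWAIT), each absolute deadline should sit roughly one switch interval after the previous one: 5 ms by default (sys.getswitchinterval()), plus a little overhead per retry. A quick check of that against the four complete lines above, as a throwaway sketch (deadlines_ns and gaps_ms are hypothetical helper names):

```python
import re
import sys

# The four complete child futex() lines from the strace above, verbatim.
STRACE = """\
futex(0xb2bd50, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=1576, tv_nsec=257230947}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0xb2bd50, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=1576, tv_nsec=262893986}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0xb2bd50, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=1576, tv_nsec=268841818}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0xb2bd50, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=1576, tv_nsec=274623638}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
"""

# Absolute deadline passed to each FUTEX_WAIT_BITSET call.
DEADLINE = re.compile(r"tv_sec=(\d+), tv_nsec=(\d+)")


def deadlines_ns(text: str) -> list[int]:
    """Return each futex deadline as a single integer count of nanoseconds."""
    return [int(sec) * 1_000_000_000 + int(nsec) for sec, nsec in DEADLINE.findall(text)]


def gaps_ms(deadlines: list[int]) -> list[float]:
    """Return the gap between consecutive deadlines, in milliseconds."""
    return [(later - earlier) / 1e6 for earlier, later in zip(deadlines, deadlines[1:])]


gaps = gaps_ms(deadlines_ns(STRACE))
print([round(g, 2) for g in gaps])  # prints [5.66, 5.95, 5.78]
print(f"switch interval: {sys.getswitchinterval() * 1000:.1f} ms")
```

Each gap is a bit over 5 ms, which fits a thread that wakes up, finds the GIL still marked as held, and goes back to waiting indefinitely. That is an inference from the timing, not something the trace alone proves.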
the parent process:
wait4(-1, 0x7ffc745d4098, WNOHANG, NULL) = 0
epoll_wait(10, [], 1, 1000) = 0
getsockopt(3, SOL_TCP, TCP_INFO, "\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0d\0\0\0"..., [104]) = 0
wait4(-1, 0x7ffc745d4098, WNOHANG, NULL) = 0
epoll_wait(10, [], 1, 1000) = 0
getsockopt(3, SOL_TCP, TCP_INFO, "\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0d\0\0\0"..., [104]) = 0
wait4(-1, 0x7ffc745d4098, WNOHANG, NULL) = 0
epoll_wait(10, [], 1, 1000) = 0
getsockopt(3, SOL_TCP, TCP_INFO, "\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0d\0\0\0"..., [104]) = 0
wait4(-1, 0x7ffc745d4098, WNOHANG, NULL) = 0
epoll_wait(10, [], 1, 1000) = 0
getsockopt(3, SOL_TCP, TCP_INFO, "\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0d\0\0\0"..., [104]) = 0
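The parent, by contrast, looks idle rather than stuck. Each round is a non-blocking wait4(-1, ..., WNOHANG) returning 0 (children exist, but none has exited), an epoll_wait with a 1000 ms timeout returning no events, and a getsockopt(TCP_INFO) on fd 3. A rough Python analogue of that polling pattern, purely illustrative: master_tick is a hypothetical name, uWSGI's real loop is C in core/master.c, and the TCP_INFO step is left out.

```python
import os
import selectors


def master_tick(sel: selectors.BaseSelector, timeout: float = 1.0) -> list[int]:
    """One iteration of a hypothetical master loop.

    Reaps any exited children without blocking, then waits up to `timeout`
    seconds for events on the registered sockets. Returns the reaped pids.
    """
    reaped: list[int] = []
    while True:
        try:
            # Equivalent of wait4(-1, &status, WNOHANG, NULL).
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all (wait4 would fail with ECHILD)
        if pid == 0:
            break  # children exist but none has exited: the "= 0" in the strace
        reaped.append(pid)
    # Equivalent of epoll_wait(..., 1000): returns no events when nothing is ready.
    sel.select(timeout)
    return reaped
```

When workers hang instead of exiting, every call returns an empty list after sleeping out its timeout, which is the repeating pattern in the trace.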
will see if I can figure out more
ah this is definitely more useful info -- here's the stack of the stuck children:
#0 __futex_abstimed_wait_common64 (private=-1862589240, cancel=true,
abstime=0x7ffc90fb2510, op=137, expected=0,
futex_word=0xc68c30 <_PyRuntime+93552>) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (cancel=true, private=-1862589240,
abstime=0x7ffc90fb2510, clockid=1913253632, expected=0,
futex_word=0xc68c30 <_PyRuntime+93552>) at ./nptl/futex-internal.c:87
#2 __GI___futex_abstimed_wait_cancelable64 (
futex_word=futex_word@entry=0xc68c30 <_PyRuntime+93552>,
expected=expected@entry=0, clockid=clockid@entry=1,
abstime=abstime@entry=0x7ffc90fb2510, private=private@entry=0)
at ./nptl/futex-internal.c:139
#3 0x000078b1bc893e9b in __pthread_cond_wait_common (abstime=0x7ffc90fb2510,
clockid=1, mutex=0xc68c38 <_PyRuntime+93560>,
cond=0xc68c08 <_PyRuntime+93512>) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_timedwait64 (cond=cond@entry=0xc68c08 <_PyRuntime+93512>,
mutex=mutex@entry=0xc68c38 <_PyRuntime+93560>,
abstime=abstime@entry=0x7ffc90fb2510) at ./nptl/pthread_cond_wait.c:652
#5 0x0000000000654a1e in PyCOND_TIMEDWAIT (us=<optimized out>,
mut=0xc68c38 <_PyRuntime+93560>, cond=0xc68c08 <_PyRuntime+93512>)
at ../Python/condvar.h:73
#6 take_gil (tstate=tstate@entry=0xcc6108 <_PyRuntime+475720>)
at ../Python/ceval_gil.c:376
#7 0x0000000000654e80 in PyEval_RestoreThread (
tstate=0xcc6108 <_PyRuntime+475720>) at ../Python/ceval_gil.c:708
#8 0x000078b1bbe6046c in uwsgi_python_master_fixup (step=<optimized out>)
at plugins/python/python_plugin.c:1320
#9 0x000078b1bbe14866 in uwsgi_respawn_worker (wid=wid@entry=2)
at core/master_utils.c:757
#10 0x000078b1bbe12a32 in master_loop (argv=0x2489670, environ=<optimized out>)
at core/master.c:1084
#11 0x000078b1bbe54290 in uwsgi_run () at core/uwsgi.c:3305
#12 0x000078b1bbe5f0b7 in pyuwsgi_run (self=<optimized out>,
args=args@entry=(), kwds=kwds@entry=0x0) at plugins/pyuwsgi/pyuwsgi.c:159
#13 0x0000000000534e56 in cfunction_call (
func=func@entry=<built-in method run of module object at remote 0x78b1bbf15b50>, args=args@entry=(), kwargs=kwargs@entry=0x0)
at ../Objects/methodobject.c:537
#14 0x00000000004ce432 in _PyObject_MakeTpCall (
tstate=tstate@entry=0xcc6108 <_PyRuntime+475720>,
callable=callable@entry=<built-in method run of module object at remote 0x78b1bbf15b50>, args=args@entry=0x78b1bcc66088, nargs=<optimized out>,
(edited with debug line numbers against 3.12.4 + 2.0.26 + #2654)
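Frames #6 to #9 show the freshly forked worker (uwsgi_respawn_worker, then uwsgi_python_master_fixup) blocked in PyEval_RestoreThread and take_gil. That is consistent with the classic fork-with-threads hazard: a lock that some other thread held at the moment of fork() is copied into the child in its held state, but the holding thread is not copied, so nothing ever releases it. The sketch below reproduces that general failure mode with an ordinary threading.Lock. It is an illustration, not a claim about the actual root cause; lock_held_across_fork is a hypothetical helper and it is POSIX-only because it uses os.fork.

```python
import os
import threading
import warnings


def lock_held_across_fork(timeout: float = 0.2) -> bool:
    """Return True if a forked child fails to take a lock that another
    thread of the parent was holding at the moment of fork()."""
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def holder() -> None:
        with lock:          # a background thread owns the lock...
            held.set()
            release.wait()  # ...until the parent tells it to let go

    t = threading.Thread(target=holder)
    t.start()
    held.wait()

    with warnings.catch_warnings():
        # Python 3.12+ warns about fork() in a multi-threaded process,
        # which is exactly the situation being demonstrated here.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()

    if pid == 0:
        # Child: only the forking thread exists. The lock was copied in its
        # "held" state, but the holder thread was not, so no one releases it.
        got_it = lock.acquire(timeout=timeout)
        os._exit(0 if got_it else 1)

    release.set()  # parent: its own copy of the lock is released normally
    t.join()
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status) == 1
```

The child's acquire times out, so the function returns True. Python 3.12 started emitting a DeprecationWarning from os.fork() in multi-threaded processes for this reason.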
still an issue. I'm going to try to bisect CPython today and see if I can find what changed and where the bug lies
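For a hang, the useful trick with `git bisect run` is to treat a timeout as bad. `git bisect run` reads the script's exit status as 0 = good, 125 = skip (this revision cannot be tested), and any other code from 1 to 127 = bad. A minimal wrapper along those lines, assuming the reproducer is a command that exits 0 when it completes normally (classify, the 30-second timeout, and the bisect_hang.py / ./repro.sh names are hypothetical):

```python
import os
import signal
import subprocess
import sys

# Exit statuses understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def classify(cmd: list[str], timeout: float) -> int:
    """Run a reproducer once and map the outcome to a bisect verdict.

    A timeout (the hang) is BAD, a clean exit is GOOD, and anything else,
    including failing to start at all, is SKIP.
    """
    try:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            start_new_session=True,  # own process group, so forked workers can be killed too
        )
    except OSError:
        return SKIP  # e.g. this revision did not build, so there is nothing to run
    try:
        returncode = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            # SIGKILL cannot be caught or ignored, unlike ^C or SIGTERM.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        proc.wait()
        return BAD
    return GOOD if returncode == 0 else SKIP


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: git bisect run python3 bisect_hang.py ./repro.sh
    sys.exit(classify(sys.argv[1:], timeout=30.0))
```

Killing the whole process group matters here because the stuck processes are forked workers, not the command that was launched. Mapping unrelated failures to SKIP is a design choice that keeps build or startup breakage from being blamed on the hang.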
alrighty -- here's the result of bisection!
92d8bfffbf377e91d8b92666525cb8700bb1d5e8 is the first bad commit
https://github.com/python/cpython/commit/92d8bfffbf377e91d8b92666525cb8700bb1d5e8
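On a linear range of commits, the search behind "first bad commit" amounts to a binary search for the point where the reproducer flips from good to bad, needing about log2(N) test builds for N commits. A sketch of just that idea (first_bad is a hypothetical helper; real git bisect also copes with merge-heavy history and skipped revisions):

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit, git-bisect style.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once a commit is bad every later commit is bad too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

For example, 100 commits where the bug appears at the 38th (index 37) are narrowed down in at most 7 test builds.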
created https://github.com/python/cpython/pull/123079 with a potential fix
btw I have a fix for this in #2660 -- if you're using pyuwsgi
from PyPI, this fix is cherry-picked and available in 2.0.27a1
Fixed in #2670
reproduction:
output:
(and then no additional output) -- the uwsgi processes also become relatively unkillable (not responding to ^C / ^\, SIGTERM, etc.)
the other terminal output:
this does not seem to be an issue with Python 3.11, or when not using pyuwsgi