python / cpython

The Python programming language
https://www.python.org

Deadlock in Py_EndInterpreter following bpo-1596321: Fix threading._shutdown() for the main thread #122517

Open jdoc-sag opened 1 month ago

jdoc-sag commented 1 month ago

Bug report

Bug description:

I have encountered a deadlock during subinterpreter shutdown after upgrading from Python 3.9.7 to 3.9.8. Git bisection reveals that #28589 ("Fix threading._shutdown() for the main thread (GH-28549)") is the culprit.

This is the gdb stack trace for the only thread running Python code at the time of the deadlock:

Thread 1 (Thread 0x7f85dfecdfc0 (LWP 2305996)):
#0  0x00007f85d8489dd6 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f85d8489ec8 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x00007f85d563343a in PyThread_acquire_lock_timed (lock=lock@entry=0x7f8590131620, microseconds=microseconds@entry=-1000000, 
    intr_flag=intr_flag@entry=1) at Python/thread_pthread.h:483
#3  0x00007f85d568b20d in acquire_timed (lock=0x7f8590131620, timeout=-1000000000) at ./Modules/_threadmodule.c:63
#4  0x00007f85d568b3ca in lock_PyThread_acquire_lock (self=0x7f85bd724c90, args=<optimized out>, kwds=<optimized out>)
    at ./Modules/_threadmodule.c:146
#5  0x00007f85d54fc830 in method_vectorcall_VARARGS_KEYWORDS (func=0x7f85d51bb590, args=0x7f85bd774ef8, nargsf=<optimized out>, 
    kwnames=<optimized out>) at Objects/descrobject.c:348
#6  0x00007f85d54a38ca in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f85bd774ef8, callable=0x7f85d51bb590, 
    tstate=0x7f85900b5bd0) at ./Include/cpython/abstract.h:118
#7  PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f85bd774ef8, callable=0x7f85d51bb590)
    at ./Include/cpython/abstract.h:127
#8  call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x7f85900b5bd0) at Python/ceval.c:5077
#9  _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3506
#10 0x00007f85d54a0b4b in _PyEval_EvalFrame (throwflag=0, f=0x7f85bd774d60, tstate=0x7f85900b5bd0) at ./Include/internal/pycore_ceval.h:40
#11 function_code_fastcall (tstate=0x7f85900b5bd0, co=<optimized out>, args=<optimized out>, nargs=0, globals=<optimized out>)
    at Objects/call.c:330
#12 0x00007f85d54f36c8 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7ffd2f7880e0, callable=0x7f85bd7300d0, 
    tstate=0x7f85900b5bd0) at ./Include/cpython/abstract.h:118
#13 PyObject_VectorcallMethod (name=<optimized out>, args=0x7ffd2f7880e0, args@entry=0x7ffd2f7880d8, nargsf=<optimized out>, 
    nargsf@entry=9223372036854775809, kwnames=kwnames@entry=0x0) at Objects/call.c:828
#14 0x00007f85d561e6ec in _PyObject_VectorcallMethodId (nargsf=9223372036854775809, kwnames=0x0, args=0x7ffd2f7880d8, 
    name=0x7f85d59b9820 <PyId__shutdown.17464>) at ./Include/cpython/abstract.h:237
#15 _PyObject_CallMethodIdNoArgs (name=0x7f85d59b9820 <PyId__shutdown.17464>, self=<optimized out>) at ./Include/cpython/abstract.h:243
#16 wait_for_thread_shutdown (tstate=0x7f85900b5bd0) at Python/pylifecycle.c:2395
#17 0x00007f85d562107b in Py_EndInterpreter (tstate=0x7f85900b5bd0) at Python/pylifecycle.c:1659
#18 0x0000000000b55b30 in py::interpreter::~interpreter() ()
#19 0x00000000010206ee in CorrelatorPythonPluginType::unload() ()
#20 0x0000000000ca4700 in NameStore::shutdown() ()
#21 0x0000000000b293d8 in CorrelatorImplementation::release() ()
#22 0x00000000009dd6e7 in com::apama::correlator::ServerCorrelator::run(int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, char const*, bool, double, char**, int, com::apama::correlator::CorrelatorConfig&) ()
#23 0x00000000009accb4 in ap_main(int, char**) ()
#24 0x00007f85d80dd7e5 in __libc_start_main () from /lib64/libc.so.6
#25 0x00000000009a6fee in _start ()
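For illustration, frames #2 to #4 show a lock acquire with a timeout of -1 second (`timeout=-1000000000`, `microseconds=-1000000`), which in Python means "wait forever". The sketch below is not CPython's shutdown code; it only reproduces that blocking pattern with an ordinary `threading.Lock`. The names `orphaned_lock` and `try_acquire` are hypothetical, and a finite timeout is used so the example terminates instead of hanging.

```python
import threading


def try_acquire(lock, timeout):
    """Return True if `lock` could be acquired within `timeout` seconds."""
    acquired = lock.acquire(timeout=timeout)
    if acquired:
        # Release again so the helper leaves the lock state unchanged.
        lock.release()
    return acquired


# Hypothetical stand-in for a lock that shutdown waits on:
# it is held, and no thread will ever release it.
orphaned_lock = threading.Lock()
orphaned_lock.acquire()

# A plain orphaned_lock.acquire() (default timeout=-1) would block forever,
# which is the state captured in the stack trace. With a finite timeout the
# call returns False instead.
still_blocked = not try_acquire(orphaned_lock, timeout=0.1)
```

A bounded acquire like this is only a diagnostic aid for spotting a lock that is never released; it does not address why the lock stays held during `Py_EndInterpreter`.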

I believe this could be the same issue as the one described in GrahamDumpleton/mod_wsgi#730. The commit is also reverted in every Fedora Python version, e.g. for 3.12: fedora-python/cpython@4b35a8e. However, I have not been able to find an existing CPython report for the problem.

CPython versions tested on:

3.9, 3.10

Operating systems tested on:

Linux

picnixz commented 1 month ago

(It appears it affects Python 3.12 for Fedora, so I also put 3.11 and 3.12 labels)

picnixz commented 1 month ago

cc @vstinner as the author of the fix (also, I'm not sure whether it only affects subinterpreters or if it's a wider bug)

vstinner commented 1 month ago

If you're using mod_wsgi, you can use "WSGIDestroyInterpreter Off" configuration option to work around this issue.

Fedora reverts https://github.com/python/cpython/pull/28589 ("Fix threading._shutdown() for the main thread (https://github.com/python/cpython/pull/28549)") in Python 3.9 to 3.12. Python 3.13 works again without the revert since thread shutdown was rewritten by the commit 33da0e844c922b3dcded75fbb9b7be67cb013a17. Example:

Python 3.9-3.11 no longer get bug fixes. I don't think that we should or can change this in Python 3.12 (IMO it's too late). There is a way to work around the issue in mod_wsgi.

> (It appears it affects Python 3.12 for Fedora, so I also put 3.11 and 3.12 labels)

Python 3.12 provided by Fedora should not be affected since it has the revert.
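As a rough aid for embedders, the version range discussed in this thread can be turned into a coarse check. This is only a sketch under stated assumptions: the lower bound comes from the 3.9.7 to 3.9.8 regression reported above, the upper bound from the statement that 3.13 works again, and every 3.10, 3.11 and 3.12 release is assumed to carry the change (the thread does not say which 3.10 release first included it). The helper name `may_have_shutdown_change` is hypothetical, and the check cannot detect distribution-level reverts such as Fedora's.

```python
import sys

# Bounds taken from this thread; see the assumptions stated above.
FIRST_AFFECTED = (3, 9, 8)    # regression reported when upgrading to 3.9.8
FIRST_REWRITTEN = (3, 13, 0)  # thread shutdown rewritten, works again


def may_have_shutdown_change(version=None):
    """Return True if `version` falls in the range discussed in the thread.

    `version` is a (major, minor, micro) tuple; it defaults to the running
    interpreter. Distribution patches (e.g. a revert) are not detected.
    """
    v = tuple((version or sys.version_info)[:3])
    return FIRST_AFFECTED <= v < FIRST_REWRITTEN
```

For example, `may_have_shutdown_change((3, 9, 7))` is false while `may_have_shutdown_change((3, 12, 4))` is true, even though a Fedora-built 3.12 would not be affected because of the revert.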