python / cpython

The Python programming language
https://www.python.org

CPython 3.12 embedded in WeeChat causes segfault on subsequent calls to Py_EndInterpreter #116510

Open trygveaa opened 8 months ago

trygveaa commented 8 months ago

Crash report

What happened?

WeeChat embeds CPython in order to run Python scripts inside WeeChat. It can load multiple scripts and they each get their own interpreter. When a script is loaded Py_NewInterpreter is called, and when it's unloaded Py_EndInterpreter is called.

With CPython 3.12 loading two scripts and then unloading them in the same order causes a segmentation fault. Interestingly, the segmentation fault doesn't happen if the script that was loaded last is unloaded first.

I bisected this and found it was introduced in commit de64e7561680fdc5358001e9488091e75d4174a3. I also noticed that the crash doesn't occur in the main branch, and did another bisect and found it was fixed in commit 7a7bce5a0ab249407e866a1e955d21fa2b0c8506.

This issue seems similar to the one reported in #115649, which was also introduced by the same commit, but that one still crashes on the main branch (commit 735fc2cbbcf875c359021b5b2af7f4c29f4cf66d).

I haven't been able to reproduce this outside of WeeChat unfortunately, but here is a backtrace from the crash with WeeChat, with commit de64e7561680fdc5358001e9488091e75d4174a3 of CPython and commit ec56a1103f47b15a641ff93528fd6f50025dd524 of WeeChat.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000074b5e5fd2700 in ?? ()
[Current thread is 1 (Thread 0x74b5e7bec940 (LWP 1881931))]
(gdb) bt
#0  0x000074b5e5fd2700 in ?? ()
#1  <signal handler called>
#2  0x000074b5e64f9ddd in _PyGCHead_SET_PREV (prev=<optimized out>, gc=<optimized out>) at ./Include/internal/pycore_gc.h:74
#3  _PyObject_GC_UNTRACK (op=0x74b5dfb746d0) at ./Include/internal/pycore_object.h:228
#4  PyObject_GC_UnTrack (op_raw=op_raw@entry=0x74b5dfb746d0) at Modules/gcmodule.c:2241
#5  0x000074b5e63c430c in module_dealloc (m=0x74b5dfb746d0) at Objects/moduleobject.c:672
#6  0x000074b5e63c393d in Py_DECREF (op=<optimized out>) at ./Include/object.h:681
#7  Py_XDECREF (op=<optimized out>) at ./Include/object.h:777
#8  meth_dealloc (m=0x74b5dfb81210) at Objects/methodobject.c:170
#9  0x000074b5e63b5d00 in Py_DECREF (op=0x74b5dfb81210) at ./Include/object.h:681
#10 Py_XDECREF (op=0x74b5dfb81210) at ./Include/object.h:777
#11 insertdict (interp=0x74b5e0e42010, mp=mp@entry=0x74b5df1d3d40, key=0x74b5dfb7db30, hash=<optimized out>, value=value@entry=0x74b5e6750240 <_Py_NoneStruct>) at Objects/dictobject.c:1304
#12 0x000074b5e63b6107 in _PyDict_SetItem_Take2 (value=0x74b5e6750240 <_Py_NoneStruct>, key=<optimized out>, mp=0x74b5df1d3d40) at Objects/dictobject.c:1854
#13 0x000074b5e63c5684 in _PyModule_ClearDict (d=0x74b5df1d3d40) at Objects/moduleobject.c:619
#14 0x000074b5e63c5a6e in _PyModule_Clear (m=m@entry=0x74b5df1e12b0) at Objects/moduleobject.c:567
#15 0x000074b5e64c884e in finalize_modules_clear_weaklist (verbose=0, weaklist=0x74b5dfbed080, interp=0x74b5e0e42010) at Python/pylifecycle.c:1491
#16 finalize_modules (tstate=tstate@entry=0x74b5e0ea0400) at Python/pylifecycle.c:1574
#17 0x000074b5e64cc476 in Py_EndInterpreter (tstate=0x74b5e0ea0400) at Python/pylifecycle.c:2137
#18 0x000074b5e6a5bce6 in weechat_python_unload (script=0x5d9a7b80ee60) at /home/trygve/dev/weechat/src/plugins/python/weechat-python.c:947
#19 0x000074b5e6a5bea6 in weechat_python_unload_all () at /home/trygve/dev/weechat/src/plugins/python/weechat-python.c:996
#20 0x000074b5e7b81cea in plugin_script_end (weechat_plugin=0x5d9a7b29ad50, plugin_data=0x74b5e6a9e640 <python_data>) at /home/trygve/dev/weechat/src/plugins/plugin-script.c:1841
#21 0x000074b5e6a5d8c4 in weechat_plugin_end (plugin=0x5d9a7b29ad50) at /home/trygve/dev/weechat/src/plugins/python/weechat-python.c:1634
#22 0x00005d9a78ed4c38 in plugin_unload (plugin=0x5d9a7b29ad50) at /home/trygve/dev/weechat/src/plugins/plugin.c:1261
#23 0x00005d9a78ed4d9f in plugin_unload_all () at /home/trygve/dev/weechat/src/plugins/plugin.c:1313
#24 0x00005d9a78ed50f8 in plugin_end () at /home/trygve/dev/weechat/src/plugins/plugin.c:1433
#25 0x00005d9a78e04340 in weechat_end (gui_end_cb=0x5d9a78ecb8ae <gui_main_end>) at /home/trygve/dev/weechat/src/core/weechat.c:709
#26 0x00005d9a78e03163 in main (argc=4, argv=0x7ffdb171c178) at /home/trygve/dev/weechat/src/gui/curses/normal/main.c:45

This was produced by creating these two python scripts:

dummy1.py:

import weechat

if weechat.register("dummy1", "trygveaa", "0.1", "MIT", "Dummy script 1", "", ""):
    weechat.prnt("", "Loaded dummy script 1")

dummy2.py:

import weechat

if weechat.register("dummy2", "trygveaa", "0.1", "MIT", "Dummy script 2", "", ""):
    weechat.prnt("", "Loaded dummy script 2")

And then running weechat -t -r '/script load dummy1.py; /script load dummy2.py; /quit'.

Also, here is the issue report for WeeChat: https://github.com/weechat/weechat/issues/2046

Since it's fixed in main it seems there won't be a problem with 3.13, but I wonder if the fix can be backported to 3.12?

CPython versions tested on:

3.12, CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.12.0a7+ (tags/v3.12.0a7-340-gde64e75616:de64e75616, Mar 8 2024, 19:43:39) [GCC 13.2.1 20230801]

ericsnowcurrently commented 7 months ago

As to backporting 7a7bce5a0a (gh-113412), it wasn't obvious at the time that it was worth backporting, relative to the complexity of the change. Ultimately, that's a call for the 3.12 release manager, @Yhg1s to make.

CC @nascheme

neo1973 commented 6 months ago

The same thing happens in Kodi in various situations (e.g. https://github.com/xbmc/xbmc/issues/24440 and reports on https://forum.kodi.tv/), the stack trace is basically the same:

ASAN output

```
==20353==ERROR: AddressSanitizer: SEGV on unknown address 0x7408bf1a18c0 (pc 0x7408e2f72d11 bp 0x7408bf19f120 sp 0x74089a1feec8 T54)
==20353==The signal is caused by a READ memory access.
#0 0x7408e2f72d11 in PyObject_GC_UnTrack (/usr/lib/libpython3.12.so.1.0+0x172d11) (BuildId: 89181a30ef36f4bb519b2474a78e3798ad3c2f9a)
#1 0x7408e3074f7a (/usr/lib/libpython3.12.so.1.0+0x274f7a) (BuildId: 89181a30ef36f4bb519b2474a78e3798ad3c2f9a)
#2 0x7408e2f88436 (/usr/lib/libpython3.12.so.1.0+0x188436) (BuildId: 89181a30ef36f4bb519b2474a78e3798ad3c2f9a)
#3 0x7408e2f774e0 (/usr/lib/libpython3.12.so.1.0+0x1774e0) (BuildId: 89181a30ef36f4bb519b2474a78e3798ad3c2f9a)
#4 0x7408e2ffaf00 in _PyModule_ClearDict (/usr/lib/libpython3.12.so.1.0+0x1faf00) (BuildId: 89181a30ef36f4bb519b2474a78e3798ad3c2f9a)
#5 0x7408e3074b4b (/usr/lib/libpython3.12.so.1.0+0x274b4b) (BuildId: 89181a30ef36f4bb519b2474a78e3798ad3c2f9a)
#6 0x7408e30803fd in Py_EndInterpreter (/usr/lib/libpython3.12.so.1.0+0x2803fd) (BuildId: 89181a30ef36f4bb519b2474a78e3798ad3c2f9a)
#7 0x57c9f075f66d in CPythonInvoker::onExecutionDone() xbmc/interfaces/python/PythonInvoker.cpp:574:5
#8 0x57c9f74c4303 in CLanguageInvokerThread::OnExit() xbmc/interfaces/generic/LanguageInvokerThread.cpp:122:14
#9 0x57c9f74c4578 in non-virtual thunk to CLanguageInvokerThread::OnExit() xbmc/interfaces/generic/LanguageInvokerThread.cpp
#10 0x57c9f3064a43 in CThread::Action() xbmc/threads/Thread.cpp:292:5
#11 0x57c9f3066fb0 in CThread::Create(bool)::$_0::operator()(CThread*, std::promise) const xbmc/threads/Thread.cpp:152:18
#12 0x57c9f3065c36 in void std::__invoke_impl>(std::__invoke_other, CThread::Create(bool)::$_0&&, CThread*&&, std::promise&&) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/invoke.h:61:14
#13 0x57c9f3065866 in std::__invoke_result>::type std::__invoke>(CThread::Create(bool)::$_0&&, CThread*&&, std::promise&&) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/invoke.h:96:14
#14 0x57c9f306579f in void std::thread::_Invoker>>::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/std_thread.h:292:13
#15 0x57c9f3065618 in std::thread::_Invoker>>::operator()() /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/std_thread.h:299:11
#16 0x57c9f30651e8 in std::thread::_State_impl>>>::_M_run() /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/std_thread.h:244:13
#17 0x7408e0adcb62 in execute_native_thread_routine /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/thread.cc:104:18
#18 0x57c9efcddc56 in asan_thread_start(void*) (/home/mark/Coding/Repos/kodi-git/build_clang_debug_sanitizer/kodi.bin+0xa349c56) (BuildId: 32d6194589667529dde04169b4d13246ec286fba)
#19 0x7408e08a9559 (/usr/lib/libc.so.6+0x8b559) (BuildId: 6542915cee3354fbcf2b3ac5542201faec43b5c9)
#20 0x7408e0926a5b (/usr/lib/libc.so.6+0x108a5b) (BuildId: 6542915cee3354fbcf2b3ac5542201faec43b5c9)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/usr/lib/libpython3.12.so.1.0+0x172d11) (BuildId: 89181a30ef36f4bb519b2474a78e3798ad3c2f9a) in PyObject_GC_UnTrack
Thread T54 created by T0 here:
#0 0x57c9efd9d7c8 in pthread_create (/home/mark/Coding/Repos/kodi-git/build_clang_debug_sanitizer/kodi.bin+0xa4097c8) (BuildId: 32d6194589667529dde04169b4d13246ec286fba)
#1 0x7408e0adcc49 in __gthread_create /usr/src/debug/gcc/gcc-build/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/gthr-default.h:663:35
#2 0x7408e0adcc49 in std::thread::_M_start_thread(std::unique_ptr>, void (*)()) /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/thread.cc:172:37
#3 0x57c9f3061bc2 in CThread::Create(bool) xbmc/threads/Thread.cpp:118:20
#4 0x57c9f74c1d45 in CLanguageInvokerThread::execute(std::__cxx11::basic_string, std::allocator> const&, std::vector, std::allocator>, std::allocator, std::allocator>>> const&) xbmc/interfaces/generic/LanguageInvokerThread.cpp:59:5
#5 0x57c9f74bf0a9 in ILanguageInvoker::Execute(std::__cxx11::basic_string, std::allocator> const&, std::vector, std::allocator>, std::allocator, std::allocator>>> const&) xbmc/interfaces/generic/ILanguageInvoker.cpp:27:10
#6 0x57c9f74d0de4 in CScriptInvocationManager::ExecuteAsync(std::__cxx11::basic_string, std::allocator> const&, std::shared_ptr const&, std::shared_ptr const&, std::vector, std::allocator>, std::allocator, std::allocator>>> const&, bool, int) xbmc/interfaces/generic/ScriptInvocationManager.cpp:288:18
#7 0x57c9f74ce2f2 in CScriptInvocationManager::ExecuteAsync(std::__cxx11::basic_string, std::allocator> const&, std::shared_ptr const&, std::vector, std::allocator>, std::allocator, std::allocator>>> const&, bool, int) xbmc/interfaces/generic/ScriptInvocationManager.cpp:237:10
#8 0x57c9f4b644f0 in ADDON::CServiceAddonManager::Start(std::shared_ptr const&) xbmc/addons/Service.cpp:90:59
#9 0x57c9f4b631c7 in ADDON::CServiceAddonManager::Start() xbmc/addons/Service.cpp:64:7
#10 0x57c9f40976e1 in CApplication::Initialize() xbmc/application/Application.cpp:761:40
#11 0x57c9f320aaea in XBMC_Run xbmc/platform/xbmc.cpp:43:22
#12 0x57c9efdf173f in main xbmc/platform/posix/main.cpp:70:16
#13 0x7408e0843ccf (/usr/lib/libc.so.6+0x25ccf) (BuildId: 6542915cee3354fbcf2b3ac5542201faec43b5c9)
==20353==ABORTING
```

GDB (with debug symbols)

```
#0 __pthread_kill_implementation (threadid=, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007408e08ab393 in __pthread_kill_internal (signo=6, threadid=) at pthread_kill.c:78
#2 0x00007408e085a6c8 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007408e08424b8 in __GI_abort () at abort.c:79
#4 0x000057c9efdd1921 in __sanitizer::Abort() ()
#5 0x000057c9efdcf492 in __sanitizer::Die() ()
#6 0x000057c9efdafcf1 in __asan::ScopedInErrorReport::~ScopedInErrorReport() ()
#7 0x000057c9efdac351 in __asan::ReportDeadlySignal(__sanitizer::SignalContext const&) ()
#8 0x000057c9efdab199 in __asan::AsanOnDeadlySignal(int, void*, void*) ()
#9 0x00007408e085a770 in () at /usr/lib/libc.so.6
#10 0x00007408e2f72d11 in _PyGCHead_SET_PREV (prev=, gc=) at ./Include/internal/pycore_gc.h:74
#11 _PyObject_GC_UNTRACK (op=0x7408b8c59170) at ./Include/internal/pycore_object.h:247
#12 PyObject_GC_UnTrack (op_raw=0x7408b8c59170) at Modules/gcmodule.c:2242
#13 0x00007408e3074f7b in module_dealloc (m=0x7408b8c59170) at Objects/moduleobject.c:709
#14 0x00007408e2f88437 in _Py_Dealloc (op=) at Objects/object.c:2625
#15 Py_DECREF (op=) at ./Include/object.h:705
#16 Py_XDECREF (op=) at ./Include/object.h:798
#17 Py_XDECREF (op=) at ./Include/object.h:795
#18 meth_dealloc (m=0x7408b8c59ad0) at Objects/methodobject.c:170
#19 0x00007408e2f774e1 in _Py_Dealloc (op=0x7408b8c59ad0) at Objects/object.c:2625
#20 Py_DECREF (op=0x7408b8c59ad0) at ./Include/object.h:705
#21 Py_XDECREF (op=0x7408b8c59ad0) at ./Include/object.h:798
#22 insertdict (interp=0x7408bf141800, mp=mp@entry=0x7408b17ed0c0, key=key@entry=0x7408b8c53470, hash=hash@entry=-9076975000305021121, value=value@entry=0x7408e33a9de0 <_Py_NoneStruct>) at Objects/dictobject.c:1319
#23 0x00007408e2ffaf01 in _PyDict_SetItem_Take2 (value=0x7408e33a9de0 <_Py_NoneStruct>, key=, mp=) at Objects/dictobject.c:1865
#24 PyDict_SetItem (value=0x7408e33a9de0 <_Py_NoneStruct>, key=, op=) at Objects/dictobject.c:1883
#25 _PyModule_ClearDict (d=0x7408b17ed0c0) at Objects/moduleobject.c:656
#26 0x00007408e3074b4c in finalize_modules_clear_weaklist (verbose=0, weaklist=0x7408b9b6e1c0, interp=0x7408bf141800) at Python/pylifecycle.c:1526
#27 finalize_modules (tstate=tstate@entry=0x7408bf19f120) at Python/pylifecycle.c:1609
#28 0x00007408e30803fe in Py_EndInterpreter (tstate=0x7408bf19f120) at Python/pylifecycle.c:2199
#29 0x000057c9f075f66e in CPythonInvoker::onExecutionDone (this=0x51300026fc80) at xbmc/interfaces/python/PythonInvoker.cpp:574
#30 0x000057c9f74c4304 in CLanguageInvokerThread::OnExit (this=0x517000197210) at xbmc/interfaces/generic/LanguageInvokerThread.cpp:122
#31 0x000057c9f74c4579 in non-virtual thunk to CLanguageInvokerThread::OnExit() () at xbmc/interfaces/generic/LanguageInvokerThread.cpp:124
#32 0x000057c9f3064a44 in CThread::Action (this=0x517000197238) at xbmc/threads/Thread.cpp:292
#33 0x000057c9f3066fb1 in CThread::Create(bool)::$_0::operator()(CThread*, std::promise) const (this=0x504000434ad8, pThread=0x517000197238, promise=...) at xbmc/threads/Thread.cpp:152
#34 0x000057c9f3065c37 in std::__invoke_impl >(std::__invoke_other, CThread::Create(bool)::$_0&&, CThread*&&, std::promise&&) (__f=..., __args=@0x504000434af0: 0x517000197238, __args=...) at /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/invoke.h:61
#35 0x000057c9f3065867 in std::__invoke >(CThread::Create(bool)::$_0&&, CThread*&&, std::promise&&) (__fn=..., __args=@0x504000434af0: 0x517000197238, __args=...) at /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/invoke.h:96
#36 0x000057c9f30657a0 in std::thread::_Invoker > >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) (this=0x504000434ad8) at /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/std_thread.h:292
#37 0x000057c9f3065619 in std::thread::_Invoker > >::operator()() (this=0x504000434ad8) at /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/std_thread.h:299
#38 0x000057c9f30651e9 in std::thread::_State_impl > > >::_M_run() (this=0x504000434ad0) at /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/std_thread.h:244
#39 0x00007408e0adcb63 in std::execute_native_thread_routine (__p=0x504000434ad0) at /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/thread.cc:104
#40 0x000057c9efcddc57 in asan_thread_start(void*) ()
#41 0x00007408e08a955a in start_thread (arg=) at pthread_create.c:447
#42 0x00007408e0926a5c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
```

Are there plans to address this in the 3.12 release cycle?

mooninite commented 6 months ago

I tried backporting 7a7bce5, but without success: the threading tests failed. If someone could attach a 3.12 backport patch, I could test it with Kodi.

nascheme commented 6 months ago

I have a backport of the patch mostly done. I can finish it and then you can test. I think backporting this change would be a good idea given that it seems to work without issue in 3.13 and would solve a few problems for users of Python 3.12.

ericsnowcurrently commented 6 months ago

ping @Yhg1s

flashcode commented 4 months ago

Hi, I'm the author of WeeChat, and I've received many crash reports from users who are frustrated by this bug. Even if it's not simple to fix, it would be nice to fix, as it affects the latest stable version, which is widely used now. Let me know if you need help with the fix, such as testing. Thanks!

nascheme commented 4 months ago

I can investigate but I need some direction on how to compile weechat with a specific version of Python. E.g. if I have Python installed with prefix /usr/local/python-3.12.4, how do I build weechat that uses it? I'm not familiar with CMake and my attempts at making it use that Python failed. It uses /usr/bin/python3.11 from my OS.

flashcode commented 4 months ago

@nascheme: you must compile Python with --enable-shared, then run cmake for WeeChat with this command, to install in a custom path:

mkdir build
cd build
PKG_CONFIG_PATH=/usr/local/python-3.12.4/lib/pkgconfig cmake .. -DCMAKE_INSTALL_PREFIX=/path/to/directory

nascheme commented 4 months ago

I managed to get weechat running with the Python plugin today and did some debugging. The problem occurs during the teardown of the Python interpreter (within Py_EndInterpreter). The backtrace for the crash (SEGV) is:

Thread 1 received signal SIGSEGV, Segmentation fault.
_PyObject_GC_UNTRACK (op=0x7f663fd01f30) at ../Include/internal/pycore_object.h:247
247     _PyGCHead_SET_PREV(next, prev);
(rr) bt
#0  _PyObject_GC_UNTRACK (op=0x7f663fd01f30) at ../Include/internal/pycore_object.h:247
#1  PyObject_GC_UnTrack (op_raw=op_raw@entry=0x7f663fd01f30) at ../Modules/gcmodule.c:2242
#2  0x00007f664e7c71fc in module_dealloc (m=0x7f663fd01f30) at ../Objects/moduleobject.c:709
#3  0x00007f664e7c679d in Py_DECREF (op=<optimized out>) at ../Include/object.h:705
#4  Py_XDECREF (op=<optimized out>) at ../Include/object.h:798
#5  meth_dealloc (m=0x7f663fd12b10) at ../Objects/methodobject.c:170
#6  0x00007f664e7b9330 in Py_DECREF (op=0x7f663fd12b10) at ../Include/object.h:705
#7  Py_XDECREF (op=0x7f663fd12b10) at ../Include/object.h:798
#8  insertdict (interp=0x7f663fc4c010, mp=mp@entry=0x7f663fbf7a40, key=0x7f663fd14b30, hash=<optimized out>, 
    value=value@entry=0x7f664eb4c420 <_Py_NoneStruct>) at ../Objects/dictobject.c:1319
#9  0x00007f664e7b9727 in _PyDict_SetItem_Take2 (value=0x7f664eb4c420 <_Py_NoneStruct>, key=<optimized out>, mp=0x7f663fbf7a40)
    at ../Objects/dictobject.c:1865
#10 0x00007f664e7c84a6 in _PyModule_ClearDict (d=0x7f663fbf7a40) at ../Objects/moduleobject.c:656
#11 0x00007f664e7c86ee in _PyModule_Clear (m=m@entry=0x7f663fc06160) at ../Objects/moduleobject.c:604
#12 0x00007f664e8d173e in finalize_modules_clear_weaklist (verbose=0, weaklist=0x7f663fd7c080, interp=0x7f663fc4c010)
    at ../Python/pylifecycle.c:1526
#13 finalize_modules (tstate=tstate@entry=0x7f663fca9930) at ../Python/pylifecycle.c:1609
#14 0x00007f664e8d4f64 in Py_EndInterpreter (tstate=0x7f663fca9930) at ../Python/pylifecycle.c:2201
#15 0x00007f664f4c5b6d in weechat_python_unload () from /opt/weechat/lib/weechat/plugins/python.so
#16 0x00007f664f4c5d26 in weechat_python_unload_all () from /opt/weechat/lib/weechat/plugins/python.so
#17 0x00007f664f4f7fff in plugin_script_end () from /opt/weechat/lib/weechat/plugins/python.so
#18 0x00007f664f4c7708 in weechat_plugin_end () from /opt/weechat/lib/weechat/plugins/python.so
#19 0x0000559b0bb4edf5 in plugin_unload ()
#20 0x0000559b0bb4ef55 in plugin_unload_all ()
#21 0x0000559b0bb4f2ae in plugin_end ()
#22 0x0000559b0ba84221 in weechat_end ()
#23 0x0000559b0ba83117 in main ()

The gc_next and gc_prev pointers on that object (a module method for the "weechat" module) are not valid. Using rr to run the execution backwards, we find when those pointers are modified last:

Thread 1 hit Hardware watchpoint 1: *((uintptr_t *)0x7f663fd01f20)

Old value = 140077134684360
New value = 94124431992592
_PyObject_GC_UNTRACK (op=0x559b0d556720) at ../Include/internal/pycore_object.h:246
246     _PyGCHead_SET_NEXT(prev, next);
(rr) bt
#0  _PyObject_GC_UNTRACK (op=0x559b0d556720) at ../Include/internal/pycore_object.h:246
#1  type_dealloc (type=0x559b0d556720) at ../Objects/typeobject.c:5050
#2  0x00007f664e7dc095 in Py_DECREF (op=<optimized out>) at ../Include/object.h:705
#3  Py_XDECREF (op=<optimized out>) at ../Include/object.h:798
#4  tupledealloc (op=0x7f66434cf340) at ../Objects/tupleobject.c:206
#5  0x00007f664e7e0f85 in Py_DECREF (op=<optimized out>) at ../Include/object.h:705
#6  Py_XDECREF (op=<optimized out>) at ../Include/object.h:798
#7  type_dealloc (type=0x559b0d556e10) at ../Objects/typeobject.c:5060
#8  0x00007f664e9044f8 in Py_DECREF (op=0x559b0d556e10) at ../Include/object.h:705
#9  delete_garbage (old=0x7f663fdaa0c8, collectable=0x7ffdc655fae0, gcstate=0x7f663fdaa080, tstate=0x7f663fe07930)
    at ../Modules/gcmodule.c:1034
#10 gc_collect_main (tstate=0x7f663fe07930, generation=generation@entry=2, n_collected=n_collected@entry=0x0, 
    n_uncollectable=n_uncollectable@entry=0x0, nofail=nofail@entry=1) at ../Modules/gcmodule.c:1303
#11 0x00007f664e904d86 in _PyGC_CollectNoFail (tstate=tstate@entry=0x7f663fe07930) at ../Modules/gcmodule.c:2135
#12 0x00007f664e8d8d1c in interpreter_clear (interp=0x7f663fdaa010, tstate=tstate@entry=0x7f663fe07930)
    at ../Python/pystate.c:895
#13 0x00007f664e8d91fa in _PyInterpreterState_Clear (tstate=tstate@entry=0x7f663fe07930) at ../Python/pystate.c:957
#14 0x00007f664e8d0c1f in finalize_interp_clear (tstate=tstate@entry=0x7f663fe07930) at ../Python/pylifecycle.c:1743
#15 0x00007f664e8d4f75 in Py_EndInterpreter (tstate=0x7f663fe07930) at ../Python/pylifecycle.c:2204
#16 0x00007f664f4c5b6d in weechat_python_unload () from /opt/weechat/lib/weechat/plugins/python.so
#17 0x00007f664f4c5d26 in weechat_python_unload_all () from /opt/weechat/lib/weechat/plugins/python.so
#18 0x00007f664f4f7fff in plugin_script_end () from /opt/weechat/lib/weechat/plugins/python.so
#19 0x00007f664f4c7708 in weechat_plugin_end () from /opt/weechat/lib/weechat/plugins/python.so
#20 0x0000559b0bb4edf5 in plugin_unload ()
#21 0x0000559b0bb4ef55 in plugin_unload_all ()
#22 0x0000559b0bb4f2ae in plugin_end ()
#23 0x0000559b0ba84221 in weechat_end ()
#24 0x0000559b0ba83117 in main ()
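
Both backtraces pass through the untrack step at pycore_object.h lines 246 and 247, which rewrites the neighbours of the object being removed. As a rough illustration of why a stale link there is fatal, here is a simplified pure-Python model of a circular doubly linked list. The `Node`, `track`, `untrack`, and `names` names are hypothetical and are not CPython's implementation; in C the failing access is a segfault, while in this model it surfaces as an `AttributeError`.

```python
class Node:
    """Stand-in for a GC header with next/prev links (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.next = self
        self.prev = self


def track(head, node):
    """Append node to the circular list anchored at head."""
    last = head.prev
    last.next = node
    node.prev = last
    node.next = head
    head.prev = node


def untrack(node):
    """Unlink node by rewriting its neighbours, then clear its own links."""
    prev, nxt = node.prev, node.next
    prev.next = nxt  # analogous to the SET_NEXT(prev, next) step
    nxt.prev = prev  # analogous to the SET_PREV(next, prev) step
    node.next = node.prev = None


def names(head):
    """Return the names of all tracked nodes, in list order."""
    out = []
    cur = head.next
    while cur is not head:
        out.append(cur.name)
        cur = cur.next
    return out


head = Node("head")
a, b, c = Node("a"), Node("b"), Node("c")
for n in (a, b, c):
    track(head, n)

untrack(b)
remaining = names(head)

# Simulate a link that no longer points at a live header.
c.prev = None
try:
    untrack(c)
    failed = False
except AttributeError:
    failed = True
```

The point of the model is only that untracking trusts both neighbour links; it says nothing about which code path left them invalid in the real crash.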

The type object being deallocated doesn't seem to be associated with weechat. It is importlib.abc.SourceLoader. I'm still not sure where the true bug lies here but it seems to be associated with tearing down built-in modules that have reference cycles. As a quick-and-dirty work-around, I tried the following change to weechat:

https://gist.github.com/nascheme/e807f03fd15312bbae52595f21ad0957

The idea is to break the reference cycles in the weechat module before Py_EndInterpreter is called. That avoids a lot of complicated module teardown logic and, at least for me, avoids a crash on weechat shutdown.
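
The gist itself is a C change to weechat and is not reproduced here. As a loose pure-Python analogy of "break the cycle before teardown", the sketch below builds a module object that refers to itself through its own dict, then compares dropping it as-is with clearing the dict first. The helper `make_cyclic_module` and the attribute name `owner` are made up for illustration.

```python
import gc
import types
import weakref


def make_cyclic_module(name):
    """Create a module whose dict refers back to the module (a cycle)."""
    mod = types.ModuleType(name)
    mod.owner = mod
    return mod


gc.disable()  # keep the cyclic collector out of the way for the comparison
try:
    leaked = make_cyclic_module("leaky")
    leaked_ref = weakref.ref(leaked)
    del leaked
    # The cycle keeps the refcount above zero, so the module is still alive.
    alive_without_break = leaked_ref() is not None

    cleaned = make_cyclic_module("clean")
    cleaned_ref = weakref.ref(cleaned)
    cleaned.__dict__.clear()  # break the cycle explicitly
    del cleaned
    # With no cycle left, plain reference counting frees it immediately.
    alive_after_break = cleaned_ref() is not None
finally:
    gc.enable()

gc.collect()  # the cyclic collector eventually reclaims the first module
collected_later = leaked_ref() is None
```

With the cycle broken up front, the object is released by ordinary reference counting and never has to go through the cyclic collector, which is the part of teardown the work-around tries to sidestep.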

This is a band-aid and not fixing the real problem. More investigation would be needed to find it. It could be a bug in Python or perhaps a reference counting bug in weechat.

ZeroIntensity commented 1 month ago

3.12 is going to go security-only relatively soon (sometime in the next few months, according to PEP 693), and this is still causing many problems with subinterpreters (namely, things like lists are completely broken under multithreaded isolated interpreters). If we don't backport the fix, 3.12 will remain with this problem forever. Re-pinging @Yhg1s

nascheme commented 1 month ago

I did a lot of debugging on this crash and here's what I found.

It seems this bug can be avoided by setting m_size of the module to 0, indicating it is safe for sub-interpreters. I'm not sure the weechat extension module is actually safe for that.

According to some discussion with Eric Snow, if Py_NewInterpreter is used then it should still be safe to use m_size = -1. In that case, I think the way immortalized and interned strings work is buggy.

ericsnowcurrently commented 1 month ago

FTR, this case of single-phase init modules with m_size = -1 in subinterpreters is documented as not supported. That said, we still try to avoid breaking working code if we can, even when it is technically doing something it shouldn't (at least for minor cases like this).

nascheme commented 1 month ago

Status update for this bug: AFAIK, it requires two PRs to fix. gh-116510 has been merged already. Something like gh-124865 will be needed as well. Once that's merged, I think we can close this bug.

trygveaa commented 1 month ago

@nascheme: Thanks a lot for your debugging and findings! I'll try to check if it's safe to set m_size = 0 in weechat.

In the last comment you linked to this issue twice. Which PRs did you mean to link to?

nascheme commented 1 month ago

Yes, using m_size = 0 would be a good idea if that extension can be loaded multiple times. If required, you could use separate module state but it doesn't look like that's required.

I fixed the links to the PRs. There is one to fix the GC object headers and one for the immortalized interned strings. The second one is still waiting on the fix to be merged. According to the Kodi folks, both are needed to fix the crash there. For WeeChat, I think only the first might be needed, but maybe it's only luck that the second bug doesn't cause it to crash.
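
For readers unfamiliar with the term, the sketch below shows what interning means at the Python level: equal strings built at runtime are separate objects until `sys.intern` maps them to one shared copy. It does not reproduce the bug or touch immortalization, and the `build` helper is hypothetical.

```python
import sys


def build(parts):
    """Join parts at runtime so each call returns a fresh string object."""
    return "_".join(parts)


first = build(["weechat", "hook", "demo"])
second = build(["weechat", "hook", "demo"])

# Same contents, but two separate objects.
equal_but_distinct = first == second and first is not second

# Interning maps both to a single shared object.
first_interned = sys.intern(first)
second_interned = sys.intern(second)
shared_after_intern = first_interned is second_interned
```

Because an interned string is a single shared object, its lifetime matters to every holder of a reference, which is presumably why its handling across interpreters needed a separate fix.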