Open trygveaa opened 8 months ago
As to backporting 7a7bce5a0a (gh-113412), it wasn't obvious at the time that it was worth backporting, relative to the complexity of the change. Ultimately, that's a call for the 3.12 release manager, @Yhg1s to make.
CC @nascheme
The same thing happens in Kodi in various situations (e.g. https://github.com/xbmc/xbmc/issues/24440 and reports on https://forum.kodi.tv/), the stack trace is basically the same:
Are there plans to address this in the 3.12 release cycle?
I tried unsuccessfully backporting 7a7bce5 as threading tests failed. If someone could attach a 3.12 backport patch I could test Kodi.
I have a backport of the patch mostly done. I can finish it and then you can test. I think backporting this change would be a good idea given that it seems to work without issue in 3.13 and would solve a few problems for users of Python 3.12.
ping @Yhg1s
Hi, I'm the author of WeeChat, and got many crash reports from users, who are frustrated by this bug. Even if it's not simple to fix, it would be nice to fix as if affects the latest stable version, which is widely used now. Let me know if you need help for the fix, like testing. Thanks!
I can investigate but I need some direction on how to compile weechat with a specific version of Python. E.g. if I have Python installed with prefix /usr/local/python-3.12.4, how do I build weechat that uses it? I'm not familiar with CMake and my attempts at making it use that Python failed. It uses /usr/bin/python3.11 from my OS.
@nascheme: you must compile Python with --enable-shared
, then run cmake for WeeChat with this command, to install in a custom path:
mkdir build
cd build
PKG_CONFIG_PATH=/usr/local/python-3.12.4/lib/pkgconfig cmake .. -DCMAKE_INSTALL_PREFIX=/path/to/directory
I managed to get weechat running with the Python plugin today and did some debugging. The problem occurs during the teardown of the Python interpreter (within Py_EndInterpreter
). The backtrace for the crash (SEGV) is:
Thread 1 received signal SIGSEGV, Segmentation fault.
_PyObject_GC_UNTRACK (op=0x7f663fd01f30) at ../Include/internal/pycore_object.h:247
247 _PyGCHead_SET_PREV(next, prev);
(rr) bt
#0 _PyObject_GC_UNTRACK (op=0x7f663fd01f30) at ../Include/internal/pycore_object.h:247
#1 PyObject_GC_UnTrack (op_raw=op_raw@entry=0x7f663fd01f30) at ../Modules/gcmodule.c:2242
#2 0x00007f664e7c71fc in module_dealloc (m=0x7f663fd01f30) at ../Objects/moduleobject.c:709
#3 0x00007f664e7c679d in Py_DECREF (op=<optimized out>) at ../Include/object.h:705
#4 Py_XDECREF (op=<optimized out>) at ../Include/object.h:798
#5 meth_dealloc (m=0x7f663fd12b10) at ../Objects/methodobject.c:170
#6 0x00007f664e7b9330 in Py_DECREF (op=0x7f663fd12b10) at ../Include/object.h:705
#7 Py_XDECREF (op=0x7f663fd12b10) at ../Include/object.h:798
#8 insertdict (interp=0x7f663fc4c010, mp=mp@entry=0x7f663fbf7a40, key=0x7f663fd14b30, hash=<optimized out>,
value=value@entry=0x7f664eb4c420 <_Py_NoneStruct>) at ../Objects/dictobject.c:1319
#9 0x00007f664e7b9727 in _PyDict_SetItem_Take2 (value=0x7f664eb4c420 <_Py_NoneStruct>, key=<optimized out>, mp=0x7f663fbf7a40)
at ../Objects/dictobject.c:1865
#10 0x00007f664e7c84a6 in _PyModule_ClearDict (d=0x7f663fbf7a40) at ../Objects/moduleobject.c:656
#11 0x00007f664e7c86ee in _PyModule_Clear (m=m@entry=0x7f663fc06160) at ../Objects/moduleobject.c:604
#12 0x00007f664e8d173e in finalize_modules_clear_weaklist (verbose=0, weaklist=0x7f663fd7c080, interp=0x7f663fc4c010)
at ../Python/pylifecycle.c:1526
#13 finalize_modules (tstate=tstate@entry=0x7f663fca9930) at ../Python/pylifecycle.c:1609
#14 0x00007f664e8d4f64 in Py_EndInterpreter (tstate=0x7f663fca9930) at ../Python/pylifecycle.c:2201
#15 0x00007f664f4c5b6d in weechat_python_unload () from /opt/weechat/lib/weechat/plugins/python.so
#16 0x00007f664f4c5d26 in weechat_python_unload_all () from /opt/weechat/lib/weechat/plugins/python.so
#17 0x00007f664f4f7fff in plugin_script_end () from /opt/weechat/lib/weechat/plugins/python.so
#18 0x00007f664f4c7708 in weechat_plugin_end () from /opt/weechat/lib/weechat/plugins/python.so
#19 0x0000559b0bb4edf5 in plugin_unload ()
#20 0x0000559b0bb4ef55 in plugin_unload_all ()
#21 0x0000559b0bb4f2ae in plugin_end ()
#22 0x0000559b0ba84221 in weechat_end ()
#23 0x0000559b0ba83117 in main ()
The gc_next and gc_prev pointers on that object (a module method for the "weechat" module) are not valid. Using rr
to run the execution backwards, we find when those pointers are modified last:
Thread 1 hit Hardware watchpoint 1: *((uintptr_t *)0x7f663fd01f20)
Old value = 140077134684360
New value = 94124431992592
_PyObject_GC_UNTRACK (op=0x559b0d556720) at ../Include/internal/pycore_object.h:246
246 _PyGCHead_SET_NEXT(prev, next);
(rr) bt
#0 _PyObject_GC_UNTRACK (op=0x559b0d556720) at ../Include/internal/pycore_object.h:246
#1 type_dealloc (type=0x559b0d556720) at ../Objects/typeobject.c:5050
#2 0x00007f664e7dc095 in Py_DECREF (op=<optimized out>) at ../Include/object.h:705
#3 Py_XDECREF (op=<optimized out>) at ../Include/object.h:798
#4 tupledealloc (op=0x7f66434cf340) at ../Objects/tupleobject.c:206
#5 0x00007f664e7e0f85 in Py_DECREF (op=<optimized out>) at ../Include/object.h:705
#6 Py_XDECREF (op=<optimized out>) at ../Include/object.h:798
#7 type_dealloc (type=0x559b0d556e10) at ../Objects/typeobject.c:5060
#8 0x00007f664e9044f8 in Py_DECREF (op=0x559b0d556e10) at ../Include/object.h:705
#9 delete_garbage (old=0x7f663fdaa0c8, collectable=0x7ffdc655fae0, gcstate=0x7f663fdaa080, tstate=0x7f663fe07930)
at ../Modules/gcmodule.c:1034
#10 gc_collect_main (tstate=0x7f663fe07930, generation=generation@entry=2, n_collected=n_collected@entry=0x0,
n_uncollectable=n_uncollectable@entry=0x0, nofail=nofail@entry=1) at ../Modules/gcmodule.c:1303
#11 0x00007f664e904d86 in _PyGC_CollectNoFail (tstate=tstate@entry=0x7f663fe07930) at ../Modules/gcmodule.c:2135
#12 0x00007f664e8d8d1c in interpreter_clear (interp=0x7f663fdaa010, tstate=tstate@entry=0x7f663fe07930)
at ../Python/pystate.c:895
#13 0x00007f664e8d91fa in _PyInterpreterState_Clear (tstate=tstate@entry=0x7f663fe07930) at ../Python/pystate.c:957
#14 0x00007f664e8d0c1f in finalize_interp_clear (tstate=tstate@entry=0x7f663fe07930) at ../Python/pylifecycle.c:1743
#15 0x00007f664e8d4f75 in Py_EndInterpreter (tstate=0x7f663fe07930) at ../Python/pylifecycle.c:2204
#16 0x00007f664f4c5b6d in weechat_python_unload () from /opt/weechat/lib/weechat/plugins/python.so
#17 0x00007f664f4c5d26 in weechat_python_unload_all () from /opt/weechat/lib/weechat/plugins/python.so
#18 0x00007f664f4f7fff in plugin_script_end () from /opt/weechat/lib/weechat/plugins/python.so
#19 0x00007f664f4c7708 in weechat_plugin_end () from /opt/weechat/lib/weechat/plugins/python.so
#20 0x0000559b0bb4edf5 in plugin_unload ()
#21 0x0000559b0bb4ef55 in plugin_unload_all ()
#22 0x0000559b0bb4f2ae in plugin_end ()
#23 0x0000559b0ba84221 in weechat_end ()
#24 0x0000559b0ba83117 in main ()
The type object being deallocated doesn't seem to be associated with weechat. It is importlib.abc.SourceLoader
. I'm still not sure where the true bug lies here but it seems to be associated with tearing down built-in modules that have reference cycles. As a quick-and-dirty work-around, I tried the following change to weechat:
https://gist.github.com/nascheme/e807f03fd15312bbae52595f21ad0957
The idea is to break the reference cycles in the weechat
module before Py_EndInterpreter
is called. That avoids a lot of complicated module teardown logic and, at least for me, avoids a crash on weechat shutdown.
This is a band-aid and not fixing the real problem. More investigation would be needed to find it. It could be a bug in Python or perhaps a reference counting bug in weechat.
3.12 is going to go security-only relatively soon (sometime in the next few months, according to PEP 693), and this is still causing many problems with subinterpreters (namely, things like lists are completely broken under multithreaded isolated interpreters). If we don't backport the fix, 3.12 will remain with this problem forever. Re-pinging @Yhg1s
I did a lot of debugging on this crash and here's what I found.
Py_NewInterpreter
. Let's call them I1 and I2, with I1 created first.weechat
Python extension module is involved in the problem. It's defined as not sub-interpreter safe (m_size set to -1, single-phase init). EXTENSIONS
dict by import_find_extension()
. The module dict (created by I1) is copied into the newly created module (PyDict_Update(mdict, m_copy)
inside import_find_extension()
).prnt
.Py_EndInterpreter
is called for I1, it calls clear_interned_dict
, freeing interned immortal strings (these are in a per-interpreter dict)Py_EndIntepreter
is called for the I2, some strings that were shared from I1 are hashed and the hash crashes. The object was freed as part of the free process of I1It seems this bug can be avoided by setting m_size
of the module to 0, indicating it is safe for sub-interpreters. I'm not sure the weechat
extension module is actually safe for that.
According to some discussion with Eric Snow, if Py_NewInterpreter
is used then it should still be safe to use m_size = -1
. In that case, I think the way immortalized and interned strings work is buggy.
FTR, this case of single-phase init modules with m_size = -1
in subinterpreters is documented as not supported. That said, we still try to avoid breaking working code if we can, even when it is technically doing something it shouldn't (at least for minor cases like this).
Status update for this bug: AFAIK, it requires two PRs to fix. gh-116510 has been merged already. Something like gh-124865 will be needed as well. Once that's merged, I think we can close this bug.
@nascheme: Thanks a lot for your debugging and findings! I'll try to check if it's safe to set m_size = 0
in weechat.
In the last comment you linked to this issue twice. Which PRs did you mean to link to?
Yes, using m_size = 0
would be a good idea if that extension can be loaded multiple times. If required, you could use separate module state but it doesn't look like that's required.
I fixed the links to the PRs. There one to fix the GC object headers and one for the immortalized interned strings. The second one is still waiting on the fix to be merged. According to the Kodi folk, both are need to fix the crash there. For Weechat, I think only the first might be needed but maybe it's only luck that the second bug doesn't cause it to crash.
Crash report
What happened?
WeeChat embeds CPython in order to run Python scripts inside WeeChat. It can load multiple scripts and they each get their own interpreter. When a script is loaded
Py_NewInterpreter
is called, and when it's unloadedPy_EndInterpreter
is called.With CPython 3.12 loading two scripts and then unloading them in the same order causes a segmentation fault. Interestingly, the segmentation fault doesn't happen if the script that was loaded last is unloaded first.
I bisected this and found it was introduced in commit de64e7561680fdc5358001e9488091e75d4174a3. I also noticed that the crash doesn't occur in the main branch, and did another bisect and found it was fixed in commit 7a7bce5a0ab249407e866a1e955d21fa2b0c8506.
This issue seems similar to the one reported in #115649 which is also introduced by the same commit, but that one still crashes on the main branch (commit 735fc2cbbcf875c359021b5b2af7f4c29f4cf66d).
I haven't been able to reproduce this outside of WeeChat unfortunately, but here is a backtrace from the crash with WeeChat, with commit de64e7561680fdc5358001e9488091e75d4174a3 of CPython and commit ec56a1103f47b15a641ff93528fd6f50025dd524 of WeeChat.
This was produced by creating these two python scripts:
dummy1.py
:dummy2.py
And then running
weechat -t -r '/script load dummy1.py; /script load dummy2.py; /quit'
.Also, here is the issue report for WeeChat: https://github.com/weechat/weechat/issues/2046
Since it's fixed in main it seems there won't be a problem with 3.13, but I wonder if the fix can be backported to 3.12?
CPython versions tested on:
3.12, CPython main branch
Operating systems tested on:
Linux
Output from running 'python -VV' on the command line:
Python 3.12.0a7+ (tags/v3.12.0a7-340-gde64e75616:de64e75616, Mar 8 2024, 19:43:39) [GCC 13.2.1 20230801]
Linked PRs