python / cpython

The Python programming language
https://www.python.org

[FreeThreading] object_set_class() fails with an assertion error in _PyCriticalSection_AssertHeld() #127316

Closed devdanzin closed 1 day ago

devdanzin commented 4 days ago

Crash report

What happened?

On a free-threaded debug build, even with PYTHON_GIL=1, it's possible to abort the interpreter by calling _DummyThread._after_fork after a __reduce__ call:

import threading

obj = threading._DummyThread()
res = obj.__reduce__()
res = obj._after_fork(1)

Abort message:

python: ./Include/internal/pycore_critical_section.h:222: _PyCriticalSection_AssertHeld: Assertion `cs != NULL && cs->_cs_mutex == mutex' failed.
Aborted (core dumped)

Found using fusil by @vstinner.

CPython versions tested on:

3.13, 3.14, CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.14.0a2+ experimental free-threading build (heads/main:0af4ec3, Nov 20 2024, 21:48:16) [GCC 13.2.0]
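
The report depends on the interpreter being a free-threaded debug build, so it can help to confirm that before trying the reproducer. The following is a minimal sketch, not part of the original report: `describe_build` is a hypothetical helper name. It assumes `sysconfig.get_config_var("Py_GIL_DISABLED")` is truthy on free-threaded builds, that `sys.gettotalrefcount` is only present on debug builds, and that `sys._is_gil_enabled()` exists only on 3.13 and later.

```python
import sys
import sysconfig


def describe_build():
    """Describe whether this interpreter resembles the build from the report."""
    # Py_GIL_DISABLED is set on free-threaded builds; it is None or 0 otherwise.
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys.gettotalrefcount is only defined on debug builds.
    debug = hasattr(sys, "gettotalrefcount")
    # sys._is_gil_enabled() was added in 3.13; assume the GIL is on when absent.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    return {
        "free_threaded": free_threaded,
        "debug": debug,
        "gil_enabled": bool(is_gil_enabled()),
    }


if __name__ == "__main__":
    print(describe_build())
```

Since the assertion lives in debug-only code, the abort should only be expected when both `free_threaded` and `debug` are true. Per the report, `gil_enabled` may be either value.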


ZeroIntensity commented 3 days ago

_after_fork resets mutexes because it's supposed to be used... after a fork(). I think this one is definitely a wontfix, but I'll let others weigh in on it.

vstinner commented 3 days ago

@Fidget-Spinner: You worked on obj.__class__ = new_class in Free Threading, so you might be interested in this bug.

I wrote a reproducer which doesn't depend on threading:

class Base:
    def __init__(self):
        self.attr = 123
class ClassA(Base):
    pass
class ClassB(Base):
    pass

obj = ClassA()
# it's important to store __getstate__() result in a variable!
obj_dict = object.__getstate__(obj)
obj.__class__ = ClassB

Output:

python: ./Include/internal/pycore_critical_section.h:222: _PyCriticalSection_AssertHeld: Assertion `cs != NULL && cs->_cs_mutex == mutex' failed.
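
Because the failed assertion kills the interpreter, it can be convenient to run the reproducer in a child process and inspect how it exited. This is a rough sketch rather than anything from the thread: `run_reproducer` and `REPRODUCER` are hypothetical names. It assumes the reproducer above is run unchanged with `python -c`.

```python
import subprocess
import sys

# The reproducer from the comment above, kept as a string so that a crash
# only takes down the child interpreter.
REPRODUCER = """
class Base:
    def __init__(self):
        self.attr = 123
class ClassA(Base):
    pass
class ClassB(Base):
    pass

obj = ClassA()
obj_dict = object.__getstate__(obj)
obj.__class__ = ClassB
"""


def run_reproducer(python=sys.executable):
    """Run the reproducer in a subprocess; return (returncode, stderr)."""
    proc = subprocess.run(
        [python, "-c", REPRODUCER],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    code, err = run_reproducer()
    # On POSIX, a negative return code means the child was killed by a signal;
    # a failed C assert() raises SIGABRT.
    print("return code:", code)
    print(err.strip())
```

On a build without the bug, or on a non-debug build where assertions are compiled out, the child would be expected to exit with return code 0.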

gdb traceback:

(...)
#4  0x00007ffff7cd8e47 in __assert_fail () from /lib64/libc.so.6
#5  0x000000000051a62a in _PyCriticalSection_AssertHeld (mutex=0x20000737d5a) at ./Include/internal/pycore_critical_section.h:222
#6  0x0000000000529db2 in _PyDict_DetachFromObject (mp=0x20000737d50, obj=<ClassA(attr=123) at remote 0x20000542a20>)
    at Objects/dictobject.c:7303
#7  0x0000000000585d83 in object_set_class_world_stopped (self=<ClassA(attr=123) at remote 0x20000542a20>, newto=0x20000a5aa10)
    at Objects/typeobject.c:6799
#8  0x0000000000585ec1 in object_set_class (self=<ClassA(attr=123) at remote 0x20000542a20>, value=<type at remote 0x20000a5aa10>, closure=0x0)
    at Objects/typeobject.c:6844
#9  0x00000000004d5024 in getset_set (self=<getset_descriptor at remote 0x20000010250>, obj=<ClassA(attr=123) at remote 0x20000542a20>, 
    value=<type at remote 0x20000a5aa10>) at Objects/descrobject.c:249
#10 0x00000000005406f6 in _PyObject_GenericSetAttrWithDict (obj=<ClassA(attr=123) at remote 0x20000542a20>, name='__class__', 
    value=<type at remote 0x20000a5aa10>, dict=0x0) at Objects/object.c:1772
#11 0x0000000000540968 in PyObject_GenericSetAttr (obj=<ClassA(attr=123) at remote 0x20000542a20>, name='__class__', 
    value=<type at remote 0x20000a5aa10>) at Objects/object.c:1843
#12 0x000000000053fa6b in PyObject_SetAttr (v=<ClassA(attr=123) at remote 0x20000542a20>, name='__class__', 
    value=<type at remote 0x20000a5aa10>) at Objects/object.c:1409

cc @colesbury

ZeroIntensity commented 3 days ago

I'm doing my best to try and understand this bug. I get that some lock isn't held when it's supposed to be, and that it's related to descriptors, but the original report is a result of _after_fork abuse. What's going on with the __getstate__ problem? Is it a result of the same bug, or just the same assertion that happens to be failing?

vstinner commented 3 days ago

but the original report is a result of _after_fork abuse

My reproducer is unrelated to _after_fork().

What's going on with the __getstate__ problem?

Apparently, getting a reference to the object __dict__ is needed to trigger the bug. I'm not sure why.
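
To illustrate what "getting a reference to the object __dict__" means here, the sketch below checks that the default `object.__getstate__` hands back the instance's own `__dict__` rather than a copy, for a plain class with a non-empty `__dict__` and no `__slots__`. It is only an illustration: `Point` and `getstate_returns_live_dict` are hypothetical names, and it does not explain why holding that reference triggers the assertion.

```python
import sys


class Point:
    def __init__(self):
        self.attr = 123


def getstate_returns_live_dict():
    """Return True if object.__getstate__ returns the instance __dict__ itself.

    Returns None on Python versions older than 3.11, where
    object.__getstate__ does not exist.
    """
    if sys.version_info < (3, 11):
        return None
    obj = Point()
    state = object.__getstate__(obj)
    # Identity check: the state is the live dict object, not a copy of it.
    return state is obj.__dict__


if __name__ == "__main__":
    print(getstate_returns_live_dict())
```

So storing the `__getstate__()` result in a variable, as the reproducer does, keeps a reference to the object's dict alive while `__class__` is reassigned.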

Is it a result of the same bug, or just the same assertion that happens to be failing?

I believe that it's the same bug in my reproducer and the "after fork" reproducer.