python / cpython

The Python programming language
https://www.python.org
Other
62.88k stars 30.12k forks source link

test_audit_subinterpreter crashes with tracerefs enabled #125286

Open nascheme opened 5 hours ago

nascheme commented 5 hours ago

Crash report

What happened?

To run test: Programs/_testembed test_audit_subinterpreter.

The program crashes with an assert failure:

_testembed: ../Objects/object.c:196: _PyRefchain_Remove: Assertion `value == REFCHAIN_VALUE' failed.

Stack trace:

#5  0x00007fbd32045eb2 in __GI___assert_fail (
    assertion=assertion@entry=0x55f8c78ecded "value == REFCHAIN_VALUE", 
    file=file@entry=0x55f8c78ecd93 "../Objects/object.c", line=line@entry=196, 
    function=function@entry=0x55f8c78edba0 <__PRETTY_FUNCTION__.13> "_PyRefchain_Remove")
    at ./assert/assert.c:101
#6  0x000055f8c76c9999 in _PyRefchain_Remove (interp=<optimized out>, obj=obj@entry=0x7fbd3183d710)
    at ../Objects/object.c:196
#7  0x000055f8c76cb6c3 in _Py_ForgetReference (op=op@entry=0x7fbd3183d710) at ../Objects/object.c:2548
#8  0x000055f8c76ca91d in _Py_Dealloc (op=op@entry=0x7fbd3183d710) at ../Objects/object.c:2923
#9  0x000055f8c76b2ad4 in Py_DECREF (filename=filename@entry=0x55f8c78b4f72 "../Include/refcount.h", 
    lineno=lineno@entry=476, op=0x7fbd3183d710) at ../Include/refcount.h:367
#10 0x000055f8c76b2af3 in Py_XDECREF (op=<optimized out>) at ../Include/refcount.h:476
#11 0x000055f8c76b2c0f in dictkeys_decref (interp=interp@entry=0x7fbd31f6e020, 
    dk=dk@entry=0x55f8c9b29030, use_qsbr=use_qsbr@entry=false) at ../Objects/dictobject.c:459
#12 0x000055f8c76bca24 in dict_dealloc (self=0x7fbd31765070) at ../Objects/dictobject.c:3187
#13 0x000055f8c76ca923 in _Py_Dealloc (op=op@entry=0x7fbd31765070) at ../Objects/object.c:2925
#14 0x000055f8c76c67ab in Py_DECREF (
    filename=filename@entry=0x55f8c78ec613 "../Objects/moduleobject.c", lineno=lineno@entry=1118, 
    op=0x7fbd31765070) at ../Include/refcount.h:367
#15 0x000055f8c76c6bdb in module_clear (self=0x7fbd31764110) at ../Objects/moduleobject.c:1118
#16 0x000055f8c77e604a in delete_garbage (tstate=tstate@entry=0x7fbd31fa4d18, 
    gcstate=gcstate@entry=0x7fbd31f6fd08, collectable=collectable@entry=0x7ffd93cac500, 
    old=old@entry=0x7fbd31f6fd50) at ../Python/gc.c:1126
#17 0x000055f8c77e62f7 in gc_collect_region (tstate=tstate@entry=0x7fbd31fa4d18, 
    from=from@entry=0x7fbd31f6fd50, to=to@entry=0x7fbd31f6fd50, untrack=untrack@entry=3, 
    stats=stats@entry=0x7ffd93cac5e0) at ../Python/gc.c:1585
#18 0x000055f8c77e6790 in gc_collect_full (tstate=tstate@entry=0x7fbd31fa4d18, 
    stats=stats@entry=0x7ffd93cac5e0) at ../Python/gc.c:1498
#19 0x000055f8c77e7070 in _PyGC_Collect (tstate=0x7fbd31fa4d18, generation=generation@entry=2, 
    reason=reason@entry=_Py_GC_REASON_SHUTDOWN) at ../Python/gc.c:1862
#20 0x000055f8c77e70ce in _PyGC_CollectNoFail (tstate=tstate@entry=0x7fbd31fa4d18)
    at ../Python/gc.c:1903
#21 0x000055f8c7818408 in finalize_modules (tstate=tstate@entry=0x7fbd31fa4d18)
    at ../Python/pylifecycle.c:1716
#22 0x000055f8c781e9f4 in Py_EndInterpreter (tstate=tstate@entry=0x7fbd31fa4d18)
    at ../Python/pylifecycle.c:2409
#23 0x000055f8c781ec3a in finalize_subinterpreters () at ../Python/pylifecycle.c:2484
#24 0x000055f8c781edc3 in _Py_Finalize (runtime=runtime@entry=0x55f8c7b1e000 <_PyRuntime>)
    at ../Python/pylifecycle.c:2088
#25 0x000055f8c781ef34 in Py_Finalize () at ../Python/pylifecycle.c:2216
#26 0x000055f8c75e90ec in test_audit_subinterpreter () at ../Programs/_testembed.c:1431
#27 0x000055f8c75ece41 in main (argc=<optimized out>, argv=0x7ffd93cac888)
    at ../Programs/_testembed.c:2520

The object is a string that has been interned but not immortalized. Likely it is shared between sub-interpreters, using basic single-phase init and is therefore shared between multiple sub-interpreters.

(rr) p *op
$3 = {{ob_refcnt = 0, ob_refcnt_split = {0, 0}}, ob_type = 0x55f8c7b00c40 <PyUnicode_Type>}
(rr) p op
$4 = 'time_ns'

The value of _PyRefchain_IsTraced() is false since it is ref-traced in a different interpreter.

Quick and dirty fix for the crash:

--- a/Objects/object.c
+++ b/Objects/object.c
@@ -2537,6 +2537,13 @@ _Py_ForgetReference(PyObject *op)

     PyInterpreterState *interp = _PyInterpreterState_GET();

+    if (!_PyRefchain_IsTraced(interp, op) &&
+        PyUnicode_Check(op) &&
+        PyUnicode_CHECK_INTERNED(op) &&
+        interp != _PyInterpreterState_Main()) {
+        return;
+    }
+
 #ifdef SLOW_UNREF_CHECK
     if (!_PyRefchain_Get(interp, op)) {
         /* Not found */

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

No response

nascheme commented 4 hours ago

This bug was introduced by gh-124865.

My analysis of the issue is that the tracerefs debugging logic gets confused when objects are shared between sub-interpreters. E.g. _PyRefchain_Trace() and _Py_ForgetReference() kind of implicitly assume that objects belong to a single interpreter (interp->object_state.refchain is what stores the trace and it's per interpreter). Objects can be shared when the reload_singlephase_extension() m_size == -1 case occurs. That makes the objects used by the extension become shared between all sub-interpreters importing that extension.

I'm not sure why interned strings are the only thing causing trace-refs failures. I would guess that other objects should also cause an issue since _Py_ForgetReference() will fail if they are traced in a different interpreter. It could be worth constructing a test case that triggers that problem (if possible).

nascheme commented 2 hours ago

Indeed, it does seem possible to crash with shared objects other than strings. You just have to initialize in the right way and then finalize in the right order.

https://github.com/nascheme/cpython/tree/test_single_phase_shared