python / cpython

The Python programming language
https://www.python.org
Other
62.15k stars 29.86k forks source link

`_Py_SetImmortal` must be run on allocating thread (no-gil) #113956

Open mjp41 opened 7 months ago

mjp41 commented 7 months ago

Bug report

Bug description:

The code in _Py_SetImmortal must be run on the thread that originally allocated the object. If this is not the case, then a race between Py_INCREF can cause the _Py_SetImmortal to be ignored.

https://github.com/python/cpython/blob/2e7577b622616ef5ff2d1460fb5bebf845f0edf3/Include/internal/pycore_object.h#L128-L139

Example sequence:

  1. Allocating thread begins an _Py_Incref.
  2. Allocating thread detects it should perform inc ref on ob_ref_local, and calculates new value.
  3. Non-allocating thread runs all of _Py_SetImmortal.
  4. Allocating thread completes incref with assignment to ob_ref_local, and unsets the value from immortal.

In the nogil fork, there is some defensive code on other operations that need to be run on the allocating thread, e.g.

_PyObject_SetDeferredRefcount(PyObject *op)
{
    assert(_Py_ThreadLocal(op) && "non thread-safe");

https://github.com/colesbury/nogil-3.12/blob/cedde4f5ec3759ad723c89d44738776f362df564/Include/internal/pycore_object.h#L361C19-L361C19

I think something like this should occur inside _Py_SetImmortal.

It looks like _Py_SetImmortal is not exposed outside the runtime. However, PEP 683, says:

Then setting any object’s refcount to _Py_IMMORTAL_REFCNT makes it immortal.

This implies there is an intent to expose it, and with the changes to nogil. It seems that this should be exposed as an explicit operation, rather than by setting the ref count directly.

CC @ericsnowcurrently @colesbury

P.S. I am happy to PR any required changes.

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

colesbury commented 7 months ago

A few thoughts:

colesbury commented 7 months ago

I would be hesitant to add an assertion to _Py_SetImmortal() because we call it from a number of places where we already know we are in a bad state, like none_dealloc.

The only call I've seen that concerns me is in _PyUnicode_InternInPlace(). In the free-threaded builds, we should probably only immortalize the string if it is owned by the current thread.

mjp41 commented 7 months ago

@colesbury thanks for the reply. This makes sense. I had thought this is more externally reachable.

I think you could make it concurrency safe by borrowing a third bit from ob_ref_shared to represent Immortal.

colesbury commented 7 months ago

I think you could make it concurrency safe by borrowing a third bit from ob_ref_shared to represent Immortal.

Yeah, that would address concurrency safety, but using ob_ref_local serves two other purposes:

1) It let's us make ob_ref_local 32-bits without having to worry about overflow, because the immortal value is UINT32_MAX. 2) It makes the common cases for Py_INCREF and Py_DECREF faster because we only need to access ob_tid and ob_ref_local.