python / cpython

The Python programming language
https://www.python.org
Other
62.87k stars 30.12k forks source link

GC in free-threaded build has problems with `Py_INCREF()`/`Py_DECREF()` in `tp_traverse` handlers #123241

Open colesbury opened 1 month ago

colesbury commented 1 month ago

Bug report

This came up in the context of nanobind. Nanobind implements (in a test) a traverse function:

int funcwrapper_tp_traverse(PyObject *self, visitproc visit, void *arg) {
    FuncWrapper *w = nb::inst_ptr<FuncWrapper>(self);

    nb::object f = nb::cast(w->f, nb::rv_policy::none);
    Py_VISIT(f.ptr());

    return 0;
};

The nb::object smart pointer is internally reference counted. In other words, the above is roughly equivalent to:

int funcwrapper_tp_traverse(PyObject *self, visitproc visit, void *arg) {
    PyObject *f = self->w->f;
    Py_INCREF(f);
    Py_VISIT(f);
    Py_DECREF(f);
    return 0;
};

This leads to a leak in the free-threaded GC for subtle reasons: when determining resurrected objects, the free-threaded GC uses ob_ref_local to compute the refcount - incoming references, which may be (temporarily) negative. In this case, Py_INCREF() adds 1 to the refcount, but by the time Py_DECREF() is called, the local refcount is -1 which makes the object appear immortal.

There are a number of limitations on the implementations of traverse functions, which are not well documented. For example, it's not safe to allocate, free, track, or untrack Python objects. It's unclear to me whether there are other issues with calling refcounting functions in traverse callbacks.

I think we can make handle_resurrected_objects more robust to this by splitting the first pass over state->unreachable into two passes.

See also: https://github.com/PyO3/pyo3/issues/3165, which was not related to the free-threaded build.

pablogsal commented 1 month ago

There are two things here to address:

I can try to give a go to both of these. For (2) we can potentially "lock" the GC into a state that emits a warning of some sort if the GC heads are modified during traversals or if any GC function that we don't expect it's called. WDYT?

pablogsal commented 1 month ago

I think we can make handle_resurrected_objects more robust to this by splitting the first pass over state->unreachable into two passes.

Can you elaborate on this to double check that we are in the same page?

colesbury commented 1 month ago

The pass over the unreachable (but possibly resurrected) objects computes the incoming references in ob_ref_local. The final value is always >= 0, but it can be temporarily negative. (Well, sort of "negative". It's a uint32_t so it wraps around.) This is related to the problem seen by nanobind because -1 (or UINT32_MAX) looks like "immortal".

https://github.com/python/cpython/blob/297f2e093ec95800ae2184330b8408c875523467/Python/gc_free_threading.c#L1004-L1013

We can change this so that the first pass initializes ob_ref_local to the current refcount (less one because the GC in this case holds a strong reference). The second pass than calls the traverse handlers. In this structure, the traverse handlers never see a ob_ref_local that looks like an immortal object.

I am still not convinced that it's safe (at least in the free-threaded build) to call Py_INCREF/DECREF during GC traversal, but this at least would fix the leak in nanobind. (I'd characterize it as more robust, rather than "safe".)

        ...
        // Subtract one to account for the reference from the worklist.
        op->ob_ref_local += (uint32_t)refcount - 1;
   }

   WORKSTACK_FOR_EACH(&state->unreachable, op) {
        traverseproc traverse = Py_TYPE(op)->tp_traverse;
        (void) traverse(op,
            (visitproc)visit_decref_unreachable,
            NULL);
    }
pablogsal commented 1 month ago

I think for the general issue I would like to:

What are your thoughts?

colesbury commented 1 month ago

Raise or warn if any of the public GC APIs are used while we are calling the traverse functions

To be clear; the problem in this instance was the use of Py_INCREF/Py_DECREF rather than things like PyObject_GC_Track, although those would definitely be problematic in a tp_traverse.

It seems fine to add extra checks in debug builds, but I'd be worried about introduce slow downs in optimized builds for many of these functions. We probably would also need to be careful about how we report warnings and errors. I think creating a Python warning or error object during a tp_traverse handler execution may cause the same sort of problems we're trying to avoid.

I'm also hesitant to rush any fixes. The change to nanobind seems rather straightforward. And @ngoldbaum has run into something that might be related in PyO3 -- I don't fully understand that problem yet.

pablogsal commented 1 month ago

I'm also hesitant to rush any fixes

I share the same concerns so don't worry. In any case all of these proposal would be only on debug builds and is just a way to warn about the problem. I don't think we need to "fix" anything other than the docs to specify the contract. Any other thing is going to add slowdowns or extra complexity in release builds