python / cpython

The Python programming language
https://www.python.org
Other
63.42k stars 30.37k forks source link

Reduce critical section boilerplate in type slot implementations #114258

Open colesbury opened 9 months ago

colesbury commented 9 months ago

Feature or enhancement

PEP 703 in large part relies on replacing the GIL with fine grained per-object locks. The primary way to acquire and release these locks is through the critical section API (https://github.com/python/cpython/issues/111569). In #111903, we added support for generating the necessary critical section calls in Argument Clinic wrapped methods. In #112205, this was extended to getters and setters as well.

However, some object behavior is implemented using "tp slots", which are not supported by Argument Clinic. These functions often need critical sections for thread-safety; that means more boilerplate code.

Here are some possible ideas on how to address this:

Add support for "tp slots" to Argument Clinic

We could add support for certain tp slots functions to Argument Clinic. This would provide a consistent way to add critical sections across both slots, getters/setters, and "regular" methods. The disadvantage is that "tp slots" don't need much argument parsing, so they may otherwise not be a great fit for Argument Clinic

Use C macros to wrap slot implementations with critical sections

We could implement an internal-only header that provides macros to wrap a given "tp slot" function with a critical section. For example, something like:

static PyObject *
list_richcompare_impl(PyObject *v, PyObject *w, int op)
{
   // implement richcompare unchanged
}

// generates a list_richcompare() function with the correct Py_BEGIN/END_CRITICAL_SECTION()
_Py_CRITICAL_SECTION_TP_RICHCOMPARE(list_richcompare, list_richcompare_impl);

I think this would not be too difficult to implement -- probably easier than modifying Argument Clinic. The disadvantage is the lack of uniformity around how we add critical sections to functions.

Keep the status quo

We can always just keep adding the Py_BEGIN/END_CRITICAL_SECTION() calls manually for tp slots.

cc @erlend-aasland, @corona10

Linked PRs

erlend-aasland commented 9 months ago

The disadvantage is that "tp slots" don't need much argument parsing, so they may otherwise not be a great fit for Argument Clinic

OTOH, we may want to expand Argument Clinic's feature set in the future^1, and we already added stuff that were not a great fit[^3]: getters and setters.

[^3]: wrt. the original goals of Argument Clinic, that is

corona10 commented 9 months ago

I would like to suggest delaying the decision until we notice how many cases will be affected to be thread-safe.

The following scenario can be imagined from now on, A. If every tp_xxx should be thread-safe (AFAIK, We don't want to declare crirical_sections for all tp_compare): Declare critical_section implicitly before calling tp_richcompare https://github.com/python/cpython/blob/a34e4db28a98904f6c9976675ed7121ed61edabe/Objects/object.c#L902 B. Not all of them, some of them: Provide wrapping macros that can be easily used.

_Py_CRITICAL_SECTION_TP_RICHCOMPARE_IMPL(list_richcompare_impl)
// ...
.tp_richcompare = _Py_CRITICAL_SECTION_TP_RICHCOMPARE(list_richcompare_impl)

C. Very few of them: Use our hands.. :)

erlend-aasland commented 9 months ago

I would like to suggest delaying the decision until we notice how many cases will be affected to be thread-safe.

I'd expect (correct me if I'm wrong) most GC related slots to be critical sections: clear, traverse, and free. With more than a hundred heap types in Modules/, that means more than three hundred cases already. I also expect (correct me if I'm wrong) can throw in rich-compare, iter, next, repr. ISTM the best way forward is through a carefully designed feature set in Argument Clinic. That will make sure the code base is more maintainable. The consistency will make it easier to reason about the code. It will also be easier to make changes to the boiler-plate code.

+1 for Argument Clinic.

corona10 commented 9 months ago

I'd expect (correct me if I'm wrong) most GC related slots to be critical sections: clear, traverse, and free.

@colesbury cc @erlend-aasland There will be the stop the world phase(Objects can not be touched at this phase) in free threading CPython, should we care about it?

colesbury commented 9 months ago

I am mostly thinking about the non-GC related slots, although critical sections on certain tp_finalize slots would make sense. The tp_traverse slot is only called during stop-the-world (no locking necessary). The tp_clear() is only called on unreachable objects and doesn't support resurrection, so I'm inclined not bother with locking there either, but we might end up revisiting that.

I counted about 14 slots on list that probably need locking or some other thread-safety. There's probably about as many on dict and some on set as well. And then there are other classes that provide tp_richcompare or tp_repr and need locking for those.

I think it's worth doing, but I'm not sure how much work it is to get Argument Clinic to support these slots. It would be less overall work if it's implemented sooner, but it's also not the most urgent thing right now, so I'm not sure how we will prioritize it.

erlend-aasland commented 9 months ago

I'll try to cook up a list of needed AC changes.

corona10 commented 9 months ago

For the design, If we decide to support this feature in AC.

The design should be decided first and my rough idea is using the reserved dunder method convention.

erlend-aasland commented 9 months ago

[...] my rough idea is using the reserved dunder method convention.

Currently, the denylist is made up of the Python equivalents; see:

https://github.com/python/cpython/blob/13907968d73b3b602c81e240fb7892a2627974d6/Tools/clinic/clinic.py#L2573-L2642

Since we will be allowing only a subset of this list, I think it would be beneficial to follow that established convention.

As for the feature, I've stated hashing out some refactorings[^1].

[^1]: breaking up state_modulename_name(), breaking up render_function(), breaking up output_templates())...