python / cpython

gh-115999: Enable specialization of `CALL` instructions in free-threaded builds #127123

Open mpage opened 16 hours ago

mpage commented 16 hours ago

The CALL family of instructions was already mostly thread-safe and required only a small number of changes, which are documented below.

A few changes were needed to make CALL_ALLOC_AND_ENTER_INIT thread-safe:

A few other miscellaneous changes were also needed:

Single-threaded performance

Scaling

The scaling benchmark looks about the same for this PR vs its base:

                    Base         This PR
object_cfunction    1.5x slower  1.3x slower
cmodule_function    1.5x slower  1.5x slower
mult_constant      12.5x faster  12.2x faster
generator          12.1x faster  12.1x faster
pymethod            1.8x slower  1.9x slower
pyfunction         13.6x faster  14.1x faster
module_function     1.7x slower  2.0x slower
load_string_const  13.1x faster  13.8x faster
load_tuple_const   13.0x faster  13.0x faster
create_pyobject    11.7x faster  14.1x faster
create_closure     13.4x faster  13.4x faster
create_dict        12.7x faster  12.0x faster
thread_local_read   3.6x slower  3.7x slower
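As a rough illustration of how an "Nx faster / Nx slower" figure like those above can be derived, here is a minimal sketch. It is not the benchmark script used to produce the table; the helper names (`timed_run`, `scaling_factor`, `describe`) and the formula are assumptions made for illustration. It times the same workload on one thread and on N threads, then reports throughput relative to the single-threaded run.

```python
import threading
import time


def work(n=200_000):
    """CPU-bound loop that repeatedly calls a pure-Python function."""
    def f(x):
        return x + 1

    total = 0
    for _ in range(n):
        total = f(total)
    return total


def timed_run(num_threads, fn):
    """Run fn once on each of num_threads threads; return elapsed seconds."""
    threads = [threading.Thread(target=fn) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def scaling_factor(num_threads, fn):
    """Throughput with num_threads threads relative to one thread.

    N threads do N times the work, so perfect scaling yields a factor of N
    and a factor below 1 means the threaded run was slower overall.
    """
    t1 = timed_run(1, fn)
    tn = timed_run(num_threads, fn)
    return num_threads * t1 / tn


def describe(factor):
    """Format a factor in the style used by the table (hypothetical helper)."""
    if factor >= 1:
        return f"{factor:.1f}x faster"
    return f"{1 / factor:.1f}x slower"
```

For example, `describe(scaling_factor(8, work))` would print a figure comparable in shape to the rows above, though the absolute numbers depend on the machine and the build.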

Thread safety

Thread safety of each instruction in the CALL family is documented below, starting with the uops that are composed to form instructions in the family.

UOPS

The more interesting uops that warrant closer inspection are:

_CHECK_AND_ALLOCATE_OBJECT This uop loads an __init__ method from the specialization cache of the operand (a type) if the operand's type version matches the type version stored in the inline cache. The loaded method is guaranteed to be valid because we only store deferred objects in the specialization cache and there are no escaping calls following the load:

  1. The type version is cleared before the reference in the MRO to __init__ is destroyed.
  2. If the reference in (1) was the last reference then the __init__ method will be queued for deletion the next time GC runs.
  3. GC requires stopping the world, which forces a synchronizes-with operation between all threads.
  4. If the GC collects the cached __init__, then the type's version will have been updated and the update will be visible to all threads, so the guard cannot pass.
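The ordering argument in the steps above can be sketched with a toy model. This is not CPython's data layout: `ToyType`, `specialize`, `set_init`, and `check_and_load` are hypothetical names, and the model is single-threaded. It shows only the invariant that the version is cleared before the old `__init__` reference is dropped, so a stale inline-cache entry can never pass the guard.

```python
class ToyType:
    """Stand-in for a type object (illustrative only)."""

    def __init__(self, init):
        self.version = 1         # 0 means "no valid version assigned"
        self.init = init         # the reference reachable via the MRO
        self.cached_init = None  # specialization cache stored on the type
        self._next_version = 2

    def assign_version(self):
        """Give the type a fresh, never-reused version if it has none."""
        if self.version == 0:
            self.version = self._next_version
            self._next_version += 1
        return self.version

    def specialize(self):
        """Cache __init__ and return the version to store in the inline cache."""
        version = self.assign_version()
        self.cached_init = self.init
        return version

    def set_init(self, new_init):
        """Replace __init__, invalidating the version first (step 1)."""
        self.version = 0      # cleared BEFORE the old reference is dropped
        self.init = new_init  # the old __init__ may now become garbage


def check_and_load(tp, cached_version):
    """Guard: return the cached __init__ only if the versions match."""
    if tp.version != cached_version:
        return None  # models a deoptimization
    return tp.cached_init


def old_init(self):
    pass


def new_init(self):
    pass
```

After `set_init`, the guard fails for every previously recorded version, including after a new version has been assigned, because versions are never reused in this model.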

_CHECK_FUNCTION_VERSION This uop guards that the top of the stack is a function and that its version matches the version stored in the inline cache. Instructions assume that if the guard passes, the version, and any properties verified by the version, will not change for the remainder of the instruction's execution, provided there are no escaping calls between the guard and the code that relies on it. This property is preserved in free-threaded builds: the world is stopped whenever a function's version changes.
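The effect of the function version guard is visible from Python code. The sketch below (the function names are illustrative) warms up a call site so that it is a candidate for specialization, then replaces the callee's `__code__`. Whether a given build actually specializes this call site is an assumption. The observable result holds either way: subsequent calls run the new code, because a stale specialization would fail its guard.

```python
def add_one(x):
    return x + 1


def add_two(x):
    return x + 2


def call_many(fn, n=1000):
    """Call fn repeatedly from one call site so it can be specialized."""
    total = 0
    for _ in range(n):
        total = fn(total)
    return total


# Warm up: the call site inside call_many sees the original function.
warm = call_many(add_one)

# Replacing __code__ changes properties that the version guard covers.
add_one.__code__ = add_two.__code__

# The same call site must now observe the new behavior.
after = call_many(add_one)
```

Here `warm` is 1000 and `after` is 2000.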

_CHECK_PEP_523 This uop guards that a custom eval frame function is not in use. Instructions assume that if the guard passes, an eval frame function will not be set for the remainder of the instruction's execution, provided there are no escaping calls between the guard and the code that relies on it. This property is preserved in free-threaded builds: the world is stopped whenever the eval frame function is set.

The instructions are also composed of uops whose thread safety properties are easier to reason about and require less scrutiny. These are:

Instructions

These instructions perform exact type checks and loads from immutable fields of PyCFunction objects:

These instructions perform exact type checks and loads from immutable fields of PyMethodDescrObjects:

These instructions are composed of the uops documented above, and are thread-safe transitively:

These instructions load from the callable cache, which is immutable, perform exact type checks, and use existing thread-safe APIs:

These instructions use existing thread-safe APIs:

Finally, these instructions don't categorize neatly:

Specialization

Apart from the changes discussed earlier, specialization is already thread-safe. It inspects immutable properties (i.e. those of code objects, method descriptors, or PyCFunctions) or properties that require stopping the world to mutate (i.e. properties checked by function version guards).
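One way to check whether a call site has been specialized is to disassemble the caller with the `adaptive` flag of the `dis` module after warming it up. This is a minimal sketch with illustrative function names. Which specialized opcode names appear, if any, depends on the interpreter version and build, so the example only collects the opcodes whose names start with `CALL` and makes no claim about specific results.

```python
import dis


def callee(x):
    return x + 1


def caller(n):
    total = 0
    for _ in range(n):
        total = callee(total)
    return total


def opnames(func):
    """Return the opcode names of func, specialized forms if available."""
    try:
        instructions = dis.get_instructions(func, adaptive=True)
    except TypeError:
        # Older interpreters do not accept the adaptive flag.
        instructions = dis.get_instructions(func)
    return [ins.opname for ins in instructions]


# Warm up the call site so the interpreter has a chance to specialize it.
caller(1000)

# Opcodes in the CALL family (generic or specialized) found in caller.
call_ops = [name for name in opnames(caller) if name.startswith("CALL")]
```

Printing `call_ops` on a build with specialization enabled may show a specialized member of the CALL family in place of the generic `CALL`.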