The CALL family of instructions was already mostly thread-safe and required only a small number of changes, which are documented below.
A few changes were needed to make CALL_ALLOC_AND_ENTER_INIT thread-safe:
Added _PyType_LookupRefAndVersion, which returns the type version corresponding to the returned ref.
Added _PyType_CacheInitForSpecialization, which takes an init method and the corresponding type version and only populates the specialization cache if the current type version matches the supplied version. This prevents potentially caching a stale value in free-threaded builds if we race with an update to __init__.
Only cache __init__ functions that are deferred in free-threaded builds. This ensures that the reference to __init__ that is stored in the specialization cache is valid if the type version guard in _CHECK_AND_ALLOCATE_OBJECT passes.
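The version-checked cache fill described above can be modelled in a few lines of Python. This is only a toy sketch: ToyType, lookup_ref_and_version, and cache_init_for_specialization are hypothetical stand-ins for the C-level type object, _PyType_LookupRefAndVersion, and _PyType_CacheInitForSpecialization, and the single lock used here is an assumption made for brevity rather than a description of the real synchronization.

```python
import threading


class ToyType:
    """Hypothetical stand-in for a type object that carries a version tag."""

    def __init__(self, init):
        self._lock = threading.Lock()
        self._next_version = 1
        self.version = 0
        self.attrs = {}
        self.cached_init = None  # models the specialization cache
        self.set_init(init)

    def set_init(self, init):
        # Any update to __init__ assigns a fresh version and drops the cache.
        with self._lock:
            self.attrs["__init__"] = init
            self.version = self._next_version
            self._next_version += 1
            self.cached_init = None

    def lookup_ref_and_version(self, name):
        # Return the attribute together with the version it was read under.
        with self._lock:
            return self.attrs[name], self.version

    def cache_init_for_specialization(self, init, version):
        # Populate the cache only if the type is unchanged since the lookup.
        with self._lock:
            if self.version != version:
                return False
            self.cached_init = init
            return True


def old_init(self):
    pass


def new_init(self):
    pass


toy = ToyType(old_init)
init, version = toy.lookup_ref_and_version("__init__")
toy.set_init(new_init)  # simulates a racing update to __init__
stale_fill = toy.cache_init_for_specialization(init, version)

init, version = toy.lookup_ref_and_version("__init__")
fresh_fill = toy.cache_init_for_specialization(init, version)
```

In this model the first fill is rejected because the version moved between the lookup and the fill, and the second succeeds because nothing changed in between.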
A few other miscellaneous changes were also needed:
Added _PyList_AppendTakeRefAndLock for use in LIST_APPEND. This ensures that the list's per-object lock is held while we are appending to it.
Added the missing co_tlbc for _Py_InitCleanup.
Stop and restart the world around setting the eval frame hook. This allows us to read interp->eval_frame non-atomically and preserves the behavior of _CHECK_PEP_523 documented below.
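The per-object locking added for list appends has a consequence that is observable from Python: concurrent appends to a shared list should never lose items. The sketch below only exercises that behavior; it does not show the C implementation, and the helper names append_range and run_appenders are made up for this example.

```python
import threading


def append_range(shared, count):
    # Each list.append call is expected to be atomic with respect to other
    # appends on the same list, with or without the GIL.
    for i in range(count):
        shared.append(i)


def run_appenders(num_threads=4, per_thread=10_000):
    shared = []
    threads = [
        threading.Thread(target=append_range, args=(shared, per_thread))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


result = run_appenders()
```

The interleaving of values in the result is unspecified, but the total count and the multiset of values are not.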
Single-threaded performance
Performance is improved by 3-4% on free-threaded builds.
Scaling
The scaling benchmark looks about the same for this PR as for its base.
Thread safety
Thread safety of each instruction in the CALL family is documented below, starting with the uops that are composed to form instructions in the family.
UOPS
The more interesting uops that warrant closer inspection are:
_CHECK_AND_ALLOCATE_OBJECT
This uop loads an __init__ method from the specialization cache of the operand (a type) if the operand's type version matches the type version stored in the inline cache. The loaded method is guaranteed to be valid because we only store deferred objects in the specialization cache and there are no escaping calls following the load:
1. The type version is cleared before the reference in the MRO to __init__ is destroyed.
2. If the reference in (1) was the last reference, then the __init__ method will be queued for deletion the next time GC runs.
3. GC requires stopping the world, which forces a synchronizes-with operation between all threads.
4. If the GC collects the cached __init__, then the type's version will have been updated and the update will be visible to all threads, so the guard cannot pass.
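The four steps above can be traced in a small model. Everything here is hypothetical (ToyFunction, ToyRuntime, and the freed flag are illustration only), and the model deliberately leaves a dangling cache entry behind to show that it is the version guard, not the cache contents, that keeps the load safe.

```python
class ToyFunction:
    """Hypothetical deferred object; freed is set when the toy GC reclaims it."""

    def __init__(self, name):
        self.name = name
        self.freed = False


class ToyRuntime:
    def __init__(self, init, version):
        self.type_version = version
        self.mro_init = init
        self.cached_init = init  # specialization cache entry
        self.pending = []        # deferred objects awaiting the next GC

    def replace_init(self, new_init):
        old = self.mro_init
        self.type_version = 0     # step 1: clear the version first
        self.mro_init = new_init  # then destroy the old reference
        self.pending.append(old)  # step 2: queued until GC runs

    def gc(self):
        # Steps 3-4: models a stop-the-world collection of deferred objects.
        for obj in self.pending:
            obj.freed = True
        self.pending.clear()

    def guard(self, cached_version):
        # Toy version of the type version check in _CHECK_AND_ALLOCATE_OBJECT.
        if self.type_version != cached_version:
            return None  # guard fails, the cached method is never touched
        return self.cached_init


old_init = ToyFunction("old")
rt = ToyRuntime(old_init, version=7)

before = rt.guard(7)
before_was_live = before is old_init and not before.freed

rt.replace_init(ToyFunction("new"))
after_replace = rt.guard(7)

rt.gc()
after_gc = rt.guard(7)
```

Whenever the guard passes in this model, the cached object has not been reclaimed; once it has been reclaimed, the guard has already been failing since step 1.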
_CHECK_FUNCTION_VERSION
This uop guards that the top of the stack is a function and that its version matches the version stored in the inline cache. Instructions assume that if the guard passes, the version, and any properties verified by the version, will not change for the remainder of the instruction's execution, assuming there are no escaping calls between the guard and the code that relies on the guard. This property is preserved in free-threaded builds: the world is stopped whenever a function's version changes.
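The effect of the function version guard can be observed from Python by swapping a function's __code__ after a call site has had a chance to specialize. This is a sketch of the observable behavior only: whether the call site actually specializes depends on the interpreter version and build, and the helper names are made up for this example.

```python
def make_adder():
    def add(a, b):
        return a + b
    return add


def make_multiplier():
    def mul(a, b):
        return a * b
    return mul


def call_repeatedly(func, times=1000):
    # Repeated calls from a single call site give the interpreter a chance
    # to specialize the CALL instruction for this function.
    result = None
    for _ in range(times):
        result = func(2, 3)
    return result


target = make_adder()
before = call_repeatedly(target)

# Replacing the code object changes the function's version, so a call site
# that was specialized for the old version must fall back to the new code.
target.__code__ = make_multiplier().__code__
after = call_repeatedly(target)
```

Here before is 5 and after is 6: the call site never keeps running the old code after the swap.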
_CHECK_PEP_523
This uop guards that a custom eval frame function is not in use. Instructions assume that if the guard passes, an eval frame function will not be set for the remainder of the instruction's execution, assuming there are no escaping calls in between the guard and code that relies on the guard passing. This property is preserved in free-threaded builds: the world is stopped whenever the eval frame function is set.
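The reasoning behind this guard can be sketched with a toy stop-the-world mechanism. ToyWorld and its methods are hypothetical and are not how CPython implements stopping the world; in particular, this model assumes threads can only pause between instructions, which is a simplification of the real safe points.

```python
import threading


class ToyWorld:
    """Hypothetical model: the hook changes only while no instruction runs."""

    def __init__(self):
        self._cond = threading.Condition()
        self._running = 0       # threads currently inside an instruction
        self._stopped = False
        self.eval_frame = None  # None models the default evaluator

    def begin_instruction(self):
        with self._cond:
            while self._stopped:
                self._cond.wait()
            self._running += 1

    def end_instruction(self):
        with self._cond:
            self._running -= 1
            self._cond.notify_all()

    def set_eval_frame(self, hook):
        with self._cond:
            self._stopped = True        # stop the world
            while self._running:
                self._cond.wait()
            self.eval_frame = hook      # no instruction is in flight here
            self._stopped = False       # start the world
            self._cond.notify_all()


def run_instructions(world, count, violations):
    for _ in range(count):
        world.begin_instruction()
        try:
            if world.eval_frame is None:          # the guard passes
                # Code relying on the guard would run here.
                if world.eval_frame is not None:  # must never be observed
                    violations.append(1)
        finally:
            world.end_instruction()


def hook(frame):
    return frame


world = ToyWorld()
violations = []
workers = [
    threading.Thread(target=run_instructions, args=(world, 2000, violations))
    for _ in range(4)
]
for w in workers:
    w.start()
for i in range(50):
    world.set_eval_frame(hook if i % 2 == 0 else None)
for w in workers:
    w.join()
```

Because the hook is only written while every worker is parked between instructions, a plain non-atomic read inside an instruction sees one value for the whole instruction.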
The instructions are also composed of uops whose thread safety properties are easier to reason about and require less scrutiny. These are:
_CALL_NON_PY_GENERAL - Uses existing thread-safe APIs.
_CHECK_CALL_BOUND_METHOD_EXACT_ARGS - Only performs exact type checks, which are thread-safe: changing an instance's type stops the world.
_CHECK_FUNCTION_EXACT_ARGS - All the loads in the uop are safe to perform non-atomically: setting func->func_code stops the world, and the co_argcount attribute of code objects is immutable.
_CHECK_IS_NOT_PY_CALLABLE - Only performs exact type checks.
_CHECK_METHOD_VERSION - This loads a function from a PyMethodObject and guards that its version matches what is stored in the cache. PyMethodObjects are immutable; their fields can be accessed non-atomically. The thread safety of function version guards was already documented above.
_CHECK_PERIODIC - Thread safety was previously addressed as part of the 3.13 release.
_CHECK_STACK_SPACE - All the loads in this uop are safe to perform non-atomically: setting func->func_code stops the world, the co_framesize attribute of code objects is immutable, and tstate->py_recursion_remaining should only be mutated by the current thread.
_CREATE_INIT_FRAME - Uses existing thread-safe APIs.
_EXPAND_METHOD - Only loads from PyMethodObjects.
_INIT_CALL_BOUND_METHOD_EXACT_ARGS - Only loads from PyMethodObjects.
_INIT_CALL_PY_EXACT_ARGS - Only operates on data that isn't yet visible to other threads.
_PUSH_FRAME - Only manipulates fields that are not read by other threads.
_PY_FRAME_GENERAL - Reads from fields that are either immutable (co_flags) or require stopping the world to change (func_code).
_SAVE_RETURN_OFFSET - Stores only to the frame's return_offset, which is not read by other threads.
Instructions
These instructions perform exact type checks and loads from immutable fields of PyCFunction objects:
CALL_BUILTIN_FAST
CALL_BUILTIN_FAST_WITH_KEYWORDS
CALL_BUILTIN_O
These instructions perform exact type checks and loads from immutable fields of PyMethodDescrObjects:
CALL_METHOD_DESCRIPTOR_FAST
CALL_METHOD_DESCRIPTOR_FAST_WITH_KEYWORDS
CALL_METHOD_DESCRIPTOR_NOARGS
CALL_METHOD_DESCRIPTOR_O
These instructions are composed of the uops documented above, and are thread-safe transitively:
CALL_ALLOC_AND_ENTER_INIT
CALL_BOUND_METHOD_EXACT_ARGS
CALL_BOUND_METHOD_GENERAL
CALL_NON_PY_GENERAL
CALL_PY_EXACT_ARGS
CALL_PY_GENERAL
These instructions load from the callable cache, which is immutable, perform exact type checks, and use existing thread-safe APIs:
CALL_ISINSTANCE
CALL_LEN
CALL_LIST_APPEND
These instructions use existing thread-safe APIs:
CALL_STR_1
CALL_TUPLE_1
CALL_TYPE_1
Finally, these instructions don't categorize neatly:
CALL_BUILTIN_CLASS - Performs exact type checks and loads from immutable types.
Specialization
Apart from the changes discussed earlier, specialization is already thread-safe. It inspects immutable properties (i.e. those of code objects, method descriptors, or PyCFunctions) or properties that require stopping the world to mutate (i.e. properties checked by function version guards).
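To see which member of the CALL family a given call site has been specialized to, the dis module can show the adaptive bytecode. This is a minimal sketch: the helper name call_opnames is made up for this example, the adaptive flag is assumed to be available only on Python 3.11 and later, and the exact opcode names printed vary between interpreter versions and builds.

```python
import dis
import sys


def measure(items):
    return len(items)


def call_opnames(func):
    """Return the names of CALL-family instructions currently in func."""
    # The adaptive flag asks dis for the specialized form where supported.
    kwargs = {"adaptive": True} if sys.version_info >= (3, 11) else {}
    return [
        ins.opname
        for ins in dis.get_instructions(func, **kwargs)
        if ins.opname.startswith("CALL")
    ]


# Warm the call site up so the interpreter has a chance to specialize it.
for _ in range(1000):
    measure([1, 2, 3])

names = call_opnames(measure)
print(names)
```

Running the same check from several threads on a free-threaded build is one way to spot-check that specialization still happens there.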