python / cpython

The Python programming language
https://www.python.org
Other
62.44k stars 29.97k forks source link

ContextVars are not thread-safe in the free-threaded build #121546

Closed colesbury closed 2 months ago

colesbury commented 2 months ago

Bug report

The ContextVar implementation is not thread-safe in the free-threaded build:

In particular, the caching is not thread-safe, although there may be other thread-safety issues as well:

https://github.com/python/cpython/blob/9c08f40a613d9aee78de4ce4ec3e125d1496d148/Python/context.c#L206-L212

Example Test Cases:

Linked PRs

Fidget-Spinner commented 2 months ago

Is throwing a lock on all modifications of the cache a valid solution? Or would that be too slow?

The other option is to "register" a cache in a linked list in thread state. Then have that be cleared on shutdown. I like this better.

colesbury commented 2 months ago

A lock would mean that concurrent accesses to ContextVars would not scale well. I'd be more tempted to just disable the cache in the free-threaded build and pay the increased access cost, but still scale well.

Can you explain your linked list idea more?

Fidget-Spinner commented 2 months ago
Change
struct _pycontextvarobject {
    PyObject_HEAD
    PyObject *var_name;
    PyObject *var_default;
    PyObject *var_cached;
    uint64_t var_cached_tsid;
    uint64_t var_cached_tsver;
    Py_hash_t var_hash;
};
to
struct _pycontextvarobject {
    PyObject_HEAD
    PyObject *var_name;
    PyObject *var_default;
    _PyCtxVarCacheEntry *var_entry;
    Py_hash_t var_hash;
};
where var_entry points to an entry in an array (something like a freelist).
Said array is managed similar to pycore_freelists.h.
The array looks like this:
| Entry 1          | | Entry 2          | | Free Entry 1  | |  Entry 3         | | Free Entry 2  |
| var_cached       | | var_cached       | | ---> next     | | var_cached       | | ---> NULL     |
| var_cached_tsid  | | var_cached_tsid  | |               | | var_cached_tsid  | |               |
| var_cached_tsver | | var_cached_tsver | |               | | var_cached_tsver | |               |

Flow:

  1. When PyContextVar_Get finds a valid object and sets the cache, we need to read the cache. A cache hit is lockless and returns directly the entry in the array. A cache miss needs to be locked to modify the array to update the cache.
  2. ContextVar_Set needs to lock when setting.

Therefore, most reads (cache hits) should scale well, only cache misses and writes (sets) scale badly. I'm not too sure how much of the HAMT code is thread-safe. So I have no clue whether we need additional locks there too. A cursory read suggests to me we should at least lock _PyHamt_Assoc in contextvar_set if I understand the code correctly. While I don't think it's easy to cause any races in the HAMT implementation from Python, it seems easier with the C API introduced in PEP 567.

colesbury commented 2 months ago

This seems like a complicated scheme and would still not scale well for typical usage patterns. The assumption that most accesses are cache hits is probably not true for a multi-threaded program. It's a plausible assumption with the GIL because you have coarse grained accesses: one thread runs at a time and may do lots of context var accesses within that time slice. Without the GIL, that's probably not true: a context var is likely to have many threads access it in a highly interleaved (or concurrent) pattern.

Contexts are per-thread. The lookup is on the current thread's context, and a context can only be active on a single thread at a time. HAMTs are persistent, in other words the contents are immutable, and mutations create a new data structure (with efficient sharing).

Fidget-Spinner commented 2 months ago

HAMTs are persistent, in other words the contents are immutable, and mutations create a new data structure (with efficient sharing)

Oops yea I forgot/didn't know that HAMTs are copy on write. Thanks for pointing that out. In that case I'll send a PR to disable the cache on free-threaded builds.

colesbury commented 2 months ago

I think this was addressed by @Fidget-Spinner in https://github.com/python/cpython/pull/121740