python / cpython

The Python programming language
https://www.python.org
Other
63.85k stars 30.56k forks source link

`methodcaller` is not thread-safe (or re-entrant) #127065

Open colesbury opened 1 week ago

colesbury commented 1 week ago

Bug report

EDIT: edited to clarify that the issue is in the C implementation of operator.methodcaller.

Originally reported by @ngoldbaum in https://github.com/crate-py/rpds/issues/101

Reproducer ```python from operator import methodcaller from concurrent.futures import ThreadPoolExecutor class HashTrieMap(): def keys(self): return None def values(self): return None def items(self): return None num_workers=1000 views = [methodcaller(p) for p in ["keys", "values", "items"]] def work(view): m, d = HashTrieMap(), {} view(m) view(d) iterations = 10 for _ in range(iterations): executor = ThreadPoolExecutor(max_workers=num_workers) for view in views: futures = [executor.submit(work, view) for _ in range(num_workers)] results = [future.result() for future in futures] ```

Once every 5-10 runs, the program prints:

TypeError: descriptor 'keys' for 'dict' objects doesn't apply to a 'HashTrieMap' object

The problem is that operator.methodcaller is not thread-safe because it modifies the vectorcall_args, which is shared across calls:

https://github.com/python/cpython/blob/0af4ec30bd2e3a52350344d1011c0c125d6dcd71/Modules/_operator.c#L1646-L1666

I think this is generally unsafe, not just for free threading. The vectorcall args array needs to be valid for the duration of the call, and it's possible for methodcaller to be called reentrantly or by another thread while the call is still ongoing.

Linked PRs

colesbury commented 1 week ago

Hmm, actually it looks like a thread safety issue in methodcaller.

ngoldbaum commented 1 week ago

Yeah, I noticed early it wasn't triggering if I called the descriptors directly without going through methodcaller.

colesbury commented 1 week ago

I think the optimized vectorcall implementation from https://github.com/python/cpython/pull/107201 is not reentrant or thread-safe (even with the GIL) because the shared vectorcall_args array is modified during each invocation:

https://github.com/python/cpython/blob/3926842117feffe5d2c9727e1899bea5ae2adb28/Modules/_operator.c#L1661

Most (or all) of our current vectorcall implementations extract out args[0] early, but that's not guaranteed and the vectorcall protocol is a public API.

For now, I'm planning to just disable the optimization in the free threading build (https://github.com/python/cpython/pull/127109), but I don't see an easy way of making it thread-safe, and I think we should consider reverting #107201 in 3.13 and main.

cc @eendebakpt @corona10 @vstinner

eendebakpt commented 1 week ago

I agree this looks like a bug also in the normal build. The issue is hard to trigger in the normal build though, as indeed most vectorcall implementations (including the PyObject_VectorcallMethod used by the methodcaller) get args[0] before doing any work).

One option to make the vectorcall implementation thread-safe (and re-entrant) would be to copy the entire mc->vectorcall_args to a temporary buffer before setting the first element to args[0]. An implementation of this idea is https://github.com/python/cpython/compare/main...eendebakpt:methodcaller_ft.

A quick benchmark (using the script from https://github.com/python/cpython/pull/107201) shows there is still a significant performance increase:

Current main:

call: Mean +- std dev: 48.2 ns +- 4.1 ns
creation: Mean +- std dev: 58.3 ns +- 3.4 ns
creation+call: Mean +- std dev: 114 ns +- 5 ns
call kwarg: Mean +- std dev: 63.8 ns +- 3.7 ns
creation kwarg: Mean +- std dev: 92.7 ns +- 3.1 ns
creation+call kwarg: Mean +- std dev: 187 ns +- 7 ns

Disable the vectorcall (this is the solution Sam added to the FT build):

call: Mean +- std dev: 78.2 ns +- 3.2 ns
creation: Mean +- std dev: 56.6 ns +- 2.7 ns
creation+call: Mean +- std dev: 136 ns +- 6 ns
call kwarg: Mean +- std dev: 124 ns +- 6 ns
creation kwarg: Mean +- std dev: 92.3 ns +- 4.1 ns
creation+call kwarg: Mean +- std dev: 228 ns +- 10 ns

Copy mc->vectorcall_args to a temporary buffer:

call: Mean +- std dev: 50.9 ns +- 1.6 ns
creation: Mean +- std dev: 56.6 ns +- 4.8 ns
creation+call: Mean +- std dev: 118 ns +- 6 ns
call kwarg: Mean +- std dev: 64.8 ns +- 2.5 ns
creation kwarg: Mean +- std dev: 91.9 ns +- 4.9 ns
creation+call kwarg: Mean +- std dev: 194 ns +- 9 ns

(all benchmarks without PGO and with the normal Python build)

@colesbury Is this idea an option for the free-threading build? @Yhg1s What would you like to do for the 3.13 branch?

vstinner commented 1 week ago

An implementation of this idea is https://github.com/python/cpython/compare/main...eendebakpt:methodcaller_ft.

Can you create a PR? I cannot open the link, it takes too long and GitHub displays an error.

eendebakpt commented 1 day ago

I removed the draft status from the PR (thanks to @colesbury for some good suggestions). Both PR #127109 or reverting #107201 are good options I think (a classic trade-off between performance and added complexity, except that in this case the status quo is not good)