python / cpython

The Python programming language
https://www.python.org

Make concurrent iteration over pairwise, combinations, permutations, cwr, product from itertools safe under free-threading #123471

Open eendebakpt opened 2 months ago

eendebakpt commented 2 months ago

Bug report

Bug description:

Several methods from the C implementation of the itertools module are not yet safe to use under the free-threading build. This issue lists several problems to be addressed. They are discussed below for itertools.product, but similar issues apply to the other classes.

https://github.com/python/cpython/blob/58ce131037ecb34d506a613f21993cde2056f628/Modules/itertoolsmodule.c#L2038-L2044

This is not thread-safe, as multiple threads could each see result == NULL evaluate to true. We could move the construction of productobject.result into the constructor of product. This does mean that product will use more memory before the first invocation of next, which seems acceptable, since constructing a product without iterating over it appears rare in practice. The tuple also needs to be filled with data. For product it seems safe to do this in the constructor, because the data comes from productobject->pools, which is a tuple of tuples. But for pairwise the data comes from an iterable

https://github.com/python/cpython/blob/58ce131037ecb34d506a613f21993cde2056f628/Modules/itertoolsmodule.c#L337-L343

which could be a generator. Reading data from the iterator before the first invocation of pairwise_next seems like a behavior change we do not want to make.
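The problematic pattern is a classic check-then-act race. A minimal Python analogue (class and method names here are hypothetical, purely for illustration of the C-level issue):

```python
class LazyInit:
    """Toy model of the lazy creation of productobject.result."""

    def __init__(self):
        self.result = None       # created lazily on the first step()
        self.init_count = 0      # counts how often the one-time init ran

    def step(self):
        # Check-then-act: under free threading, two threads can both
        # observe self.result is None before either assigns, so the
        # one-time initialization can run twice.
        if self.result is None:
            self.init_count += 1
            self.result = []
        return self.result
```

Single-threaded, the initialization runs exactly once; with two unsynchronized threads, both can pass the None check first, which is the race described above.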

An alternative is to use some kind of locking inside product_next, but the locking should not add overhead on the common path, otherwise single-threaded performance will suffer.
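As a sketch of the locking option (in Python for clarity; the real fix would be a per-object lock or critical section in the C code, ideally with a cheaper fast path):

```python
import threading

class LockedIterator:
    """Serialize next() calls on a shared iterator with a mutex.
    A coarse sketch: a naive lock like this is correct but pays the
    locking cost on every step, which is exactly the overhead we want
    to avoid on the common single-threaded path in C."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self._lock:          # only one thread advances at a time
            return next(self._it)
```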

https://github.com/python/cpython/blob/58ce131037ecb34d506a613f21993cde2056f628/Modules/itertoolsmodule.c#L352-L356

This cleanup is not safe under concurrent iteration. Instead we can defer the cleanup until the object itself is deallocated (this approach was used for reversed, see https://github.com/python/cpython/pull/120971/files#r1653313765)
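The idea, modeled in Python with hypothetical names: do not drop references inside next() when iteration finishes; keep them until the object itself is destroyed.

```python
class DeferredCleanupReversed:
    """Toy model of deferring cleanup to deallocation.
    Clearing the sequence reference on exhaustion is only a memory
    optimization; under free threading another thread may still be
    reading it, so the release is deferred to deallocation instead."""

    def __init__(self, seq):
        self.seq = seq
        self.index = len(seq) - 1

    def __iter__(self):
        return self

    def __next__(self):
        i = self.index
        if i < 0:
            # Unsafe variant would do: self.seq = None  (early release)
            # Safe variant: keep self.seq; it is freed with the object.
            raise StopIteration
        self.index = i - 1
        return self.seq[i]
```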

https://github.com/python/cpython/blob/58ce131037ecb34d506a613f21993cde2056f628/Modules/itertoolsmodule.c#L2077-L2088

If two threads both increment indices[i], the check on line 2078 is never true and we end up indexing pool out of bounds with PyTuple_GET_ITEM on line 2088. Here we could change the check to indices[i] >= PyTuple_GET_SIZE(pool). That is equivalent in the single-threaded case, but does not lead to out-of-bounds indexing in the multi-threaded case (although it does lead to funny results!)
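The effect of the check change can be seen in a pure-Python model of the index-advancing loop (a sketch of the idea, not a transcription of the C code):

```python
def advance(indices, sizes):
    """One step of product's mixed-radix 'odometer'.
    Returns True while more tuples remain. Using '>=' rather than '=='
    means a racy double increment that overshoots a bound still triggers
    the reset instead of sliding past it out of bounds."""
    for i in reversed(range(len(indices))):
        indices[i] += 1
        if indices[i] >= sizes[i]:   # '==' suffices single-threaded
            indices[i] = 0           # carry into the next digit
        else:
            return True
    return False                     # odometer wrapped: exhausted
```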

@rhettinger @colesbury Any input on the points above would be welcome.

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Linked PRs

rhettinger commented 1 month ago

Can we talk about this at the sprint? I would like to have a sound overall strategy for how all of these should be approached (what guarantees can be made, what is most useful, whether to add locks, whether to redesign from scratch or just provide an alternative code path, what are the least destabilizing changes that can be made, how close we can make this to the generator equivalents, how to defend against reentrancy, how to avoid damaging the stable existing implementation for non-free-threading builds, etc.).

Also, I would like to start by evaluating itertools.tee(), which currently throws a `RuntimeError` even on the non-free-threading build. There may be a useful behavior there that involves adding locks.

Thinking just about pairwise(), I'm not even sure what the desirable behavior would be (other than not crashing). Is there any legitimate use case for two threads racing non-deterministically over a feed of successive pairs? ISTM this would almost always be the wrong thing to do. Perhaps raising a `RuntimeError` like tee does would be the most useful thing to do.
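For reference, the itertools docs give roughly this pure-Python equivalent of pairwise; any agreed-upon multi-threaded behavior would presumably be judged against these single-threaded semantics:

```python
def pairwise(iterable):
    # pairwise('ABCDEFG') yields AB BC CD DE EF FG
    iterator = iter(iterable)
    a = next(iterator, None)      # prime the first element, if any
    for b in iterator:
        yield a, b
        a = b
```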

I definitely think we should not be creating PRs on this issue until we've made a conscious decision about the right overall approach.

eendebakpt commented 1 month ago

@rhettinger Great idea. When making the PRs I already noticed it is hard to make trade-offs between the different implementation options. I was considering writing a message on DPO to get some more people involved, but having it discussed at the sprint first is also good. I won't be there, but would be interested in the outcome!

eendebakpt commented 3 weeks ago

Notes from the sprint are here: https://github.com/python/cpython/issues/124397

serhiy-storchaka commented 3 weeks ago

There is a simple Python implementation in the documentation. I believe it has sane behavior under multi-threading (no crashes, no leaks, no hangs). The C implementation should be equivalent. The current C implementation has some shortcuts for performance; they are optional and can be removed in the free-threading build.
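For example, the documented pure-Python equivalent of itertools.product (approximately as given in the docs) has no lazily initialized fast-path state at all; it just builds the cross product with ordinary list operations:

```python
def product(*iterables, repeat=1):
    # Approximate pure-Python equivalent from the itertools docs:
    # materialize the cross product, then yield tuples.
    pools = [tuple(pool) for pool in iterables] * repeat
    result = [[]]
    for pool in pools:
        result = [x + [y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)
```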

For example, you can remove the result tuple caching in the free-threading build. It should not affect behavior in most cases; it is simply an optimization. I believe it is possible to use the same code for both builds if we define macros for a few elementary operations instead of wrapping larger parts of code in #ifdef/#endif.

But all tests that do not depend on race conditions should pass in both builds.

eendebakpt commented 3 weeks ago

@serhiy-storchaka @rhettinger For itertools.pairwise I created a new draft PR. Comments on the PR would be appreciated. In particular: