Open eendebakpt opened 3 months ago
There are more places in the codebase where the same issue occurs:
https://github.com/python/cpython/blob/58fdb169c8a93925541fecc74ba73c566147f2ca/Modules/itertoolsmodule.c#L360-L367 https://github.com/python/cpython/blob/58fdb169c8a93925541fecc74ba73c566147f2ca/Modules/itertoolsmodule.c#L3713-L3736 https://github.com/python/cpython/blob/126910edba812a01794f307b0cfa2a7f02bda190/Objects/enumobject.c#L246-L253 https://github.com/python/cpython/blob/126910edba812a01794f307b0cfa2a7f02bda190/Objects/enumobject.c#L246-L253 https://github.com/python/cpython/blob/126910edba812a01794f307b0cfa2a7f02bda190/Objects/enumobject.c#L196-L209 https://github.com/python/cpython/blob/58fdb169c8a93925541fecc74ba73c566147f2ca/Python/bltinmodule.c#L2974-L2994
It would be reasonable to ifdef
out the tuple reuse fast path on a free-threaded build.
@rhettinger Disabling the re-use of the tuple solves part of the issues with free-threading.
We should also disable clearing the reference to the iterator (e.g. Py_CLEAR(po->it);
inside pairwise_enum
), otherwise one thread can clear the iterator (making the reference count zero) and another thread still use it. That will not impact performance or impact readability of the code. But it could subtly change behavior (see the notes in https://github.com/python/cpython/pull/123848 on exhausting the iterator)
And finally we have to make sure we handle updates to po->old
correctly. In current main it can happen that in one thread the reference count to po->old is
reduced to zero at lines
https://github.com/python/cpython/blob/dd0ee201da34d1d4a631d77b420728f9233f53f9/Modules/itertoolsmodule.c#L382-L383 while in another thread a borrowed reference is obtained at https://github.com/python/cpython/blob/dd0ee201da34d1d4a631d77b420728f9233f53f9/Modules/itertoolsmodule.c#L331
I think for pairwise
we can do this with reasonably clean code.
For some other methods in itertools
the situation is similar: there is an internal state that needs to be mutated during iteration. For example for itertools.product
the internal state of the iterator is contained in lz->indices
and lz->result
and mutated in
https://github.com/python/cpython/blob/dd0ee201da34d1d4a631d77b420728f9233f53f9/Modules/itertoolsmodule.c#L2057-L2075
I will think a bit more about alternative options.
Bug report
Bug description:
The enumerate object (and also the
itertools.pairwise
since https://github.com/python/cpython/pull/118219) re-uses tuples when possible. It does this by checking the reference count to be 1:The refcount check and increment are not atomic, so in the free-threading build multiple threads can end up operating on the
result
object. It is possible that one thread already sets an item in the tuple, and another thread decrefs the item believing it is still an old item.In the nogil reference implementation (https://github.com/colesbury/nogil) no changes are made to address this. So maybe the issue cannot occur?
In https://github.com/python/cpython/pull/120591 this problem is addressed with a lock, which might be too much overhead (see the discussion in https://github.com/python/cpython/issues/120496)
@colesbury @hauntsaninja
CPython versions tested on:
CPython main branch
Operating systems tested on:
Windows
Linked PRs