python / cpython

The Python programming language
https://www.python.org
Other
63.18k stars 30.25k forks source link

Race condition when importing `collections.abc` #125245

Open ngoldbaum opened 2 weeks ago

ngoldbaum commented 2 weeks ago

Bug report

Bug description:

Discovered alongside #125243 with similar steps to reproduce. I don't have a simpler way to trigger this than "run the PyO3 tests in a loop" because I think it requires many threads accessing the python runtime simulatenously.

To trigger it, have a rust toolchain and Python installed, clone the PyO3 repo and execute the following command:

while RUST_BACKTRACE=1 UNSAFE_PYO3_BUILD_FREE_THREADED=1 cargo test --lib --features=full -- --test-threads=100; do :; done

You may also hit some other test failures related to ZoneInfo, see the other issue I opened about that.

You will eventually see a test failure with the following text:

---- exceptions::socket::tests::gaierror stdout ----
thread 'exceptions::socket::tests::gaierror' panicked at src/impl_/exceptions.rs:26:17:
failed to import exception socket.gaierror: ImportError: cannot import name 'Mapping' from 'collections.abc' (/Users/goldbaum/.pyenv/versions/3.13.0t/lib/python3.13t/collections/abc.py)

I slightly modified PyO3 to get a traceback as well (hidden because it's a distractingly long diff):

``` diff --git a/src/impl_/exceptions.rs b/src/impl_/exceptions.rs index 15b6f53b..de63ad59 100644 --- a/src/impl_/exceptions.rs +++ b/src/impl_/exceptions.rs @@ -1,4 +1,8 @@ -use crate::{sync::GILOnceCell, types::PyType, Bound, Py, Python}; +use crate::{ + sync::GILOnceCell, + types::{PyTracebackMethods, PyType}, + Bound, Py, Python, +}; pub struct ImportedExceptionTypeObject { imported_value: GILOnceCell>, @@ -20,8 +24,11 @@ impl ImportedExceptionTypeObject { .import(py, self.module, self.name) .unwrap_or_else(|e| { panic!( - "failed to import exception {}.{}: {}", - self.module, self.name, e + "failed to import exception {}.{}: {}\n{}", + self.module, + self.name, + e, + e.traceback(py).unwrap().format().unwrap(), ) }) } ```

And the traceback is:

Traceback (most recent call last):
  File "/Users/goldbaum/.pyenv/versions/3.13.0t/lib/python3.13t/socket.py", line 55, in <module>
    import os, sys, io, selectors
  File "/Users/goldbaum/.pyenv/versions/3.13.0t/lib/python3.13t/selectors.py", line 10, in <module>
    from collections.abc import Mapping

So somehow during setup of the socket module, Mapping isn't available yet, but only if many threads are simultaneously touching the Python runtime.

(ping @davidhewitt, we probably want to disable the socket error tests on the free-threaded build as well)

CPython versions tested on:

3.13

Operating systems tested on:

macOS

Linked PRs

colesbury commented 2 weeks ago

EDIT: Commented on wrong issue

colesbury commented 2 weeks ago

There's a race condition that I believe was introduced in https://github.com/python/cpython/pull/123613. The sys.modules[__name__] = _collections_abc is not safe with the longstanding importlib optimization that avoids _ModuleLockManager if the module is already initialized:

https://github.com/python/cpython/blob/a726ce73ca69b3a5ccc2cbe23061070e686b1150/Lib/importlib/_bootstrap.py#L1351-L1355

I think this may affect the default (with GIL) build as well depending on when the GIL-switch happens, but it definitely will happen more often on the free threading build.

The problematic execution when two threads (T1, T2) run import collections.abc:

1) T1: Starts to import Lib/collections/abc.py, sets sys.modules["collections.abc"] 2) T2: Loads module = sys.modules["collections.abc"] (line 1353 above). This points to the Lib/collections/abc.py module. 3) T1: Finishes importing collections/abc.py, sys.modules["collections.abc"] now points to the Lib/_collections_abc.py module. 4) T2: Checks module. __spec__. _initializing (line 1355) on the . This module is finished initializing, so it's returned. Note that at this point sys.modules["collections.abc"] points to the correct _collections_abc.py module, but the local variable module still points to the collections/abc.py module.

T2 now incorrectly has the Python collections.abc module.

cc @serhiy-storchaka

EDIT: Lib/_collections_abc.py is also a Python module

colesbury commented 2 weeks ago

I think we can do something like:

--- a/Lib/collections/__init__.py
+++ b/Lib/collections/__init__.py
@@ -29,6 +29,9 @@
 import _collections_abc
 import sys as _sys

+_sys.modules['collections.abc'] = _collections_abc
+abc = _collections_abc
+
 from itertools import chain as _chain
 from itertools import repeat as _repeat
 from itertools import starmap as _starmap
--- a/Lib/collections/abc.py
+++ /dev/null
@@ -1,3 +0,0 @@
-import _collections_abc
-import sys
-sys.modules[__name__] = _collections_abc
hroncok commented 1 week ago

I think this may affect the default (with GIL) build as well depending on when the GIL-switch happens, but it definitely will happen more often on the free threading build.

Indeed. We see this in Fedora with the default (with GIL) build as well.

ngoldbaum commented 14 hours ago

This is fixed now.

colesbury commented 13 hours ago

There's an open PR, but it's not merged yet. I don't think the underlying issue is fixed.

I'm waiting on more feedback regarding the issue that Serhiy raised on the PR:

https://github.com/python/cpython/pull/125415#pullrequestreview-2365528082

ngoldbaum commented 13 hours ago

Oops, I'm sorry!