concurrent pip-install fails to build wheel due to race

awaizman1 commented 3 years ago

Environment

pip version: 20.2.4 (and others)
Python version: 3.6 (and others)
OS: windows 10 (ubuntu 18 too)

Description

When running pip in parallel (multiple processes) to install some source distribution, the wheel creation phase fails because all processes tries to create a whl in the same location (assuming the whl doesn't already exist in whl cache).

In my CI environment we build multiple python projects in parallel. In addition the build happens within a clean docker container, thus the pip cache is empty on every CI run. So I get into a situation where I have multiple venvs (one per python project) and multiple processes are running concurrently and install some source distribution package (in my case its pydevd-pycharm==202.7319.37).

Because this is a source distribution, pip first creates a whl from it. Since concurrent pip processes are trying to do this simultaneously into the same cache location (i.e. c:\users\awaizman101364\appdata\local\pip\cache\wheels\ab\b5\b5\be64936edf514f04910c749842d4846e0afc64a9de7e319067) it fails (errors like EOFError, BadZipFile, etc.)

Expected behavior

wheel creation by pip should be multi process safe (similar to download)

How to Reproduce

create 4 venv with latest pip (20.2.4), setuptools(50.3.2) and wheel(0.35.1)

python -v venv venv<1-4>
venv<1-4>\Scripts\python.exe -m pip install pip==20.2.4
venv<1-4>\Scripts\python.exe -m pip install setuptools==50.3.2 wheel==0.35.1

make sure pip cache is empty (i.e. by removing c:\users\awaizman101364\appdata\local\pip)

concurrently run pip install pydevd-pycharm==202.7319.37 in all 4 venvs

start env1\Scripts\python.exe -m pip install pydevd-pycharm==202.7319.37 && ... && start env4\Scripts\python.exe -m pip install pydevd-pycharm==202.7319.37

Output

this is the errors i get: env3 process:

C:\Users\awaizman101364>env3\Scripts\python.exe -m pip install pydevd-pycharm==202.7319.37
Collecting pydevd-pycharm==202.7319.37
  Downloading pydevd-pycharm-202.7319.37.tar.gz (5.1 MB)
     |████████████████████████████████| 5.1 MB 547 kB/s
Building wheels for collected packages: pydevd-pycharm
  Building wheel for pydevd-pycharm (setup.py) ... done
  Created wheel for pydevd-pycharm: filename=pydevd_pycharm-202.7319.37-py3-none-any.whl size=10083142 sha256=93028682b500ed3b45853bd0994f9d0ca7fe7b5b79680105a7cabf30bee88d3e
  Stored in directory: c:\users\awaizman101364\appdata\local\pip\cache\wheels\ab\b5\b5\be64936edf514f04910c749842d4846e0afc64a9de7e319067
Successfully built pydevd-pycharm
Installing collected packages: pydevd-pycharm
ERROR: Exception:
Traceback (most recent call last):
  File "C:\Users\awaizman101364\env3\lib\site-packages\pip\_internal\cli\base_command.py", line 228, in _main
    status = self.run(options, args)
  File "C:\Users\awaizman101364\env3\lib\site-packages\pip\_internal\cli\req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "C:\Users\awaizman101364\env3\lib\site-packages\pip\_internal\commands\install.py", line 406, in run
    pycompile=options.compile,
  File "C:\Users\awaizman101364\env3\lib\site-packages\pip\_internal\req\__init__.py", line 90, in install_given_reqs
    pycompile=pycompile,
  File "C:\Users\awaizman101364\env3\lib\site-packages\pip\_internal\req\req_install.py", line 822, in install
    requested=self.user_supplied,
  File "C:\Users\awaizman101364\env3\lib\site-packages\pip\_internal\operations\install\wheel.py", line 860, in install_wheel
    requested=requested,
  File "C:\Users\awaizman101364\env3\lib\site-packages\pip\_internal\operations\install\wheel.py", line 689, in _install_wheel
    file.save()
  File "C:\Users\awaizman101364\env3\lib\site-packages\pip\_internal\operations\install\wheel.py", line 456, in save
    shutil.copyfileobj(f, dest)
  File "C:\Users\awaizman101364\AppData\Local\Programs\Python\Python36\lib\shutil.py", line 79, in copyfileobj
    buf = fsrc.read(length)
  File "C:\Users\awaizman101364\AppData\Local\Programs\Python\Python36\lib\zipfile.py", line 872, in read
    data = self._read1(n)
  File "C:\Users\awaizman101364\AppData\Local\Programs\Python\Python36\lib\zipfile.py", line 940, in _read1
    data += self._read2(n - len(data))
  File "C:\Users\awaizman101364\AppData\Local\Programs\Python\Python36\lib\zipfile.py", line 975, in _read2
    raise EOFError
EOFError

env4 process:

C:\Users\awaizman101364>env4\Scripts\python.exe -m pip install pydevd-pycharm==202.7319.37
Collecting pydevd-pycharm==202.7319.37
  Downloading pydevd-pycharm-202.7319.37.tar.gz (5.1 MB)
     |████████████████████████████████| 5.1 MB 6.8 MB/s
Building wheels for collected packages: pydevd-pycharm
  Building wheel for pydevd-pycharm (setup.py) ... done
  Created wheel for pydevd-pycharm: filename=pydevd_pycharm-202.7319.37-py3-none-any.whl size=10083142 sha256=93028682b500ed3b45853bd0994f9d0ca7fe7b5b79680105a7cabf30bee88d3e
  Stored in directory: c:\users\awaizman101364\appdata\local\pip\cache\wheels\ab\b5\b5\be64936edf514f04910c749842d4846e0afc64a9de7e319067
Successfully built pydevd-pycharm
ERROR: Error while checking for conflicts. Please file an issue on pip's issue tracker: https://github.com/pypa/pip/issues/new
Traceback (most recent call last):
  File "C:\Users\awaizman101364\env4\lib\site-packages\pip\_internal\commands\install.py", line 520, in _determine_conflicts
    return check_install_conflicts(to_install)
  File "C:\Users\awaizman101364\env4\lib\site-packages\pip\_internal\operations\check.py", line 110, in check_install_conflicts
    would_be_installed = _simulate_installation_of(to_install, package_set)
  File "C:\Users\awaizman101364\env4\lib\site-packages\pip\_internal\operations\check.py", line 134, in _simulate_installation_of
    dist = abstract_dist.get_pkg_resources_distribution()
  File "C:\Users\awaizman101364\env4\lib\site-packages\pip\_internal\distributions\wheel.py", line 29, in get_pkg_resources_distribution
    with ZipFile(self.req.local_file_path, allowZip64=True) as z:
  File "C:\Users\awaizman101364\AppData\Local\Programs\Python\Python36\lib\zipfile.py", line 1131, in __init__
    self._RealGetContents()
  File "C:\Users\awaizman101364\AppData\Local\Programs\Python\Python36\lib\zipfile.py", line 1198, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
Installing collected packages: pydevd-pycharm
ERROR: Exception:
Traceback (most recent call last):
  File "C:\Users\awaizman101364\env4\lib\site-packages\pip\_internal\cli\base_command.py", line 228, in _main
    status = self.run(options, args)
  File "C:\Users\awaizman101364\env4\lib\site-packages\pip\_internal\cli\req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "C:\Users\awaizman101364\env4\lib\site-packages\pip\_internal\commands\install.py", line 406, in run
    pycompile=options.compile,
  File "C:\Users\awaizman101364\env4\lib\site-packages\pip\_internal\req\__init__.py", line 90, in install_given_reqs
    pycompile=pycompile,
  File "C:\Users\awaizman101364\env4\lib\site-packages\pip\_internal\req\req_install.py", line 822, in install
    requested=self.user_supplied,
  File "C:\Users\awaizman101364\env4\lib\site-packages\pip\_internal\operations\install\wheel.py", line 850, in install_wheel
    with ZipFile(wheel_path, allowZip64=True) as z:
  File "C:\Users\awaizman101364\AppData\Local\Programs\Python\Python36\lib\zipfile.py", line 1131, in __init__
    self._RealGetContents()
  File "C:\Users\awaizman101364\AppData\Local\Programs\Python\Python36\lib\zipfile.py", line 1198, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

ziima commented 3 years ago

We encountered a similar issue, described in tox-dev/tox#2067.

lukasjuhrich commented 2 years ago

The same happens with our builds (clean docker build, sporadic failures around 50% of the time with BadZipFile). However, I'm not sure where any parallel processing would occur; we run docker-compose -f docker-compose.test.yml run --rm test-app /bin/trueand let the work be done via the entrypoint (compose file, entrypoint hook).

Env:

Debian stretch
Pip 21.2.1

@awaizman1 have you found a workaround? Having to restart the CI regularly consumes a fair amount of time.

To People with pip knowledge, how can I debug this further?

lukasjuhrich commented 2 years ago

Judging by the existence of #9964 there may be different reasons in our case. But neither parallelism nor cache problems make sense for a single-process clean docker build. I'm not sure what options I have to dig deeper into this.

pradyunsg commented 2 years ago

The way pip does wheel caching, is that it writes wheels into an "adjacent temporary file" and then moves them.

I think the confirmation I want here is whether the issue here is that the same adjacent temporary file is used by different processes of pip (which... would be weird) or whether they somehow get moved in a way that clobbers the data.

pradyunsg commented 2 years ago

Specifically, I'm wondering if either of these solve this problem:

diff --git a/src/pip/_internal/utils/filesystem.py b/src/pip/_internal/utils/filesystem.py
index b7e6191ab..a66a0f6fa 100644
--- a/src/pip/_internal/utils/filesystem.py
+++ b/src/pip/_internal/utils/filesystem.py
@@ -1,3 +1,4 @@
+import fcntl
 import fnmatch
 import os
 import os.path
@@ -95,9 +96,13 @@ def adjacent_tmp_file(path: str, **kwargs: Any) -> Iterator[BinaryIO]:

 # Tenacity raises RetryError by default, explicitly raise the original exception
-_replace_retry = retry(reraise=True, stop=stop_after_delay(1), wait=wait_fixed(0.25))
-
-replace = _replace_retry(os.replace)
+@retry(reraise=True, stop=stop_after_delay(1), wait=wait_fixed(0.25))
+def replace(src, dst):
+    try:
+        fcntl.flock(dst, fcntl.LOCK_EX)
+        os.replace(src, dst)
+    finally:
+        fcntl.flock(dst, fcntl.F_UNLCK)

 # test_writable_dir and _test_writable_dir_win are copied from Flit,

OR

diff --git a/src/pip/_internal/utils/filesystem.py b/src/pip/_internal/utils/filesystem.py
index b7e6191ab..3aea86fe5 100644
--- a/src/pip/_internal/utils/filesystem.py
+++ b/src/pip/_internal/utils/filesystem.py
@@ -1,3 +1,4 @@
+import fcntl
 import fnmatch
 import os
 import os.path
@@ -88,10 +89,12 @@ def adjacent_tmp_file(path: str, **kwargs: Any) -> Iterator[BinaryIO]:
     ) as f:
         result = cast(BinaryIO, f)
         try:
+            fcntl.flock(result, fcntl.LOCK_EX)
             yield result
         finally:
             result.flush()
             os.fsync(result.fileno())
+            fcntl.flock(dst, fcntl.F_UNLCK)

 # Tenacity raises RetryError by default, explicitly raise the original exception

godlygeek commented 2 years ago

os.replace() is required to be atomic, in the sense that something trying to read that file will either see the old version or the new version, never a partial file or a mix and match of both files. So, the first of those two proposed patches doesn't seem helpful to me.

awaizman1 commented 2 years ago

The same happens with our builds (clean docker build, sporadic failures around 50% of the time with BadZipFile). However, I'm not sure where any parallel processing would occur; we run docker-compose -f docker-compose.test.yml run --rm test-app /bin/trueand let the work be done via the entrypoint (compose file, entrypoint hook).

Env:

Debian stretch

Pip 21.2.1

@awaizman1 have you found a workaround? Having to restart the CI regularly consumes a fair amount of time.

To People with pip knowledge, how can I debug this further?

ששש

awaizman1 commented 2 years ago

Hi @lukasjuhrich

In my use case it happens since I run multiple pip instances concurrently on multiple different venvs. my workaround is to retry the failed pip command. I added this retry mechanism to our build system. When pip fails the build system retries and usually it works.

xaviergmail commented 8 months ago

I am running into the same issue when installing several projects in parallel with pipenv with a filesystem package as an editable dependency. It fails even with the following:

  PIP_NO_BINARY: ":all:"
  PIP_NO_BUILD_ISOLATION: "no"
  PIP_NO_CLEAN: "no"

The build directory created as a sibling to setup.py typically results in a wheel directory with 0200 permissions (u+w) which results in no permission errors in the other processes. The nuclear approach of setting umask 000 does not get around this either.

Has anyone found a workaround?

stinovlas commented 6 months ago

We're still getting these errors regularly. This seems to strongly hint that the problem actually is in the adjacent temporary file:

ERROR: Wheel 'restructuredtext-lint' located at /builds/group/my_project/.cache/pip/wheels/65/4b/1c/d59fca1ba14ad38d9ef60a4247fac922f709cecbbb525bf554/restructuredtext_lint-1.4.0-py3-none-any.whl is invalid.

I'm not sure whether patch proposed by @pradyunsg would actually help, because the problem is probably not just writing the adjacent temporary file atomically, but also moving it to the final location. Otherwise you can still have a race condition when you finish writing the temporary file and some other process opens it for writing again, before the rename happens.

pypa / pip

concurrent pip-install fails to build wheel due to race #9034