awaizman1 opened 3 years ago
We encountered a similar issue, described in tox-dev/tox#2067.
The same happens with our builds (clean docker build, sporadic failures around 50% of the time with BadZipFile). However, I'm not sure where any parallel processing would occur; we run docker-compose -f docker-compose.test.yml run --rm test-app /bin/true and let the work be done via the entrypoint (compose file, entrypoint hook).
Env:
- Debian stretch
- Pip 21.2.1
@awaizman1 have you found a workaround? Having to restart the CI regularly consumes a fair amount of time.
To people with pip knowledge: how can I debug this further?
Judging by the existence of #9964, there may be a different cause in our case. But neither parallelism nor cache problems make sense for a single-process clean docker build. I'm not sure what options I have to dig deeper into this.
The way pip does wheel caching is that it writes wheels into an "adjacent temporary file" and then moves them into place.
I think the confirmation I want here is whether the same adjacent temporary file is used by different pip processes (which... would be weird), or whether the files somehow get moved in a way that clobbers the data.
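For readers following along, the pattern in question looks roughly like this (a simplified sketch of the technique, not pip's actual implementation; atomic_write is an invented name):

import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    # Write into a temporary file in the same directory as the target
    # ("adjacent"), then move it over the final name in one step.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # os.replace() is atomic on POSIX: a reader sees either the old
        # file or the new file, never a partially written one.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise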
Specifically, I'm wondering whether either of these solves the problem:
diff --git a/src/pip/_internal/utils/filesystem.py b/src/pip/_internal/utils/filesystem.py
index b7e6191ab..a66a0f6fa 100644
--- a/src/pip/_internal/utils/filesystem.py
+++ b/src/pip/_internal/utils/filesystem.py
@@ -1,3 +1,4 @@
+import fcntl
import fnmatch
import os
import os.path
@@ -95,9 +96,13 @@ def adjacent_tmp_file(path: str, **kwargs: Any) -> Iterator[BinaryIO]:
# Tenacity raises RetryError by default, explicitly raise the original exception
-_replace_retry = retry(reraise=True, stop=stop_after_delay(1), wait=wait_fixed(0.25))
-
-replace = _replace_retry(os.replace)
+@retry(reraise=True, stop=stop_after_delay(1), wait=wait_fixed(0.25))
+def replace(src: str, dst: str) -> None:
+    # flock() operates on an open file descriptor, not a path, so the
+    # destination has to be opened (and created if missing) first.
+    fd = os.open(dst, os.O_CREAT | os.O_WRONLY)
+    try:
+        fcntl.flock(fd, fcntl.LOCK_EX)
+        os.replace(src, dst)
+    finally:
+        # LOCK_UN releases a flock(); F_UNLCK belongs to the fcntl()
+        # record-locking API and is not a valid flock() operation.
+        fcntl.flock(fd, fcntl.LOCK_UN)
+        os.close(fd)
# test_writable_dir and _test_writable_dir_win are copied from Flit,
OR
diff --git a/src/pip/_internal/utils/filesystem.py b/src/pip/_internal/utils/filesystem.py
index b7e6191ab..3aea86fe5 100644
--- a/src/pip/_internal/utils/filesystem.py
+++ b/src/pip/_internal/utils/filesystem.py
@@ -1,3 +1,4 @@
+import fcntl
import fnmatch
import os
import os.path
@@ -88,10 +89,12 @@ def adjacent_tmp_file(path: str, **kwargs: Any) -> Iterator[BinaryIO]:
     ) as f:
         result = cast(BinaryIO, f)
         try:
+            fcntl.flock(result, fcntl.LOCK_EX)
             yield result
         finally:
             result.flush()
             os.fsync(result.fileno())
+            fcntl.flock(result, fcntl.LOCK_UN)  # LOCK_UN, not F_UNLCK; lock the file object, not dst
 # Tenacity raises RetryError by default, explicitly raise the original exception
os.replace() is required to be atomic, in the sense that something trying to read that file will either see the old version or the new version, never a partial file or a mix-and-match of both. So the first of those two proposed patches doesn't seem helpful to me.
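A related detail: flock() locks are advisory, so they only constrain processes that also call flock() on the same file; a plain open()/read() is never blocked by them. A minimal illustration (my own sketch, with an invented filename):

import fcntl

# Process A: takes an exclusive advisory lock while writing.
with open("cached.whl", "r+b") as f:
    fcntl.flock(f, fcntl.LOCK_EX)
    f.write(b"...")
    fcntl.flock(f, fcntl.LOCK_UN)

# Process B: a plain open()/read() never checks advisory locks, so it
# can read mid-write regardless. Only another flock() caller would block.
with open("cached.whl", "rb") as f:
    data = f.read()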
Hi @lukasjuhrich
In my use case it happens because I run multiple pip instances concurrently on multiple different venvs. My workaround is to retry the failed pip command: I added a retry mechanism to our build system, so when pip fails, the build system retries, and it usually succeeds.
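Something along these lines, in case it helps others (a hypothetical sketch of the workaround, not our actual build-system code; pip_install_with_retry is an invented name):

import subprocess
import time

def pip_install_with_retry(args, attempts=3, delay=5.0):
    # Rerun the failed pip command a few times before giving up; the
    # cache race is transient, so a retry usually succeeds.
    for attempt in range(1, attempts + 1):
        result = subprocess.run(["pip", "install", *args])
        if result.returncode == 0:
            return
        if attempt < attempts:
            time.sleep(delay)
    raise RuntimeError(f"pip install failed after {attempts} attempts")

pip_install_with_retry(["pydevd-pycharm==202.7319.37"])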
I am running into the same issue when installing several projects in parallel with pipenv with a filesystem package as an editable dependency. It fails even with the following:
PIP_NO_BINARY: ":all:"
PIP_NO_BUILD_ISOLATION: "no"
PIP_NO_CLEAN: "no"
The build directory created as a sibling to setup.py typically ends up with a wheel directory that has 0200 permissions (u+w only), which causes "permission denied" errors in the other processes. Even the nuclear approach of setting umask 000 does not get around this.
Has anyone found a workaround?
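As an aside on those permission bits (my own addition): 0200 is owner-write only, and a umask can only clear bits from a requested mode, never add them, which is why umask 000 cannot help. A quick illustration:

import stat

# 0200 on a directory means owner-write only: no read or execute bits,
# so other processes cannot list or traverse it.
print(stat.filemode(stat.S_IFDIR | 0o200))  # 'd-w-------'

# A umask only *clears* bits from the mode a program requests; it can
# never add the missing read/execute bits, so `umask 000` changes nothing.
print(oct(0o200 & ~0o000))  # 0o200, unchanged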
We're still getting these errors regularly. This seems to strongly hint that the problem actually is in the adjacent temporary file:
ERROR: Wheel 'restructuredtext-lint' located at /builds/group/my_project/.cache/pip/wheels/65/4b/1c/d59fca1ba14ad38d9ef60a4247fac922f709cecbbb525bf554/restructuredtext_lint-1.4.0-py3-none-any.whl is invalid.
I'm not sure whether the patch proposed by @pradyunsg would actually help, because the problem is probably not just writing the adjacent temporary file atomically but also moving it to the final location. Otherwise you can still have a race condition: one process finishes writing the temporary file, and another process opens it for writing again before the rename happens.
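One way to close that window, sketched under my own assumptions (a separate .lock file held across both the write and the rename; all names are invented):

import fcntl
import os

def write_and_replace_locked(dst: str, data: bytes) -> None:
    # Hold one exclusive lock across BOTH steps: writing the adjacent
    # temporary file and renaming it into place. A second process has to
    # wait for the lock, so it can no longer reopen the temporary file
    # between the end of the write and the rename.
    lock_fd = os.open(dst + ".lock", os.O_CREAT | os.O_WRONLY)
    try:
        fcntl.flock(lock_fd, fcntl.LOCK_EX)
        tmp = dst + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, dst)
    finally:
        fcntl.flock(lock_fd, fcntl.LOCK_UN)
        os.close(lock_fd)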
Environment
Description
When running pip in parallel (multiple processes) to install some source distribution, the wheel-creation phase fails because all processes try to create a .whl in the same location (assuming the wheel doesn't already exist in the wheel cache).
In my CI environment we build multiple Python projects in parallel. In addition, the build happens within a clean Docker container, so the pip cache is empty on every CI run. I therefore get into a situation where I have multiple venvs (one per Python project) and multiple concurrent pip processes, each installing some source-distribution package (in my case it's pydevd-pycharm==202.7319.37).
Because this is a source distribution, pip first builds a wheel from it. Since concurrent pip processes try to do this simultaneously in the same cache location (i.e. c:\users\awaizman101364\appdata\local\pip\cache\wheels\ab\b5\b5\be64936edf514f04910c749842d4846e0afc64a9de7e319067), it fails with errors like EOFError, BadZipFile, etc.
Expected behavior
Wheel creation by pip should be multi-process safe (similar to downloads).
How to Reproduce
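A minimal sketch of the scenario described above (my reconstruction; the venv names env3/env4 follow the output labels below, and the sdist pin comes from the description):

import os
import subprocess

# Two venvs sharing one initially empty pip cache, both installing the
# same source distribution at the same time.
env = dict(os.environ, PIP_CACHE_DIR="/tmp/shared-pip-cache")
procs = [
    subprocess.Popen(
        [f"env{i}/bin/pip", "install", "pydevd-pycharm==202.7319.37"],
        env=env,
    )
    for i in (3, 4)
]
for p in procs:
    p.wait()  # one of them intermittently fails with BadZipFile/EOFError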
Output
These are the errors I get. env3 process:
env4 process: