pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

--retries has no effect during streaming downloads #12383

Open asottile-sentry opened 10 months ago

asottile-sentry commented 10 months ago

Description

a few things up front, since I've found a few related issues, including one that's almost certainly an exact duplicate, although the advice there doesn't quite match my case:

in my case I'm seeing this from GitHub Actions, both against public pypi and against an "internal" but public-facing wheel mirror at https://pypi.devinfra.sentry.io/simple. I'm using the default timeout (15 seconds), which is long enough for most connections: typical network speeds between GHA and public pypi or our pypi server are about 50 MB/s, easily enough to download even the bulkiest packages in a second or two. unfortunately GitHub's network is quite flaky, and simple retries would help immensely for these downloads (even if they started over from the beginning; see #4796)

the rationale given in the duplicate above was that retrying ReadTimeoutErrors would lead to excessively long waits. however, pip already seems to retry them for responses that aren't streamed. here's an example where I've artificially lowered the timeout for public pypi enough that you can see the retries on ReadTimeoutError:

$ pip install --timeout .01 cfgv
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f3759653190>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/cfgv/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=0.01)")': /simple/cfgv/
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=0.01)")': /simple/cfgv/
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=0.01)")': /simple/cfgv/
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=0.01)")': /simple/cfgv/
ERROR: Could not find a version that satisfies the requirement cfgv (from versions: none)
ERROR: No matching distribution found for cfgv
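To make the streamed/non-streamed distinction concrete, here is a stdlib-only sketch (not pip's code; the handler, URL, and 0.5 s timeout are made up for illustration). It assumes, as the warnings above suggest, that retries only wrap making the request and reading the response headers. A server that sends headers plus part of the body and then goes quiet gets through `urlopen()` without trouble; the timeout only surfaces later, inside `resp.read()`, which a request-level retry never sees.

```python
import http.server
import socket
import threading
import time
import urllib.request


class StallingHandler(http.server.BaseHTTPRequestHandler):
    """Sends headers and the start of the body, then stalls (simulated flaky network)."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "1000")
        self.end_headers()
        self.wfile.write(b"x" * 100)  # only 100 of the promised 1000 bytes
        time.sleep(2)  # go quiet for longer than the client's read timeout

    def log_message(self, *args):  # silence per-request logging
        pass


def where_did_it_fail(url, timeout):
    """Return "request", "body", or "ok" depending on which phase raised."""
    try:
        resp = urllib.request.urlopen(url, timeout=timeout)  # connect + status + headers
    except OSError:
        return "request"
    with resp:
        try:
            resp.read()  # stream the body
        except socket.timeout:  # an alias of TimeoutError on Python 3.10+
            return "body"
    return "ok"


server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), StallingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/example.whl"

phase = where_did_it_fail(url, timeout=0.5)
print(phase)  # the stall is only noticed while reading the body
server.shutdown()
```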

worst case, I can retry the entire pip installation, but that seems like quite a heavy hammer for what is essentially a bunch of file downloads
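For comparison, the heavy hammer looks roughly like this: a hypothetical wrapper (the name `run_with_retries`, the attempt count, and the delay are all made up) that re-runs the whole command from scratch whenever it exits non-zero. Each retry redoes resolution and any download pip can't serve from its cache, which is the overhead complained about above.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=3, delay=5.0):
    """Re-run cmd from scratch until it exits 0 or attempts run out (hypothetical helper).

    Returns the exit code of the last attempt.
    """
    returncode = 1
    for attempt in range(1, attempts + 1):
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            break
        if attempt < attempts:
            print(f"attempt {attempt}/{attempts} exited {returncode}; retrying in {delay}s",
                  file=sys.stderr)
            time.sleep(delay)
    return returncode


# Quick self-check with commands that always succeed / always fail.
ok_code = run_with_retries([sys.executable, "-c", "pass"])
fail_code = run_with_retries([sys.executable, "-c", "raise SystemExit(3)"], attempts=2, delay=0)
```

In CI it would be called as `run_with_retries([sys.executable, "-m", "pip", "install", "-r", "requirements-dev-frozen.txt"])`.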


related: it seems --timeout also doesn't affect streamed downloads:

$ time pip download --no-cache-dir --timeout .25 torch --no-deps
Collecting torch
  Obtaining dependency information for torch from https://files.pythonhosted.org/packages/e1/24/f7fe3fe82583e6891cc3fceeb390f192f6c7f1d87e5a99a949ed33c96167/torch-2.1.0-cp38-cp38-manylinux1_x86_64.whl.metadata
  Downloading torch-2.1.0-cp38-cp38-manylinux1_x86_64.whl.metadata (25 kB)
Downloading torch-2.1.0-cp38-cp38-manylinux1_x86_64.whl (670.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 670.2/670.2 MB 34.6 MB/s eta 0:00:00
Saved ./torch-2.1.0-cp38-cp38-manylinux1_x86_64.whl
Successfully downloaded torch

real    0m22.157s
user    0m6.681s
sys 0m3.720s
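One possible reading of that timing, assuming pip passes `--timeout` down as a per-operation socket timeout (the traceback in the Output section below does show a `socket.timeout` raised from inside a streamed read): a socket timeout bounds how long a single read may wait for data, not how long the whole transfer may take, so a download that never pauses for 0.25 s would finish however long it runs overall. The stdlib sketch below (not pip code; the server, chunk sizes, and timings are invented) reads a body that takes roughly a second to arrive using a 0.25 s timeout.

```python
import http.server
import threading
import time
import urllib.request

CHUNKS = 10
CHUNK = b"y" * 10
GAP = 0.1  # pause between chunks; shorter than the client timeout below
TIMEOUT = 0.25  # per-read socket timeout on the client


class DripHandler(http.server.BaseHTTPRequestHandler):
    """Streams the body in small pieces with a short pause after each one."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(CHUNKS * len(CHUNK)))
        self.end_headers()
        for _ in range(CHUNKS):
            self.wfile.write(CHUNK)
            time.sleep(GAP)

    def log_message(self, *args):  # silence per-request logging
        pass


server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), DripHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/big.whl"

start = time.monotonic()
with urllib.request.urlopen(url, timeout=TIMEOUT) as resp:
    body = resp.read()  # no single wait exceeds TIMEOUT, so this succeeds
elapsed = time.monotonic() - start
server.shutdown()
print(f"read {len(body)} bytes in {elapsed:.2f}s with a {TIMEOUT}s timeout")
```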

Expected behavior

I expect a retry to occur when streaming a response (wheel / sdist / archive download) if the connection stalls due to intermittent network failures, leading to a ReadTimeoutError.
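A minimal sketch of the requested behaviour, written against the standard library rather than pip's vendored requests/urllib3 (so `download_with_retries`, its parameters, and the `.part` temp-file convention are all hypothetical): stream the body in chunks, and if a read stalls past the timeout or the body comes up short, discard the partial file and restart from byte 0 after a short exponential backoff. It does not try to resume (that would need HTTP Range requests), and it only retries the exception types in `RETRYABLE`; anything else, such as an HTTP 404, propagates immediately.

```python
import http.client
import http.server
import os
import socket
import tempfile
import threading
import time
import urllib.request

RETRYABLE = (socket.timeout, ConnectionError, http.client.IncompleteRead)


def download_with_retries(url, dest, attempts=5, timeout=15.0, backoff=0.5):
    """Stream url into dest, restarting from scratch when a read stalls.

    Hypothetical helper; returns the number of attempts it took.
    """
    tmp = dest + ".part"
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp, open(tmp, "wb") as f:
                length = resp.headers.get("Content-Length")
                expected = int(length) if length is not None else None
                written = 0
                while True:
                    chunk = resp.read(64 * 1024)  # each read may raise socket.timeout
                    if not chunk:
                        break
                    f.write(chunk)
                    written += len(chunk)
                if expected is not None and written != expected:
                    raise http.client.IncompleteRead(b"", expected - written)
            os.replace(tmp, dest)  # only publish a complete file
            return attempt
        except RETRYABLE:
            if os.path.exists(tmp):
                os.remove(tmp)  # never leave a truncated file behind
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(backoff * 2 ** (attempt - 1))


class FlakyHandler(http.server.BaseHTTPRequestHandler):
    """Test server: the first request stalls mid-body, later ones are served in full."""

    body = b"z" * 5000
    calls = 0

    def do_GET(self):
        FlakyHandler.calls += 1
        self.send_response(200)
        self.send_header("Content-Length", str(len(self.body)))
        self.end_headers()
        if FlakyHandler.calls == 1:
            self.wfile.write(self.body[:1000])
            time.sleep(1.0)  # stall longer than the client timeout
        else:
            self.wfile.write(self.body)

    def log_message(self, *args):  # silence per-request logging
        pass


server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), FlakyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/pkg.whl"
dest = os.path.join(tempfile.mkdtemp(), "pkg.whl")

used = download_with_retries(url, dest, attempts=3, timeout=0.3, backoff=0.05)
server.shutdown()
print(f"downloaded in {used} attempt(s)")
```

Restarting from scratch keeps the sketch simple and matches the "even if they started over from the beginning" framing in the description; resuming would be an optimisation on top.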

pip version

22.1.2 -- also reproduced on latest (23.3.1)

Python version

3.8.16

OS

ubuntu 22.04 (ubuntu-latest in GHA)

How to Reproduce

I am using this requirements file; however, I can also reproduce it against public pypi with enough attempts (the GitHub Actions network is quite flaky, unfortunately!)

effectively I'm running pip install -r requirements-dev-frozen.txt

Output

going to hide this one because it's a bit long

the relevant error is:

  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_vendor/urllib3/response.py", line 443, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='pypi.devinfra.sentry.io', port=443): Read timed out.
$ pip install -r requirements-dev-frozen.txt

```
Looking in indexes: https://pypi.devinfra.sentry.io/simple
Collecting aiohttp==3.8.5
  Downloading https://pypi.devinfra.sentry.io/wheels/aiohttp-3.8.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 29.5 MB/s eta 0:00:00
Collecting aiosignal==1.3.1
  Downloading https://pypi.devinfra.sentry.io/wheels/aiosignal-1.3.1-py3-none-any.whl (7.6 kB)
Collecting amqp==2.6.1
  Downloading https://pypi.devinfra.sentry.io/wheels/amqp-2.6.1-py2.py3-none-any.whl (48 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.0/48.0 KB 14.0 MB/s eta 0:00:00
Collecting asgiref==3.7.2
  Downloading https://pypi.devinfra.sentry.io/wheels/asgiref-3.7.2-py3-none-any.whl (24 kB)
Collecting async-generator==1.10
  Downloading https://pypi.devinfra.sentry.io/wheels/async_generator-1.10-py3-none-any.whl (18 kB)
Collecting async-timeout==4.0.2
  Downloading https://pypi.devinfra.sentry.io/wheels/async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Collecting attrs==19.2.0
  Downloading https://pypi.devinfra.sentry.io/wheels/attrs-19.2.0-py2.py3-none-any.whl (40 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.7/40.7 KB 15.5 MB/s eta 0:00:00
Collecting avalara==20.9.0
  Downloading https://pypi.devinfra.sentry.io/wheels/Avalara-20.9.0-py3-none-any.whl (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.0/62.0 KB 16.5 MB/s eta 0:00:00
Collecting beautifulsoup4==4.7.1
  Downloading https://pypi.devinfra.sentry.io/wheels/beautifulsoup4-4.7.1-py3-none-any.whl (94 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 94.3/94.3 KB 36.6 MB/s eta 0:00:00
Collecting billiard==3.6.4.0
  Downloading https://pypi.devinfra.sentry.io/wheels/billiard-3.6.4.0-py3-none-any.whl (89 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89.5/89.5 KB 29.1 MB/s eta 0:00:00
Collecting black==22.10.0
  Downloading https://pypi.devinfra.sentry.io/wheels/black-22.10.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 72.5 MB/s eta 0:00:00
Collecting boto3==1.28.26
  Downloading https://pypi.devinfra.sentry.io/wheels/boto3-1.28.26-py3-none-any.whl (135 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 135.8/135.8 KB 44.6 MB/s eta 0:00:00
Collecting botocore==1.31.26
  Downloading https://pypi.devinfra.sentry.io/wheels/botocore-1.31.26-py3-none-any.whl (11.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.1/11.1 MB 50.7 MB/s eta 0:00:00
Collecting brotli==1.0.9
  Downloading https://pypi.devinfra.sentry.io/wheels/Brotli-1.0.9-cp38-cp38-manylinux1_x86_64.whl (357 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 357.2/357.2 KB 7.0 MB/s eta 0:00:00
Collecting build==0.8.0
  Downloading https://pypi.devinfra.sentry.io/wheels/build-0.8.0-py3-none-any.whl (17 kB)
Collecting cachetools==5.3.0
  Downloading https://pypi.devinfra.sentry.io/wheels/cachetools-5.3.0-py3-none-any.whl (9.3 kB)
Collecting celery==4.4.7
  Downloading https://pypi.devinfra.sentry.io/wheels/celery-4.4.7-py2.py3-none-any.whl (427 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 427.6/427.6 KB 90.3 MB/s eta 0:00:00
Collecting certifi==2023.7.22
  Downloading https://pypi.devinfra.sentry.io/wheels/certifi-2023.7.22-py3-none-any.whl (158 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 158.3/158.3 KB 50.7 MB/s eta 0:00:00
Collecting cffi==1.15.1
  Downloading https://pypi.devinfra.sentry.io/wheels/cffi-1.15.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (442 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 442.7/442.7 KB 83.7 MB/s eta 0:00:00
Collecting cfgv==3.3.1
  Downloading https://pypi.devinfra.sentry.io/wheels/cfgv-3.3.1-py2.py3-none-any.whl (7.3 kB)
Collecting charset-normalizer==3.0.1
  Downloading https://pypi.devinfra.sentry.io/wheels/charset_normalizer-3.0.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (195 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 195.4/195.4 KB 50.4 MB/s eta 0:00:00
Collecting click==8.0.4
  Downloading https://pypi.devinfra.sentry.io/wheels/click-8.0.4-py3-none-any.whl (97 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.5/97.5 KB 31.9 MB/s eta 0:00:00
Collecting confluent-kafka==2.1.1
  Downloading https://pypi.devinfra.sentry.io/wheels/confluent_kafka-2.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.9/3.9 MB 79.9 MB/s eta 0:00:00
Collecting covdefaults==2.3.0
  Downloading https://pypi.devinfra.sentry.io/wheels/covdefaults-2.3.0-py2.py3-none-any.whl (5.1 kB)
Collecting coverage==6.3.3
  Downloading https://pypi.devinfra.sentry.io/wheels/coverage-6.3.3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (212 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 212.2/212.2 KB 66.0 MB/s eta 0:00:00
Collecting croniter==1.3.10
  Downloading https://pypi.devinfra.sentry.io/wheels/croniter-1.3.10-py2.py3-none-any.whl (18 kB)
Collecting cryptography==39.0.1
  Downloading https://pypi.devinfra.sentry.io/wheels/cryptography-39.0.1-cp36-abi3-manylinux_2_28_x86_64.whl (4.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.2/4.2 MB 70.4 MB/s eta 0:00:00
Collecting cssselect==1.0.3
  Downloading https://pypi.devinfra.sentry.io/wheels/cssselect-1.0.3-py2.py3-none-any.whl (16 kB)
Collecting cssutils==2.4.0
  Downloading https://pypi.devinfra.sentry.io/wheels/cssutils-2.4.0-py3-none-any.whl (404 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 405.0/405.0 KB 77.2 MB/s eta 0:00:00
Collecting datadog==0.29.3
  Downloading https://pypi.devinfra.sentry.io/wheels/datadog-0.29.3-py2.py3-none-any.whl (72 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 72.9/72.9 KB 24.6 MB/s eta 0:00:00
Collecting decorator==5.1.1
  Downloading https://pypi.devinfra.sentry.io/wheels/decorator-5.1.1-py3-none-any.whl (9.1 kB)
Collecting dictpath==0.1.3
  Downloading https://pypi.devinfra.sentry.io/wheels/dictpath-0.1.3-py3-none-any.whl (8.4 kB)
Collecting distlib==0.3.4
  Downloading https://pypi.devinfra.sentry.io/wheels/distlib-0.3.4-py2.py3-none-any.whl (461 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 461.2/461.2 KB 89.3 MB/s eta 0:00:00
Collecting django==3.2.23
  Downloading https://pypi.devinfra.sentry.io/wheels/Django-3.2.23-py3-none-any.whl (7.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━ 4.4/7.9 MB 212.7 MB/s eta 0:00:01
ERROR: Exception:
Traceback (most recent call last):
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_vendor/urllib3/response.py", line 438, in _error_catcher
    yield
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_vendor/urllib3/response.py", line 519, in read
    data = self._fp.read(amt) if not fp_closed else b""
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_vendor/cachecontrol/filewrapper.py", line 90, in read
    data = self.__fp.read(amt)
  File "/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/http/client.py", line 459, in read
    n = self.readinto(b)
  File "/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/http/client.py", line 503, in readinto
    n = self.fp.readinto(b)
  File "/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 167, in exc_logging_wrapper
    status = run_func(*args)
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 205, in wrapper
    return func(self, options, args)
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/commands/install.py", line 339, in run
    requirement_set = resolver.resolve(
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 94, in resolve
    result = self._result = resolver.resolve(
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 481, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 348, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 172, in _add_to_criteria
    if not criterion.candidates:
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_vendor/resolvelib/structs.py", line 151, in __bool__
    return bool(self._sequence)
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
    return any(self)
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
    return (c for c in iterator if id(c) not in self._incompatible_ids)
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
    candidate = func()
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 215, in _make_candidate_from_link
    self._link_candidate_cache[link] = LinkCandidate(
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 288, in __init__
    super().__init__(
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 158, in __init__
    self.dist = self._prepare()
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 227, in _prepare
    dist = self._prepare_distribution()
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 299, in _prepare_distribution
    return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 487, in prepare_linked_requirement
    return self._prepare_linked_requirement(req, parallel_builds)
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 532, in _prepare_linked_requirement
    local_file = unpack_url(
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 214, in unpack_url
    file = get_http_url(
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 94, in get_http_url
    from_path, content_type = download(link, temp_dir.path)
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/network/download.py", line 146, in __call__
    for chunk in chunks:
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/cli/progress_bars.py", line 304, in _rich_progress_bar
    for chunk in iterable:
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_internal/network/utils.py", line 63, in response_chunks
    for chunk in response.raw.stream(
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_vendor/urllib3/response.py", line 576, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_vendor/urllib3/response.py", line 541, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/runner/work/sentry/sentry/.venv/lib/python3.8/site-packages/pip/_vendor/urllib3/response.py", line 443, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='pypi.devinfra.sentry.io', port=443): Read timed out.
```


asottile-sentry commented 10 months ago

this is absolutely not the correct patch but it does satisfy my requirements for now

diff --git a/src/pip/_internal/network/download.py b/src/pip/_internal/network/download.py
index d1d43541e..4919f3e0a 100644
--- a/src/pip/_internal/network/download.py
+++ b/src/pip/_internal/network/download.py
@@ -7,6 +7,7 @@ import os
 from typing import Iterable, Optional, Tuple

 from pip._vendor.requests.models import CONTENT_CHUNK_SIZE, Response
+from pip._vendor.tenacity import retry, stop_after_attempt

 from pip._internal.cli.progress_bars import get_download_progress_renderer
 from pip._internal.exceptions import NetworkConnectionError
@@ -128,6 +129,7 @@ class Downloader:
         self._session = session
         self._progress_bar = progress_bar

+    @retry(reraise=True, stop=stop_after_attempt(5))
     def __call__(self, link: Link, location: str) -> Tuple[str, str]:
         """Download the file given by link into location."""
         try:
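On why the patch above is broader than needed: with only `reraise=True` and `stop=stop_after_attempt(5)`, tenacity retries on any exception, immediately and without waiting, for up to five attempts in total, then re-raises the last error. A narrower variant would retry only the streamed-read failure seen in the traceback and back off between attempts. The stdlib decorator below is a hypothetical sketch of that idea (`retry_on` and its parameters are made up and are not part of pip or tenacity); in pip the exception tuple would presumably be something like `(ReadTimeoutError,)` from `pip._vendor.urllib3.exceptions`.

```python
import functools
import time


def retry_on(exceptions, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry the wrapped callable only when it raises one of `exceptions`.

    Hypothetical helper: waits base_delay, 2*base_delay, 4*base_delay, ...
    between attempts and re-raises the last error once attempts run out.
    Any other exception propagates on the first failure.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


# Usage: a stand-in for a download that times out twice before succeeding.
calls = []


@retry_on((TimeoutError,), attempts=3, sleep=lambda seconds: None)
def flaky_download():
    calls.append(len(calls) + 1)
    if len(calls) < 3:
        raise TimeoutError("Read timed out.")
    return "saved"


result = flaky_download()
```

Injecting `sleep` keeps the example fast to run; real code would leave the default `time.sleep`.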