pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

Pip endlessly downloads all previous versions of python packages #12827

Open msharp9 opened 5 days ago

msharp9 commented 5 days ago

Description

I'm seeing this behavior again in the latest pip version. https://github.com/pypa/pip/issues/11928

Expected behavior

Recent versions of pip should avoid downloading a whole package just to get its metadata, see PEP 658.

pip version

24.1.1

Python version

Python 3.8, 3.9, 3.10, 3.11

OS

Linux

How to Reproduce

pip install awscli s3fs boto3

Output

No response

msharp9 commented 5 days ago

As a note those are just the python versions I tried it on.

Also as a note, when hitting PyPI it downloads a .whl.metadata file, but when I hit other mirrors like CodeArtifact it downloads the entire wheel.

notatallshaw commented 5 days ago

As you've noted, the reason @jeanas closed https://github.com/pypa/pip/issues/11928 is that pip should now download only the metadata of the packages, not the whole package.

> but when I hit other mirrors like CodeArtifact it downloads the entire wheel.

It is up to CodeArtifact to implement metadata file support on their end; once they do, pip will use it. I'm not familiar with this service, but perhaps raise an issue with them?
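
For anyone who wants to check what a given index serves, here is a minimal sketch (not pip's actual code). It assumes the index returns a PEP 503 HTML "simple" page and looks for the data-core-metadata attribute (PEP 714) or its older name data-dist-info-metadata (PEP 658) on each file link. The function name, sample page, and files.example host are made up for illustration.

```python
from html.parser import HTMLParser


class MetadataLinkParser(HTMLParser):
    """Collect file links from a "simple" project page and record whether
    each one advertises a separate metadata file."""

    def __init__(self):
        super().__init__()
        self.files = {}  # file name -> True if metadata is advertised

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # Drop the "#sha256=..." fragment and any query string, keep the file name.
        filename = href.split("#", 1)[0].split("?", 1)[0].rsplit("/", 1)[-1]
        # PEP 658 called the attribute data-dist-info-metadata; PEP 714
        # renamed it to data-core-metadata. Either one signals support.
        self.files[filename] = (
            "data-core-metadata" in attrs or "data-dist-info-metadata" in attrs
        )


def wheels_missing_metadata(simple_page_html):
    """Return the wheel file names that do not advertise a metadata file."""
    parser = MetadataLinkParser()
    parser.feed(simple_page_html)
    return sorted(
        name
        for name, has_metadata in parser.files.items()
        if name.endswith(".whl") and not has_metadata
    )


# Hypothetical page content, only for illustration.
SAMPLE = """
<a href="https://files.example/pkg-1.0-py3-none-any.whl#sha256=aaa" data-core-metadata="sha256=bbb">pkg-1.0-py3-none-any.whl</a>
<a href="https://files.example/pkg-0.9-py3-none-any.whl#sha256=ccc">pkg-0.9-py3-none-any.whl</a>
<a href="https://files.example/pkg-1.0.tar.gz#sha256=ddd">pkg-1.0.tar.gz</a>
"""

print(wheels_missing_metadata(SAMPLE))  # ['pkg-0.9-py3-none-any.whl']
```

If every wheel on a project page from your mirror shows up in that list, the mirror is not advertising metadata files, and pip has no choice but to fetch whole wheels.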

But yes, while this improves IO, it doesn't stop pip from getting into this situation in the first place: having to backtrack through many versions of a package.

The situation should be significantly improved once pip prefers direct causes (I have a draft PR for this here: https://github.com/pypa/pip/pull/12499, but there is still a lot of work to do before it can land).

msharp9 commented 5 days ago

Thank you for the information.

CodeArtifact is AWS's artifact store: https://aws.amazon.com/codeartifact/

notatallshaw commented 5 days ago

Btw, it would be remiss of me not to mention that uv tends to handle resolution better than pip does and provides a pip-like interface (uv pip install awscli s3fs boto3): https://github.com/astral-sh/uv

Even if you can't use uv directly in your workflow, you may find its uv pip compile feature useful for identifying good lower bounds for your different requirements. E.g. here is a resolution for your latest requirements across all platforms with a minimum Python of 3.8:

$ echo -e "awscli\n s3fs\n boto3" | uv pip compile - --annotation-style line --universal --python-version 3.8

# This file was autogenerated by uv via the following command:
#    uv pip compile - --annotation-style line --universal --python-version 3.8
awscli==1.33.21
boto3==1.34.139
botocore==1.34.139        # via awscli, boto3, s3fs, s3transfer
colorama==0.4.6           # via awscli
docutils==0.16            # via awscli
fsspec==2024.6.1          # via s3fs
jmespath==1.0.1           # via boto3, botocore
pyasn1==0.6.0             # via rsa
python-dateutil==2.9.0.post0  # via botocore
pyyaml==6.0.1             # via awscli
rsa==4.7.2                # via awscli
s3fs==0.4.2
s3transfer==0.10.2        # via awscli, boto3
six==1.16.0               # via python-dateutil
urllib3==2.2.2 ; python_version >= '3.10'  # via botocore
urllib3==1.26.19 ; python_version < '3.10'  # via botocore

You may want to use those versions of awscli, s3fs, and boto3 as your lower bounds. You may also find specifying a urllib3 requirement quite helpful, e.g. urllib3<2 ; python_version < '3.10' and urllib3>=2 ; python_version >= '3.10'
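
As a rough sketch of that step, the snippet below turns the == pins for the direct dependencies into >= lower bounds and appends the two urllib3 constraints. The lower_bounds helper is hypothetical, not part of pip or uv, and the COMPILED input is just a few lines copied from the output above.

```python
import re

# Matches "name==version" at the start of a line of uv pip compile output.
PIN = re.compile(r"^([A-Za-z0-9._-]+)==([^\s;#]+)")


def lower_bounds(compiled_lines, direct):
    """Turn the pins for the direct dependencies into ">=" requirements."""
    wanted = {name.lower() for name in direct}
    bounds = []
    for line in compiled_lines:
        match = PIN.match(line.strip())
        if match and match.group(1).lower() in wanted:
            bounds.append(f"{match.group(1)}>={match.group(2)}")
    return bounds


# A few lines taken from the compile output above.
COMPILED = """\
awscli==1.33.21
boto3==1.34.139
botocore==1.34.139        # via awscli, boto3, s3fs, s3transfer
s3fs==0.4.2
urllib3==2.2.2 ; python_version >= '3.10'  # via botocore
""".splitlines()

requirements = lower_bounds(COMPILED, ["awscli", "s3fs", "boto3"])
# The urllib3 constraints suggested above, added by hand.
requirements += [
    "urllib3<2 ; python_version < '3.10'",
    "urllib3>=2 ; python_version >= '3.10'",
]
print("\n".join(requirements))
```

The printed lines can go into a requirements file as a starting point. Whether these bounds are loose enough for your project is something only testing will tell.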

Also, if you need to support a wide range of awscli, s3fs, and boto3 versions, uv comes with a --resolution flag that lets you control whether uv picks newer or older dependencies. In particular, --resolution lowest-direct will pick the oldest versions of your specified dependencies and the newest versions of transitive dependencies. You can then experiment with lower bounds on your direct dependencies until you find ones that install and work.

msharp9 commented 5 days ago

Great callout. I'm actually a fan of uv and use it for personal projects. I also have another issue open in that repo about better support for CodeArtifact and other indexes: https://github.com/astral-sh/uv/issues/1404