jazzband / pip-tools

A set of tools to keep your pinned Python dependencies fresh.
https://pip-tools.rtfd.io
BSD 3-Clause "New" or "Revised" License

Backtracking tries to download all versions of package #2044

Open mireq opened 7 months ago

mireq commented 7 months ago

Let's assume I already have this requirements.txt file:

urllib3==2.1.0

Now I add boto3 to requirements.in:

urllib3
boto3

Now the command pip-compile --resolver=backtracking -v requirements.in tries to download every metadata file, or even every package file, per version (when I am using the PyPI servers directly, not caching servers on my local network):

pip-compile --resolver=backtracking -v requirements.in 
Using indexes:
  https://pypi.org/simple

                          ROUND 1                           
  Collecting urllib3 (from -r requirements.in (line 1))
    Obtaining dependency information for urllib3 from https://files.pythonhosted.org/packages/96/94/c31f58c7a7f470d5665935262ebd7455c7e4c7782eb525658d3dbf4b9403/urllib3-2.1.0-py3-none-any.whl.metadata
    Using cached urllib3-2.1.0-py3-none-any.whl.metadata (6.4 kB)
  Collecting boto3 (from -r requirements.in (line 2))
    Obtaining dependency information for boto3 from https://files.pythonhosted.org/packages/e3/f7/93a4ba1cd2cc4ee95f871b0890e4ed60e52365110a074e7265279750a736/boto3-1.34.18-py3-none-any.whl.metadata
    Using cached boto3-1.34.18-py3-none-any.whl.metadata (6.6 kB)
  Collecting botocore<1.35.0,>=1.34.18 (from boto3->-r requirements.in (line 2))
    Obtaining dependency information for botocore<1.35.0,>=1.34.18 from https://files.pythonhosted.org/packages/ec/79/cc5e52bfc3cf7c26ba452c348fe6a765a888730b692d783e64a175243572/botocore-1.34.18-py3-none-any.whl.metadata
    Using cached botocore-1.34.18-py3-none-any.whl.metadata (5.6 kB)
  Collecting jmespath<2.0.0,>=0.7.1 (from boto3->-r requirements.in (line 2))
    Using cached jmespath-1.0.1-py3-none-any.whl (20 kB)
  Collecting s3transfer<0.11.0,>=0.10.0 (from boto3->-r requirements.in (line 2))
    Obtaining dependency information for s3transfer<0.11.0,>=0.10.0 from https://files.pythonhosted.org/packages/12/bb/7e7912e18cd558e7880d9b58ffc57300b2c28ffba9882b3a54ba5ce3ebc4/s3transfer-0.10.0-py3-none-any.whl.metadata
    Using cached s3transfer-0.10.0-py3-none-any.whl.metadata (1.7 kB)
  Collecting python-dateutil<3.0.0,>=2.1 (from botocore<1.35.0,>=1.34.18->boto3->-r requirements.in (line 2))
    Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
  INFO: pip is looking at multiple versions of botocore to determine which version is compatible with other requirements. This could take a while.
  Collecting boto3 (from -r requirements.in (line 2))
    Obtaining dependency information for boto3 from https://files.pythonhosted.org/packages/3b/34/0bdbd20d688438a46f8e9255e9e9a06ef350689a05e9a7233babff554978/boto3-1.34.17-py3-none-any.whl.metadata
    Using cached boto3-1.34.17-py3-none-any.whl.metadata (6.6 kB)
  Collecting botocore<1.35.0,>=1.34.17 (from boto3->-r requirements.in (line 2))
    Obtaining dependency information for botocore<1.35.0,>=1.34.17 from https://files.pythonhosted.org/packages/4a/ed/9f3ec1754d1a444997d7d18aeaef2611a3694783b6baf062c2e74627c4b3/botocore-1.34.17-py3-none-any.whl.metadata
    Using cached botocore-1.34.17-py3-none-any.whl.metadata (5.6 kB)
  Collecting boto3 (from -r requirements.in (line 2))
    Obtaining dependency information for boto3 from https://files.pythonhosted.org/packages/8b/dc/26c1c654cb6a177fc0b7ca7f916cd61daf045a42ca091fce44906d65be9f/boto3-1.34.16-py3-none-any.whl.metadata
    Using cached boto3-1.34.16-py3-none-any.whl.metadata (6.6 kB)
  Collecting botocore<1.35.0,>=1.34.16 (from boto3->-r requirements.in (line 2))
    Obtaining dependency information for botocore<1.35.0,>=1.34.16 from https://files.pythonhosted.org/packages/6d/84/36a78ba9d992baf3ed48dc0ad2bdb711d27033de0b088d71f4cfcd698bde/botocore-1.34.16-py3-none-any.whl.metadata
    Using cached botocore-1.34.16-py3-none-any.whl.metadata (5.6 kB)
...

In this case the downloading takes hours.

From the resolver's perspective, it is correct to try other versions.

I don't know if this can be solved in some smart way. I might suggest not checking older versions when the already-pinned package version is higher than the maximum version allowed by botocore (in this case), but that would not be correct in general.

This is more of a discussion topic than a real bug report, because I think pip-tools behaves correctly. Maybe the best solution would be to extend the index metadata so that all dependencies of a package are returned in a single request, correctly grouped by version range, so it would not be one gigantic file. Instead it would contain sections like '>=1.0,<1.2': {'dependencies': ...}, as sketched below.
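
To make the idea concrete, here is a rough sketch of what such a range-grouped response could look like. The shape and field names are hypothetical (this is not any existing PyPI API), and the pins are only loosely based on the log above:

{
  "name": "boto3",
  "dependencies-by-range": {
    ">=1.34.0,<1.35.0": {
      "dependencies": [
        "botocore>=1.34.0,<1.35.0",
        "jmespath>=0.7.1,<2.0.0",
        "s3transfer>=0.10.0,<0.11.0"
      ]
    },
    ">=1.33.0,<1.34.0": {
      "dependencies": ["..."]
    }
  }
}

With a document like this, a resolver could walk back through many candidate versions of a package without issuing one network request per version.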

webknjaz commented 7 months ago

I sometimes add a separate file with extra constraints at the top of the input file, as in -c broken-version-constraints.txt, where I have some of the transitive deps tightened up. Maybe it'll work for you too.
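
For example (the file names and the exact bound here are just an illustration of the approach, not a recommendation):

requirements.in:

-c broken-version-constraints.txt
urllib3
boto3

broken-version-constraints.txt:

# keep the resolver from walking back through old botocore releases
botocore>=1.34

Because the file is passed as constraints rather than requirements, botocore only ends up pinned if something actually depends on it, but the resolver never considers versions below the bound.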

mireq commented 7 months ago

@webknjaz I have a script which temporarily removes problematic dependencies; they are automatically added back with the correct versions during the dependency build process. I can solve this problem easily when I know exactly what the problem is, but it's not always so obvious.

I have solved this problem for my case. My intent is to start a discussion about making the PyPI package registry more efficient.

Look at how fast npm can resolve packages. It's fast because the npm registry sends package metadata for all versions in one request. That's a good start. I would also like to see version grouping, so that if a range of versions has the same dependencies, the same list does not have to be sent for every version.
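
For example, this is roughly what I mean (express is just an arbitrary package):

curl -s https://registry.npmjs.org/express
# one JSON document containing a "versions" object in which each version
# entry carries its own "dependencies" map, so a resolver needs a single
# request per package instead of one request per version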

webknjaz commented 7 months ago

First of all, pip-tools is not the place to discuss how indexes work. Second, there's package metadata that is served per version already; it's standardized, there's a PEP, and it's implemented in Warehouse. Finally, it's simply impossible for PyPI to know all the metadata, because sdists are dynamic in nature and may produce different dependencies in different cases. Study https://dustingram.com/articles/2018/03/05/why-pypi-doesnt-know-dependencies/ for details.
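
Concretely (if I recall the numbers right, PEP 691 for the JSON Simple API and PEP 658 for per-file metadata), both are already visible in your verbose log:

# one request per project, listing all released files (PEP 691)
curl -s -H "Accept: application/vnd.pypi.simple.v1+json" https://pypi.org/simple/boto3/
# one request per file for just its METADATA, no wheel download needed (PEP 658)
curl -s https://files.pythonhosted.org/packages/e3/f7/93a4ba1cd2cc4ee95f871b0890e4ed60e52365110a074e7265279750a736/boto3-1.34.18-py3-none-any.whl.metadata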

You can force pip to disregard sdists using the --only-binary and --prefer-binary CLI options, which can be passed from pip-tools too. But that's about it.
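
Something along these lines (assuming a pip-tools version that supports --pip-args; adjust to taste):

# refuse sdists entirely
pip-compile --pip-args "--only-binary :all:" requirements.in
# or just prefer wheels when both are available
pip-compile --pip-args "--prefer-binary" requirements.in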

mireq commented 7 months ago

@webknjaz Yes, metadata is served per version, and that is part of the problem. npm, for example, serves all dependencies of a package in a single request. In contrast, pip-tools (pip) needs hundreds of requests just to check a package's dependencies (this is what my first message shows).

The second problem is something that should be eliminated in the future. It was a really bad decision to allow code execution even for mostly static metadata.

I don't know if this is the best place to discuss this, but it's a problem which directly affects pip-tools. It can't be properly fixed in pip-tools, but it may be possible to reduce the backtracking depth, e.g. by trying lower versions of the dependents eagerly instead of just downgrading the first problematic package through every old version.

webknjaz commented 7 months ago

It's not going to be possible to eliminate building sdists. Like ever. The reasons are explained in the article. It'd simply break the ecosystem.

If you want to draft a PR with more concrete ideas, you can try. But I don't see anything actionable here.