prefix-dev / rip

Solve and install Python packages quickly with rip (pip in Rust)
https://prefix.dev
BSD 3-Clause "New" or "Revised" License
645 stars 21 forks

Make `urllib` and `botocore` conflict resolve faster #191

Open wolfv opened 7 months ago

wolfv commented 7 months ago

When resolving botocore and urllib3, it can happen that rip chooses the highest version of urllib3 (2.2) and then tries to resolve botocore.

However, botocore has an upper bound on urllib3, something like >=1.5,<2.1.

Now the logical thing to do would be to lock in botocore and try older (compatible) versions of urllib3. Instead, rip tries all versions of botocore in existence (which are many) until it hits an outdated, broken sdist and crashes :'(

I think we should have some logic that implicitly adds the upper bound of any higher version to the lower versions, and therefore dismisses urllib3 2.2 for any version of botocore (it doesn't really make sense that a previous version of botocore would be compatible with a later urllib3).
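A rough sketch of that heuristic (illustrative names only, not rip's actual internals): take the newest release's upper bound on a dependency and use it to prune the dependency's candidate list before backtracking through every requiring version.

```python
# Illustrative sketch (not rip's real API): if the newest botocore pins
# urllib3 <2.1, assume older botocore releases won't accept urllib3 2.2
# either, and prune it before backtracking through every botocore version.

def parse(v):
    """Parse a dotted version string into a comparable tuple."""
    return tuple(int(x) for x in v.split("."))

def within(version, lower, upper):
    """True if lower <= version < upper (half-open range)."""
    return parse(lower) <= parse(version) < parse(upper)

def prune_candidates(dep_versions, newest_bound):
    """Keep only dependency versions allowed by the newest release's bound."""
    lower, upper = newest_bound
    return [v for v in dep_versions if within(v, lower, upper)]

urllib3_versions = ["2.2", "2.1", "2.0.7", "1.26.18", "1.25.4"]
# Suppose the newest botocore requires urllib3 >=1.25.4,<2.1:
print(prune_candidates(urllib3_versions, ("1.25.4", "2.1")))
# -> ['2.0.7', '1.26.18', '1.25.4']
```

With urllib3 2.2 pruned up front, the resolver never needs to walk every botocore release just to rediscover the same conflict.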

notatallshaw commented 7 months ago

This is an intuitive optimization that makes sense in the real Python ecosystem, but you have to be a bit careful as it's definitely not true in general.

Let's say the candidate foo 2.1 depends on bar<2 and the current candidate of bar is bar 2.0. In a world where all dependencies are random, there's no reason to think that foo 2.0 depends on bar<2, or on bar at all; it could even depend on bar>2.0, or on bar==2.0.

But I would highlight two situations where it might trip up that are likely to be encountered in real world situations with real motivation:

  1. When foo finds that a new release of bar is incompatible, a new release of foo may add an upper bound on bar. Technically, older versions of foo do not exclude newer versions of bar, so while this optimization would generally help in this situation, enforcing it strictly would declare the resolution impossible even though the metadata doesn't strictly say so.
  2. When foo previously didn't depend on bar but adds it as a new dependency with an upper bound, for whatever reason. While it might still make sense to prefer backtracking on bar in this situation, it would definitely be valid to go back one version on foo.
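Situation 2 can be made concrete with toy metadata (hypothetical versions, not any real package):

```python
# Toy metadata for situation 2 above: foo 2.1 newly adds a bar<2
# requirement, while foo 2.0 does not depend on bar at all. Blindly
# propagating foo 2.1's bound down to foo 2.0 would wrongly reject the
# perfectly valid pair foo 2.0 + bar 2.0.
DEPS = {
    ("foo", "2.1"): {"bar": lambda v: v < (2, 0)},  # bar<2
    ("foo", "2.0"): {},                             # no bar requirement
}

def compatible(foo_version, bar_version):
    """Check a (foo, bar) pair against the actual metadata above."""
    constraints = DEPS[("foo", foo_version)]
    check = constraints.get("bar")
    return check is None or check(bar_version)

print(compatible("2.1", (2, 0)))  # False: foo 2.1 forbids bar 2.0
print(compatible("2.0", (2, 0)))  # True: stepping foo back one version works
```

So the heuristic is only safe as a preference for which package to backtrack on first, not as a hard constraint.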

All that said, this general idea has been on my mind for Pip resolution for a while now, but I was unsure how to efficiently implement it. I am working on a new API between resolvelib and Pip to make it possible to efficiently compare multiple candidates at the same time, and this post made me realize this is probably going to be a good optimization to build on the new API.
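The preference itself can be sketched as a sort key (hypothetical shapes, not the real resolvelib API): among the requirements involved in a conflict, backtrack first on the one whose specifiers cap the version from above, since its newest candidate is the likely culprit.

```python
# Hypothetical sketch of "prefer upper-bounded requirements". Operators
# that cap a version from above; "==" and "~=" implicitly do too.
UPPER_BOUND_OPS = {"<", "<=", "==", "~="}

def has_upper_bound(specifiers):
    """True if any (operator, version) specifier caps the version."""
    return any(op in UPPER_BOUND_OPS for op, _ in specifiers)

def backtrack_key(requirement):
    """Sort key: upper-bounded requirements first (False sorts before True)."""
    name, specifiers = requirement
    return (not has_upper_bound(specifiers), name)

conflict = [
    ("botocore", [(">=", "1.0")]),                  # unbounded above
    ("urllib3", [(">=", "1.25.4"), ("<", "2.1")]),  # upper-bounded
]
print(sorted(conflict, key=backtrack_key)[0][0])  # urllib3
```

Picking urllib3 here means trying an older urllib3 before churning through every botocore release.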

notatallshaw commented 7 months ago

rip and uv have very similar resolution performance characteristics, and in this case pip suffers from the same issue. As such, I actually put some work into this today.

I already posted this on the uv issue: I created a very hacky, experimental proof-of-concept branch on the pip side that implements the idea of preferring upper-bounded requirements: https://github.com/pypa/pip/compare/main...notatallshaw:pip:backjump-to-upper-bounded

For the requirement set `click >= 7.0 click-loglevel ~= 0.2 dandi >= 0.24.0 psutil ~= 5.9 pyyaml selenium` at midnight on 1st October 2023, it reduces the number of packages pip has to collect while backtracking from 831 to 111; on my machine it cuts pip's resolution time from ~6 min 20 s to ~1 min 20 s (using pypi-timemachine).

To get this to work with resolvelib I had to add a completely new API that substantially changes the path of the algorithm, and it would need to be carefully checked that it can still soundly resolve any given set of requirements, so I don't expect I'll ever be able to contribute this to either the resolvelib or pip project.

Also, I did have to hack resolvelib into understanding a bit about extras, since part of the core issue here is that there are requirements on both urllib3 and urllib3[socks]. I'm not sure how either rip or uv handles extras and these optimizations during resolution.
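For anyone unfamiliar with why extras complicate this, a purely illustrative parser (not resolvelib's actual model) shows the wrinkle: `urllib3` and `urllib3[socks]` are distinct requirement strings that must resolve to the same distribution, so their constraints have to be merged rather than treated as two unrelated packages.

```python
# Illustrative only: both requirement strings share one base package.
def split_extras(req):
    """Split 'name[extra1,extra2]' into (name, frozenset of extras)."""
    if "[" in req:
        name, rest = req.split("[", 1)
        return name, frozenset(rest.rstrip("]").split(","))
    return req, frozenset()

print(split_extras("urllib3[socks]"))
# Same base name, so an upper bound on either must constrain both:
print(split_extras("urllib3")[0] == split_extras("urllib3[socks]")[0])  # True
```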

Hope this info helps.