pip-compile doesn't look to older versions for compatibility

dan-passaro commented 8 years ago

If package A is pinned to an old version, a second package B is unpinned, and the newest version of B requires a newer version of A, pip-compile fails even though it could succeed by using an older version of B.

Steps to replicate

$ cat <<EOF > requirements.in
> Django~=1.4.0  # Yeah I know, we're working on updating this :)
> django-debug-toolbar
> EOF
$ pip-compile

Expected result

$ cat requirements.txt
Django==1.4.22
django-debug-toolbar==1.3.2
sqlparse==0.2.1         # via django-debug-toolbar

Actual result

Could not find a version that matches Django>=1.8,~=1.4.0
Tried: [snip]

Groxx commented 8 years ago

heh. I just ran into this yesterday, though mine was a couple dependencies deep. Your example is waaay smaller than mine.

I'm not even sure what a solution would look like. If there are multiple candidates to backtrack, what's rational behavior? To backtrack one until it runs out of options, then try reducing the next level up by one version and retrying all bottom-most versions? Or maybe a more breadth-first approach would yield results quicker (backtrack the bottom-most once, then try the next level up once, etc)?

dan-passaro commented 8 years ago

For multiple levels of depth, I would think you can bubble it up... E.g. if you have A, and then X which depends on Y which depends on Z, and the latest version of Z needs a newer version of A, you roll back Z if allowed (e.g. if Y's dependency is something like Z>=some_version). If you find a Z that works then you're done, if that doesn't work then that version of Y is not compatible, so roll back to the next earliest version of Y and try it all over again and so on. If all versions of Y fail, then roll back to the next earliest version of X and start all over again. If you're out of earlier versions of X then it's a total failure, and probably an erroneous file/request (or maybe some old package got deleted somewhere).

(In typing this out I think it's basically the first thing you said.)

jbaum-cmcrc commented 7 years ago

I have a different, probably much more common usage scenario which would be solved by the same feature.

Have five packages, A, B, C, D and E, which all use pip-compile.
- A depends on B, C, D. C and D depend on E.
- A -> B A -> C -> E A -> D -> E

Initial situation:

A/requirements.txt:
  B==1.0
  C==1.0
  D==1.0
  E==1.0

There is a new version of E, 1.1.
Run pip-compile on package C, integration tests pass, release C version 1.1 which depends on E==1.1
Run pip-compile on package D, integration tests fail; no new version released, D still depends on E==1.0
There is now a new version of package B, 1.1.
Run pip-compile on package A: Actual result:
- pip-compile fails as the latest versions of packages C and D have conflicting requirements for package E.
Expected result:
- pip-compile succeeds, choosing the latest version of B but versions 1.0 of all the other packages (the most up-to-date consistent configuration of versions).
```
A/requirements.txt
B==1.1
C==1.0
D==1.0
E==1.0
```

Impact

This is particularly important when package B is frequently-changing, while the dependency graph below C and D is more complex with many opportunities for integration test failures and/or relatively slow integration tests.

jbaum-cmcrc commented 7 years ago

Optimisation of algorithm to make it practical.

As written by @dan-passaro (and implied by the issue description), pip-compile would at times end up downloading and analysing large parts of the history of many packages. This would be prohibitive.

To avoid this, pip-compile can use the previously-compiled requirements.txt of the current package to prune the search; in most cases, none of the requirements.in files has changed, so the previous requirements.txt is a consistent, satisfactory configuration of versions — we're just checking to see if an upgrade is possible.

This would cut down the number of versions to be checked considerably; in most cases, there would only be a handful to check, often just one or two (the one in the previous requirements.txt and the latest version, if different).

To handle new pins, this pruning would be applied only to the "try an older version of Z" logic, not to any versions specified in any requirements. This would not handle new pins completely, but it would handle them sufficiently in practical use.

In the scenario of multiple packages using pip-compile and uneven propagation of upgrades, this optimisation would work perfectly.
In the scenario of multiple packages using pip-compile and a new pin in one of the dependencies, this optimisation would work OK, although if multiple packages depend on the package to be pinned, they would all have to be pinned consistently. This already needs to be done, so it's not a regression.
For a new pin in the package to be compiled, a manual step would be required; the user would have to specify a solution once. Once a solution was specified, pip-compile would resume working normally, upgrading from the manually-specified solution to the latest possible solution. When the pin is removed, no manual intervention would be needed and pip-compile would upgrade to the latest version.

While requiring a manual step is a disadvantage, I can't see any practical way to avoid it; it simply doesn't seem to be plausible to potentially download the entire history of the dependencies in order to analyse the graph. It's also only required in limited circumstances (when adding a new pin, and then only in the package where the new pin is being added).

For packages not mentioned in the previous requirements.txt (or if there is no previous requirements.txt), no roll back would be attempted, effectively falling back to the current algorithm.

dan-passaro commented 7 years ago

I haven't done too much research and don't know how accurate this comment is, but it seems like this problem would be much easier to solve if PyPI published dependency information similar to how RubyGems does. Does Bundler not handle this scenario more gracefully?

vphilippon commented 7 years ago

I've been keeping an eye on this issue and did a bit of analysis on my end too.

With the example given so far, we act a bit like as if we know which package changed and needs to be backtracked. If we have that information, it's not too hard (and I'd even propose and send a PR for an option to backtrack, even if it would require multiple downloads, I'd use it gladly).

I've found that a big part of the issue is knowing which package to backtrack. If X, Y and Z all directly depend on A, which one to we backtrack first? We also need to have a mechanic to start back at our "backtrack starting point", because if we backtrack X for example and find nothing, we'll want to restore the constraints that we had before backtracking, before trying with Y.

This is getting complex really quick. Solving this is still complex, even if we had dependency informations published by PyPI. Although if there's a reasonable way to do this in terms of algorithmic complexity, I'm pretty sure we can poke the people in the PyPa community to have a reasonable way to get dependency informations. After all, pip is trying to get dependency resolving too, so they'll likely hit that point eventually.

@dan-passaro I don't know much about Ruby. Are there "full dependency resolving tools" on their side? Do they have the constraint of "one version of a package/gem/thing installed at any given time"?

dan-passaro commented 7 years ago

@vphilippon yes, Ruby is like Python in that all dependencies are stored in a flat toplevel, compared to Node where each dependency gets its own tree of sub-dependencies.

Browsing the RubyGems site (similar to PyPI), you can see that the site itself has structured data describing the dependencies, e.g. on the page for the Capistrano gem you can see right on the page it depends on 4 gems at runtime and 4 gems for development, with version range specifiers.

I haven't researched this claim but my hunch is that Bundler (similar to pip) can somehow query this data and use it to make choices about what versions to select. If that's true, I would expect Bundler to handle this problem better than pip-compile does currently.

jbaum-cmcrc commented 7 years ago

Having pypi publish dependency information separately would indeed help; that would eliminate the concern with the volume of downloading.

We would still have the problem of solving for the best solution and algorithmic complexity. I wouldn't be surprised if the problem turns out to be NP-complete (that is, the question of whether there is a solution). That doesn't rule out reasonable performance in practice, but it does make it more difficult to implement. Likely different heuristics would do well in different situations, and we would have to find ones that work well in typical use.

My proposed optimisation would work by restricting the number of package versions to be considered when searching in the history, then brute-forcing through the rest. That would likely be satisfactory in many common scenarios (or, at least, in our own real scenario), since there would only be a handful of versions to consider.

The scenario in the original post would likely be easy to solve, since there are only two top-level packages and one of them has an explicit version pin. It's therefore either obvious that it's the django-debug-toolbar that needs to be rolled back (since its dependencies cannot match the explicitly-specified ~=1.4.0) or there are only a few versions of Django ~= 1.4.0 to check.

Likely we would be able to develop a solver which would work well in most circumstances.

A minor issue would be defining which of the possible solutions should be returned when there is more than one. In most cases, there will be one clear best solution. In the remaining cases, likely any of them can be returned; ideally the same one each time, to avoid spurious changes in the compiled requirements.txt (which would then trigger spurious CI builds and interleaved package versions which alternate upgrading different dependencies). That would require either a deterministic algorithm, or an explicit last-stage comparison with the previously-compiled requirements.txt so that the previous solution is returned if the new solution is no better.

Behoston commented 3 years ago

Same problem with:

python-glanceclient==2.15.0
cryptography==2.4.2

glacier requires Pyopenssl>=17.0, but 20.0 picked as best candidate Pyopenssl 20.0 requires cryptography>=3.x

atugushev commented 2 years ago

Fixed with the new resolver from #1539:

Details

``` ❯ pip-compile --resolver backtracking # # This file is autogenerated by pip-compile with python 3.8 # To update, run: # # pip-compile --resolver=backtracking # django==1.4.22 # via # -r requirements.in # django-debug-toolbar django-debug-toolbar==1.3.2 # via -r requirements.in sqlparse==0.4.2 # via django-debug-toolbar ```

atugushev commented 2 years ago

This has been fixed in #1539 with the backtracking resolver, try pip-compile --resolver backtracking. The resolver is released as part of pip-tools v6.8.0. Please let us know if it doesn't resolve your issue. Thanks!

jazzband / pip-tools