pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

pip needs a dependency resolver #988

Closed cboylan closed 3 years ago

cboylan commented 11 years ago

pip's dependency resolution algorithm is not a complete resolver. The current resolution logic has the following characteristics:

NOTE: In cases where the first found dependency is not sufficient, specifying the constraints for the dependency at the top level can be used to make it work.

pip install project "dependency>=1.5,<2.0"
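The old behaviour can be sketched as a toy resolver that keeps only the first specifier it encounters for each name. This is illustrative only, not pip's actual code; the package name and available versions are made up, and the `packaging` library is used for specifier matching:

```python
from packaging.specifiers import SpecifierSet

# Toy model of the old "first found wins" behaviour (not pip's actual
# code): the first specifier seen for a name wins; later, possibly
# conflicting specifiers are silently ignored.
available = {"dependency": ["1.0", "1.5", "2.0"]}  # made-up index

def naive_resolve(requirements):
    picked = {}
    for name, spec in requirements:
        if name in picked:
            continue  # later, conflicting specifiers never get applied
        candidates = [v for v in available[name] if v in SpecifierSet(spec)]
        picked[name] = max(candidates, key=lambda v: tuple(map(int, v.split("."))))
    return picked

# 'project' asks for dependency>=1.0 first; a transitive requirement of
# <2.0 arrives later and is dropped, so 2.0 gets installed anyway.
print(naive_resolve([("dependency", ">=1.0"), ("dependency", "<2.0")]))
# -> {'dependency': '2.0'}
```

The top-level pin works because it is seen first, so it becomes the specifier that "wins".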

(2019-06-23)

This is being worked on by @pradyunsg, in continuation of his GSoC 2017 project. A substantial amount of code cleanup has been done, and is ongoing, to make it tractable to replace the current resolver in a reasonable manner. This work enabled pip >= 10 to warn when it is about to make an installation that breaks the dependency graph. (Such installations are not aborted, for backwards compatibility.)


(2019-11-29)

A status update regarding this is available here.


(2022-12-16)

See the closing note for details.

srkunze commented 9 years ago

I think users would prefer an error over what happens now, where conflicts are quietly ignored. With an error, a user could respond and add additional constraints to resolve the conflict.

+1, I can't count the number of times invalid dependencies have been put together because an invalid situation was ignored.

I agree. Errors should never pass silently.

We would love a solution to this issue: also cf. https://github.com/pypa/pip/issues/3183

xyzza commented 8 years ago

Fellows, may we use the NuGet v3 dependency resolution approach? It's better than nothing, isn't it? It seems viable.

astrojuanlu commented 8 years ago

I think this is what the conda guys use: https://github.com/ContinuumIO/pycosat

Ivoz commented 8 years ago

@Juanlu001 unfortunately it's not pure python

DemiMarie commented 8 years ago

One issue is that any pure-python solution will be slow, except perhaps if we are running on PyPy.

However, it seems to me that the better solution is to allow the same package to have two different versions installed, eliminating conflicts.

FichteFoll commented 8 years ago

allow the same package to have two different versions installed, eliminating conflicts.

Due to how importing in Python works, I believe this to actually be possible by providing a dynamic Loader for sys.meta_path, which checks for the package's specified dependencies in order to load the correct version, and enabling that for all modules/packages installed via pip. I do not know how one would accomplish a more or less semi-permanent modification of sys.meta_path however.
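A rough sketch of the mechanism (only the import-redirection part; it is not a workable multi-version scheme, since one process can still only hold a single version of a module, as noted below): a finder placed on `sys.meta_path` decides which source to load for a module name. The `VersionedFinder` class, its `register`/`select` API, and the `liba` module are all hypothetical:

```python
import sys
import importlib.abc
import importlib.util

class VersionedFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader: serves one of several registered
    'versions' of a module's source, chosen per process via select()."""

    def __init__(self):
        self._sources = {}   # name -> {version: source code}
        self._selected = {}  # name -> version to serve

    def register(self, name, version, source):
        self._sources.setdefault(name, {})[version] = source

    def select(self, name, version):
        self._selected[name] = version

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self._sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # not ours; let the normal import machinery handle it

    def create_module(self, spec):
        return None  # default module creation is fine

    def exec_module(self, module):
        source = self._sources[module.__name__][self._selected[module.__name__]]
        exec(source, module.__dict__)

finder = VersionedFinder()
sys.meta_path.insert(0, finder)

# Two 'versions' of a made-up liba; only one can be active at a time.
finder.register("liba", "1.2", "API_VERSION = '1.2'")
finder.register("liba", "1.4", "API_VERSION = '1.4'")
finder.select("liba", "1.4")

import liba
print(liba.API_VERSION)  # -> 1.4
```

The hard part, which this sketch does not solve, is making the selection depend on which package is doing the importing.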

rbtcollins commented 8 years ago

@drbo the complexity is O(n^e) - exponential in the size of the graph - while the difference between CPython, PyPy, and C performance is only a linear factor. Further, the constant cost of examining a package is high (network transfer of the archive), so yeah - Python performance is insignificant.

Ivoz commented 8 years ago

However, it seems to me that the better solution is to allow the same package to have two different versions installed, eliminating conflicts.

Not great.

What happens when liba v1.2 emits a dictionary with 3 fields, but libg requires v1.4 of liba which has added an extra field in the dictionary? If these are all allowed to mix in the same runtime, then generally a developer will bump into cryptic errors after doing a normal update of dependencies. Telling them you just won't run two different versions of the same library at the same time eliminates this class of runtime bugs.

RonnyPfannschmidt commented 8 years ago

Also, Python does not support importing more than one version of the same code.

remram44 commented 8 years ago

What happens when liba v1.2 emits a dictionary with 3 fields, but libg requires v1.4 of liba which has added an extra field in the dictionary?

The approach here wouldn't be to have each library keep a private copy of its dependencies, bringing the kind of madness only the Node.js world could tolerate, but to do dependency resolution over the installed package versions when a program is started. I believe this is the setuptools way; it's why setuptools-generated entry scripts look like this:

#!python
# EASY-INSTALL-ENTRY-SCRIPT: 'reprounzip==1.0.4','console_scripts','reprounzip'
__requires__ = 'reprounzip==1.0.4'
import sys
from pkg_resources import load_entry_point

if __name__ == '__main__':
    sys.exit(
        load_entry_point('reprounzip==1.0.4', 'console_scripts', 'reprounzip')()
    )

The point of pkg_resources was always to do all kinds of path/package manipulation, but most of that was never widely used.
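A minimal sketch of that runtime-resolution idea, assuming pkg_resources (shipped with setuptools) is importable; pip itself stands in for an arbitrary installed distribution:

```python
import pkg_resources

# pkg_resources.require() resolves a requirement string against the
# installed distributions (and their dependencies) at runtime,
# activating matching versions on sys.path -- this is what the
# __requires__ line in the generated entry script triggers.
dists = pkg_resources.require("pip")
print(dists[0].project_name, dists[0].version)
```

This resolution happens on every invocation of such a script, which is part of the performance cost discussed below.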

DemiMarie commented 8 years ago

One approach would be to isolate multiple versions from each other in the same process, but mutable globals keep that from working in Python. It might work in Haskell or Rust, but not in Python.

On Apr 7, 2016 5:02 AM, "Matt Iversen" notifications@github.com wrote:

However, it seems to me that the better solution is to allow the same package to have two different versions installed, eliminating conflicts.

Not great.

What happens when liba v1.2 emits a dictionary with 3 fields, but libg requires v1.4 of liba which has an extra field in the dictionary? If these are all allowed to mix in the same runtime, then generally a developer will bump into cryptic errors after doing a normal update of dependencies. Telling them you just won't run two different versions of the same library at the same time eliminates this.


dstufft commented 8 years ago

What pkg_resources does doesn't actually remove the need for a dependency resolver. It just pushes the need from install time to runtime (e.g. if you dynamically adjust sys.path to add the needed version, you have to resolve which of the installed versions to add, taking into account all of the version specifiers that may be required). This is a somewhat narrower problem, since you're unlikely to have every version of every library installed, but on the flip side it means we inflict our own dependency resolver on other packaging systems that otherwise have no need for it (like when installing with apt-get). Never mind the fact that pkg_resources is incredibly slow, because it has to issue a ton of stat calls and traverse the entire sys.path looking for installed files - a cost you pay on every run of an entry-point script.

leohowell commented 8 years ago

Similar issue to @dracos's:

  1. A==1.0 depends on B==1.0 and C (no specific version); B==1.0 depends on C==1.3.
  2. Start from a Python environment that already has C==1.0 installed.
  3. After installing A, the environment contains A==1.0, B==1.0, C==1.0.
  4. C is not updated to 1.3, even though B==1.0 requires it.

odyssey4me commented 8 years ago

Just to add to the conversation, and for reference to anyone else finding this issue, if you specify a pin in requirements and something else in constraints, the requirement will be ignored and only the constraint used. I've shown an example here: https://gist.github.com/odyssey4me/d8acc9888cc206818e21e059f28b3576
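A hypothetical pair of files demonstrating that precedence; the package name and versions are made up, and the install command is shown commented out since running it would modify the environment:

```shell
# The requirement pins one version, the constraint another.
printf 'requests==2.28.0\n' > requirements.txt
printf 'requests==2.31.0\n' > constraints.txt
# With the resolver of that era, the pin in requirements.txt was
# ignored and the constraint won:
# pip install -r requirements.txt -c constraints.txt
cat requirements.txt constraints.txt
```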

dstufft commented 7 years ago

#4182 has an interesting sub-problem, specifically related to build dependencies.

ghost commented 7 years ago

Enthought's solver has been moved to enthought/sat-solver and has advanced significantly.

westurner commented 7 years ago
ghost commented 7 years ago

@westurner The main blocker is that requirements cannot be downloaded reliably from pypi.

westurner commented 7 years ago

@xoviat

@westurner The main blocker is that requirements cannot be downloaded reliably from pypi.

In terms of? https://status.python.org/

Would it be helpful to cache the catalog [JSON] dependency metadata?

ghost commented 7 years ago

Would it be helpful to cache the catalog [JSON] dependency metadata?

Now you're onto it. What if pip downloads a package and then discovers that it downloaded the wrong one? With a dependency solver enabled, this could happen hundreds of times. The only reliable solution is to be able to fetch metadata without downloading the entire package. But the API that can do this currently doesn't work half the time (possibly an exaggeration, but generally correct).

ghost commented 7 years ago

To be clear, the API "doesn't work" not because it's "down" but because it only works for particular packages.

cournape commented 7 years ago

One issue is that any pure-python solution will be slow, except perhaps if we are running on PyPy.

We did not investigate substantially, but in our tests we found our own simplesat to be about as fast as conda's solver, which uses picosat. It is definitely too slow for cases where you have thousands of packages installed, but I think those are rather rare.
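For intuition about the SAT framing (a toy model, not simplesat's or conda's actual encoding; the package names are invented): each (package, version) pair becomes a boolean variable, and "at most one version per package" plus "a chosen package implies its dependencies" become constraints. Brute force stands in for a real SAT solver here:

```python
from itertools import product

# Each (package, version) pair is a boolean variable.
versions = [("liba", "1.2"), ("liba", "1.4"), ("libg", "2.0")]

def satisfies(assign):
    chosen = {v for v, on in zip(versions, assign) if on}
    if ("liba", "1.2") in chosen and ("liba", "1.4") in chosen:
        return False                     # at most one version of liba
    if ("libg", "2.0") not in chosen:
        return False                     # root requirement: install libg
    if ("liba", "1.4") not in chosen:
        return False                     # libg 2.0 depends on liba>=1.4
    return True

# Brute-force search over all assignments (a real solver prunes this).
solutions = [a for a in product([False, True], repeat=len(versions))
             if satisfies(a)]
print(solutions)  # -> [(False, True, True)]
```

The single satisfying assignment picks liba 1.4 and libg 2.0, which is exactly the install set a complete resolver would produce.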

ghost commented 7 years ago

The issue is definitely not the SAT solver.

pradyunsg commented 7 years ago

Also some good notes on sat solving on their page, and an awesome email.

@Ivoz The link to the email is broken. Is there some other place I could find it?

andrewgodwin commented 7 years ago

Is there no interest in at least adding something that errors in the current situation where pip will happily install broken sets of dependencies? I agree a perfect solution would be lovely, but right now pip will silently create broken installations because it seemingly just grabs the first dependency restriction and uses that.

I'd be happy to contribute a patch that implements correct stop-and-error behaviour (maybe behind a CLI flag if there's backwards-compat issues) so that the current implementation can at least stop making broken installs, as a stepping-stone to having a full solver for the dependency constraint set. Otherwise, we're going to have to find some way to hook a dependency checker into the installation process ourselves for all our internal packages and it's going to get messy.
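A sketch of such a post-finalization consistency check, similar in spirit to what later shipped as `pip check`; it assumes the `packaging` library and Python 3.8+ `importlib.metadata`, and the function name is mine:

```python
from importlib.metadata import distributions, version, PackageNotFoundError
from packaging.requirements import Requirement

def find_conflicts():
    """Scan installed distributions and report unmet requirements.
    Hypothetical helper; returns (dist, requirement, found) triples."""
    conflicts = []
    for dist in distributions():
        for req_str in dist.requires or []:
            try:
                req = Requirement(req_str)
            except Exception:
                continue  # skip unparseable legacy metadata
            if req.marker is not None:
                try:
                    if not req.marker.evaluate():
                        continue  # requirement is for another environment
                except Exception:
                    continue  # e.g. markers referencing 'extra'; skip
            try:
                installed = version(req.name)
            except PackageNotFoundError:
                conflicts.append((dist.metadata["Name"], req_str, "missing"))
                continue
            if not req.specifier.contains(installed, prereleases=True):
                conflicts.append((dist.metadata["Name"], req_str, installed))
    return conflicts

print(len(find_conflicts()), "potential problems found")
```

Running this after the requirement set is finalized, and erroring out on a non-empty result, is roughly the stop-and-error behaviour described above.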

pradyunsg commented 7 years ago

@andrewgodwin There is #3787 that tracks the same. I do want to work on this but can't because of lack of time right now...

I'd be happy to contribute a patch that implements correct stop-and-error behaviour

Please do. :smile:

dstufft commented 7 years ago

@andrewgodwin Are you aware of pip check?

andrewgodwin commented 7 years ago

I was not - I'll look into it tomorrow, but it sounds like it still needs to be run separately after an install is completed?

dstufft commented 7 years ago

Yes it does.
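For reference, a typical invocation (module form used so it works regardless of how pip's console script is named):

```shell
# pip check inspects the already-installed distributions and reports
# any missing or conflicting requirements; it does not prevent a bad
# install from happening in the first place.
python3 -m pip check
status=$?
# Exit status 0 means the environment is consistent; non-zero means
# problems were reported.
echo "pip check exited with status $status"
```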

awwad commented 7 years ago

I think the ideal place to stop-and-error is after the requirement set is finalized and before installation occurs. I have dependency-conflict detection already set up at that point in an old experimental fork of pip 8.0.0.dev0 here. The code isn't fit to use, but it could serve as a guide if someone wanted to do this in production pip.

The larger project accompanying that fork is here, so that might be helpful reading. (Incidentally, draft results are here.)

pradyunsg commented 7 years ago

~Another issue that'll depend on this being implemented - #53.~

Edit: As it turns out, maybe not?

atrigent commented 7 years ago

What a fucking joke.

dstufft commented 7 years ago

@atrigent Behave respectfully or leave.

cyli commented 7 years ago

From a discussion in IRC, currently the use case of passing an option through to a dependency in extras_require also doesn't work:

If in setup.py we have something like:

install_requires=["package"]
...
extras_require={"coolextra": ["package[coolextra]"]}

When I pip install -e .[coolextra], it just installs the package without the coolextra option.

Just wanted to mention it here so that it's tracked, so whatever solution that fixes this issue will also fix this. Alternately, if extras_require has some way of specifying "if no options are specified" or "if this particular option is not specified" it might be solvable by specifying:

extras_require = {
    "!coolextra": ["package"],
    "coolextra": ["package[coolextra]"],
}

Update: actually apologies, reading through all the linked issues more carefully, it seems like it'd be the same issue as https://github.com/pypa/pip/issues/4391.

pradyunsg commented 7 years ago

I'll be working toward resolving this issue over this summer, under the mentorship of @dstufft and @JustinCappos, as a part of GSoC 2017.

For anyone who's interested:

PS: I just realized no one had mentioned this here. I took the liberty of doing it myself - hopefully no one minds.

MarSoft commented 7 years ago

Not sure if this was mentioned somewhere above, but I could not find it. The solution to this bug, whenever it appears, should also handle the following scenario: currently, when a certain package is referenced in several places of the dependency tree, the first encountered reference wins, and all [extra]s attached to it also win. There should be a mechanism for combining these extras. So what I want to stress here is that while it might be hard to resolve version conflicts between several occurrences of the same package (and sometimes impossible to find a fits-all solution), it is trivial to handle extras - just merge them.

pradyunsg commented 7 years ago

Hi @MarSoft!

Combining extras is, indeed, the right behaviour here. Currently, this behaviour change is being tracked in #4391. It's been mentioned in #4653, #4664 and also #3903 was the original tracking issue for combining extras IIRC.

As of right now, I'm rewriting the dependency resolution logic of pip and this is one of the situations that would get fixed as a part of that. :)

greysteil commented 7 years ago

Looks like @pradyunsg just updated his blog to let everyone know he's coming into the final stretch. I'm rooting for you @pradyunsg, as I'm sure is everyone else on this thread!

https://pradyunsg.github.io/gsoc-2017/08/14/final-lap/

💪

pradyunsg commented 7 years ago

Woah! Thanks for some extremely motivating words @greysteil! ^>^

ghost commented 7 years ago

@pradyunsg Are you still working on this? Is there an ETA? No rush but I would like to use the resolver API in a PR.

pradyunsg commented 7 years ago

I went AWOL on this.

Are you still working on this?

Yep. Very much so. :)

I would like to use the resolver API in a PR.

Well, I'd say you should wait for a while. There's an open PR that changes the Resolver API (#4636), which should drive my point home: the "API" hasn't really settled down yet.

ignatenkobrain commented 7 years ago

If you are going to go with a SAT solver (which I really recommend), you could use https://github.com/openSUSE/libsolv; it is used by the package managers of various Linux distributions, such as openSUSE/SUSE (zypper), Fedora (dnf), and Mageia (dnf).

P.S. sorry, I didn't have time to read full story behind.

ghost commented 7 years ago

Not helpful.

pradyunsg commented 7 years ago

Hi @ignatenkobrain! :)

Thanks for the suggestion but...

piotr-dobrogost commented 7 years ago

it requires the execution of setup.py which is a potentially non-deterministic

I'm highly intrigued. Can you give an example of a non-deterministic setup.py?

remram44 commented 7 years ago

@piotr-dobrogost: please consider having discussions about how Python works off this ticket, a lot of people are tracking it for updates on the feature.

aehlke commented 7 years ago

I'm highly intrigued. Can you give an example of a non-deterministic setup.py?

Since it's just Python, developers can do whatever they like within it. A common example is having the version number live in some other module and having setup.py import it. I've personally seen this result not in different setup.py output but in outright failure, where it ends up importing more code than intended in certain cases and setup.py errors out.

awwad commented 7 years ago

I'm highly intrigued. Can you give an example of a non-deterministic setup.py?

I can see why one might want to say "non-deterministic", but that can be misleading. Non-static is what @pradyunsg means, I think?

Assuming the adjusted question is of interest: some time ago, I mirrored PyPI locally and ran a few test installations for each of several hundred thousand package versions. For 16% of the distributions, installation under Python 2 resulted in different installed dependencies or dependency versions than installation under Python 3, in an otherwise identical environment. setup.py can do whatever it wants, and chained dependencies lead to a lot of differences.
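A common pattern behind such Python 2 vs Python 3 differences: the dependency list is computed inside setup.py at build time, so the metadata depends on the interpreter that runs it. A classic (here hypothetical) example uses the enum34 backport:

```python
import sys

# Inside a setup.py, a conditional like this makes install_requires
# depend on which interpreter executes the build:
install_requires = []
if sys.version_info < (3, 4):
    install_requires.append("enum34")  # backport needed on old Pythons

print(install_requires)  # [] on any modern Python 3
```

Environment markers in static metadata (e.g. `enum34; python_version < "3.4"`, per PEP 508) later made this expressible without running any code.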

(I'm not sure where else to continue the conversation, by the way - the distutils mailing list?)

ghost commented 7 years ago

Folks, this is not the place to discuss this. You can downvote me all you want, but the fact is that this issue is for discussing the dependency resolver, not Python behavior.

(I'm not sure where else to continue the conversation, by the way - the distutils mailing list?)

Yes, that would be the perfect place to discuss.

pradyunsg commented 6 years ago

Hey everyone! Hope you've had a great start to 2018!

This issue has been quiet for a while now and I have stuff worth telling. :)

There's a branch of pip that is capable of printing a warning when an incompatible set of packages is going to be installed. I'm not completely sure the UX is good enough, so I'm seeking input on this behaviour: whether it is indeed an improvement over the status quo, what potential improvements could be made, etc. I'd appreciate any feedback over at https://github.com/pradyunsg/pip/issues/1.

As for proper resolution, all the preparatory refactoring that was needed on pip's side is basically done (it's what made the validation stuff mentioned above much easier to do). It was much more work than I'd anticipated initially.

I'm working on the resolver separately now and will bring it into pip once it's ready. What's implemented so far already handles whatever I've thrown at it, including interesting situations with cycles and optional dependencies (extras). If you have a situation where pip 9's resolver does the wrong thing or causes issues, I'd really appreciate it if you could file an issue over at zazo, detailing the projects and versions involved, to help make sure that zazo's resolver will Do The Right Thing™ (especially if a large number of projects and versions are involved).

Cheers!

edit: Look, 100th comment!