pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

pip needs a dependency resolver #988

Closed cboylan closed 4 years ago

cboylan commented 11 years ago

pip's dependency resolution algorithm is not a complete resolver. The current resolution logic has the following characteristics:

NOTE: In cases where the first-found dependency is not sufficient, specifying the constraints for the dependency at the top level can be used as a workaround:

pip install project "dependency>=1.5,<2.0"
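The old behaviour and this workaround can be sketched with a toy "first requirement wins" resolver. This is illustrative only: the package names, index, and code below are invented, not pip's implementation.

```python
def resolve_first_found(reqs, index):
    """Toy model of pip's old logic: the first requirement seen for a
    name wins; later, stricter requirements for it are silently ignored."""
    chosen = {}
    for name, ok in reqs:
        if name not in chosen:  # later requirements never revisit a choice
            chosen[name] = next(v for v in index[name] if ok(v))
    return chosen

# Newest-first index of available versions (made up).
index = {"dependency": ["2.0", "1.9", "1.5"]}

reqs = [
    ("dependency", lambda v: True),       # project: any version is fine
    ("dependency", lambda v: v < "2.0"),  # sub-dependency's stricter pin: ignored
]
print(resolve_first_found(reqs, index))   # picks 2.0, violating the second requirement

# Stating the constraint at the top level makes it the first one seen, so it wins:
reqs2 = [("dependency", lambda v: "1.5" <= v < "2.0")] + reqs
print(resolve_first_found(reqs2, index))  # picks 1.9
```

(String comparison is good enough here because all the made-up versions are single-digit.)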

(2019-06-23)

This is being worked on by @pradyunsg, in continuation of his GSoC 2017 project. A substantial amount of code cleanup has been done, and is ongoing, to make it tractable to replace the current resolver in a reasonable manner. This work enabled pip >= 10 to warn when it is about to perform an installation that breaks the dependency graph. (The installations are not aborted in such scenarios, for backwards compatibility.)


(2019-11-29)

A status update regarding this is available here.


(2022-12-16)

See the closing note for details.

rpanderson commented 4 years ago

Ah, thanks @pradyunsg. I incorrectly concluded from this discussion that the resolver wouldn't make the 20.1.1 release at all.

nlhkabu commented 4 years ago

Hi folks, to help the pip team move forward with the dependency resolver, we need your help.

The team needs to better understand the circumstances under which the new resolver fails, so we are asking pip users with complex dependencies to:

  1. Try the new resolver (use version 20.1 and pass --unstable-feature=resolver)
  2. Break it :P
  3. File an issue

You can find more information and more detailed instructions here

Thanks for your help.

westurner commented 4 years ago

Could there be a command argument like --record-testcase that causes pip to print a generated test case stub to include in new issues?

Maybe also just include that generated test case stub in the exception output (when/if there is an exception) by default?

On Wed, May 20, 2020, 4:23 PM Nicole Harris notifications@github.com wrote:

Hi folks, to help the pip team move forward with the dependency resolver, we need your help.

The team needs to better understand the circumstances under which the new resolver fails, so are asking for pip users with complex dependencies to:

  1. Try the new resolver (use version 20.1, run --unstable-feature=resolver)
  2. Break it :P
  3. File an issue https://github.com/pypa/pip/issues/new?labels%5B%5D=K%3A+UX&labels%5B%5D=K%3A+crash&labels%5B%5D=C%3A+new+resolver&labels%5B%5D=C%3A+dependency+resolution&template=resolver-failure.md

You can find more information and more detailed instructions here http://www.ei8fdb.org/thoughts/2020/05/test-pips-alpha-resolver-and-help-us-document-dependency-conflicts/

Thanks for your help.


chrisjbillington commented 4 years ago

I notice I can still create broken dependencies over multiple pip commands. The following refuses:

$ pip install --unstable-feature=resolver ./pkg1 ./pkg2
Processing ./pkg1
Processing ./pkg2
ERROR: Could not find a version that satisfies the requirement numpy>=1.18 (from pkg1)
ERROR: Could not find a version that satisfies the requirement numpy<1.18 (from pkg2)
ERROR: No matching distribution found for numpy, numpy

But this works and creates an unsatisfied dependency in the environment:

$ pip install --unstable-feature=resolver ./pkg1 && pip install --unstable-feature=resolver ./pkg2
Processing ./pkg1
Collecting numpy==1.18.4
  Using cached numpy-1.18.4-cp38-cp38-manylinux1_x86_64.whl (20.7 MB)
Building wheels for collected packages: pkg1
  Building wheel for pkg1 (setup.py) ... done
  Created wheel for pkg1: filename=pkg1-1.0-py3-none-any.whl size=976 sha256=caba6b56a8e3ef221b3ab2a54d410c06906fc1d18f8e39c727b1fd3390c2c1bb
  Stored in directory: /tmp/pip-ephem-wheel-cache-05vqpq4b/wheels/e1/0d/17/e4faa4fadb62f9ddfde1b6401c5cb531b8935378a688292c4d
Successfully built pkg1
Installing collected packages: numpy, pkg1
Successfully installed numpy-1.18.4 pkg1-1.0
Processing ./pkg2
Collecting numpy==1.17.5
  Using cached numpy-1.17.5-cp38-cp38-manylinux1_x86_64.whl (20.5 MB)
Building wheels for collected packages: pkg2
  Building wheel for pkg2 (setup.py) ... done
  Created wheel for pkg2: filename=pkg2-1.0-py3-none-any.whl size=975 sha256=a6aa8e994eb7fb961d91a54d90539eb16fd7dcdadb96f8ac21fe07c8319f4e24
  Stored in directory: /tmp/pip-ephem-wheel-cache-zljxre1e/wheels/b7/20/64/f44831ca9644ec03d285cc4ce4a4e1ea170ca7b431d7161409
Successfully built pkg2
ERROR: pkg1 1.0 has requirement numpy>=1.18, but you'll have numpy 1.17.5 which is incompatible.
Installing collected packages: numpy, pkg2
  Attempting uninstall: numpy
    Found existing installation: numpy 1.18.4
    Uninstalling numpy-1.18.4:
      Successfully uninstalled numpy-1.18.4
Successfully installed numpy-1.17.5 pkg2-1.0

It prints an error, but I believe this is the same error that is printed without using the resolver.
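The two-command sequence above can be modelled in a few lines. This is a toy sketch, not pip's code: `naive_install`, `check`, and the tuple-based versions are invented for illustration.

```python
installed = {}  # package -> installed version (or True for the package itself)

def naive_install(pkg, dep_versions):
    """Model one `pip install` run: resolve only this command's
    requirements, happily downgrading dependencies of other packages."""
    installed[pkg] = True
    installed.update(dep_versions)

def check(constraints):
    """pip-check-style scan: report (package, dependency) pairs whose
    installed dependency no longer satisfies the declared constraint."""
    return [
        (pkg, dep)
        for pkg, cons in constraints.items()
        for dep, ok in cons.items()
        if not ok(installed.get(dep))
    ]

naive_install("pkg1", {"numpy": (1, 18, 4)})  # pkg1 needs numpy>=1.18
naive_install("pkg2", {"numpy": (1, 17, 5)})  # pkg2 needs numpy<1.18: downgrades!

constraints = {
    "pkg1": {"numpy": lambda v: v >= (1, 18)},
    "pkg2": {"numpy": lambda v: v < (1, 18)},
}
print(check(constraints))  # [('pkg1', 'numpy')] -- the environment is now broken
```

The second install never re-examines pkg1's constraints, which is exactly the gap: each command resolves in isolation against whatever is already on disk.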

Although I haven't found or constructed test packages to test it, this leads me to suspect that the resolver does not deal with the following situation:

It seems like for pip to do this, it would need to keep the equivalent of a requirements.txt on disk, and when you go to install a set of requirements, it would essentially append them to the existing requirements.txt, then run pip install with the resolver on the whole lot, which may upgrade/downgrade/uninstall packages that were not part of the tree of dependencies of the packages specified in the latest install command. When you uninstall a package, pip would remove it from the stored requirements list, if it's there. And at any point you'd be able to get a list of/uninstall 'orphans' - packages that were once installed as a dependency but are no longer required.

Am I misunderstanding, and is pip intending to do this kind of work (basically, what a Linux distro's package manager, or conda, does)? Or is this out of scope for the current resolver?

pfmoore commented 4 years ago

Am I misunderstanding, and is pip intending to do this kind of work (basically, what a Linux distro's package manager, or conda, does)? Or is this out of scope for the current resolver?

This is under active discussion (sorry, I can't recall where the latest notes are - possibly spread all over many communication channels 🙁) but I think the general feeling is that this is out of scope for at least the initial release of the new resolver. The biggest concern is that because the old resolver doesn't enforce this (and there are significant groups of users who simply don't need that level of consistency) we may find that there are a lot of people with environments that are currently inconsistent, and for those people a pip that enforced consistency even with installed packages would be unable to do anything except uninstall stuff. So we'd need to work out how to handle those issues cleanly, which isn't something we want to block the release of the new resolver on.

@pradyunsg @uranusjr @brainwane Feel free to clarify if I've misrepresented our current thinking at all here.

If that level of environment management is something you want, then higher level tools like pipenv do provide this. The question here is how pip handles that situation (or whether it even should), though.

WhyNotHugo commented 4 years ago

The biggest concern is that because the old resolver doesn't enforce this (and there are significant groups of users who simply don't need that level of consistency) we may find that there are a lot of people with environments that are currently inconsistent, and for those people a pip that enforced consistency even with installed packages would be unable to do anything except uninstall stuff.

It sounds to me like pip will need to have two dependency-solving mechanisms (i.e. the current and the new) for a transition period, while those users slowly make their environments consistent. This might take some time though, as with any breaking change. 😞

uranusjr commented 4 years ago

One thing I notice in discussions on this topic is an assumption that consistent = good. But this is not always the case. Aside from people who use Python only as a tool and simply do not care about environment consistency (nor the various ways people recommend to make environments separate and consistent), it is also a legitimate use case to deliberately break consistency because some package overly constrains a dependency (#8076), or during development when you are actively working to fix that constraint (#8307). pip will need to provide some way to accommodate them before being able to consider enforcing environment consistency at all times. The problem is even broader than just transitioning, since there are use cases where transition is impossible.

pfmoore commented 4 years ago

pip will need to provide some way to accommodate them before being able to consider enforcing environment consistency at all times.

... and to be explicit, the behaviour noted by @chrisjbillington is essentially "how pip provides this capability currently (in both the old and new resolvers)". So we can't change that until we have an alternative way to accommodate that need.

WhyNotHugo commented 4 years ago

That is quite right, and for all those cases, simply having no dependency resolution is enough. E.g.: don't install or touch any dependencies. Ever.

You wouldn't need the current resolver for those either (I mean, they're clearly for usage in scenarios where the user knows what they're doing and is responsible for it).

pfmoore commented 4 years ago

I mean, they're clearly for usage in scenarios where the user knows what they're doing and is responsible for it

Not at all. Consider a new user, who's been doing some work with flask. They did pip install flask to make flask available in their system Python (assume on Windows, so we don't get into debates about distro package managers). That user has finished their work on flask, and now wants to do some data analysis using Jupyter. So they do pip install jupyter.

They have no interest in flask remaining usable. They are too new to know (or care) about virtual environments. They don't understand (or again, care about) dependency management. They just want Jupyter to work, so they can do their job. So they want pip to handle dependency resolution for them, but only to get Jupyter working (which as far as they are concerned, is what they asked pip to do).

Yes, they may in the longer term need to deal with more complex situations, and maybe at that point their previous approach will mean they have tidying up to do. But by that time, they probably have enough experience to handle that. And maybe they'll never even get to that point - using Python and a per-task set of packages may be all they ever need.

In my experience, such non-expert users are far more common than people who "know what they are doing" with packaging. And things currently "just work" for them, so we should be especially careful not to break their usage.

Basically anyone who can even formulate a question or opinion on this matter is already (in my view) an "expert" on the topic 😉

pradyunsg commented 4 years ago

Can we please move this discussion over to #7744? Quoting myself from the past on this issue:

Lots of people are subscribed to this issue, and we want them to notice when we make an announcement here. We do not want to take the unusual step of locking this issue to collaborators, but we also want to try really hard to avoid notification floods. So please be mindful of that if you need to leave a comment on this thread. Thanks!

maxwellmckinnon commented 4 years ago

I'm curious about the context of this.

Doesn't poetry already offer a resolver? And conda?

Given pip's ubiquity, though, this is a great quality-of-life improvement for use cases that don't pin (you should really pin).

merwok commented 4 years ago

Poetry is a new tool with some non-standard things (like some version operators).

Conda is more similar to OS-level packaging tools than language-level installers.

Pip is the recommended tool, is installed by ensurepip and venv in the standard library, is possibly the most used tool, and needs a dependency resolver to do its job well. That’s the context! :slightly_smiling_face:

chrisjbillington commented 4 years ago

you should really pin

One reason pinning dependencies isn't viable is that not every Python environment is for a single application.

In scientific data analysis/data science and similar fields, the user is writing Python code to do many one-off analyses of different kinds, and may need to upgrade some library to get a feature they want to use. It's good if the same environment can survive having bits of it upgraded over time without breaking. If I want to fit a curve to some data and make a plot for a paper, there isn't really any use in pinning dependencies. I make the plot, and then I'm done. But I also don't want to create a new Python environment every time I make a new plot - this seems like overkill.

I also don't want to freeze my dependencies in time: if I revisit this particular analysis in the future, I'll want to get it working with the latest libraries rather than install the older ones; otherwise I won't be able to share my code with others, and it won't be interoperable with other code written against different library versions. "Works with the latest version of the libraries" is really the only viable policy for sharing code with other researchers or re-using bits of code over time.

Conda and Linux OS package managers have provided for this kind of workflow so far, but pip is super close. Now that wheels have solved the problem of distributing binaries, we're very close to having one package manager to rule them all.

I'm not savvy with poetry so can't really comment there.

WhyNotHugo commented 4 years ago

I know well that Python environments aren't only for a single application. My system Python has many applications installed that all rely on it and the system Python libraries.

Yes, pinning is done by the developers of these applications.

Also, without pinning, your scripts will eventually break after some update too, since libraries may change their APIs.

remram44 commented 4 years ago

It's good if the same environment can survive

Why do you think that?

"works with the latest version of the libraries" is really the only viable policy

Why do you think that?

Please don't spam the hundreds of people subscribing to this issue for updates on the feature with your opinions presented as fact.

If I want to fit a curve to some data and make a plot for a paper, there isn't really any use in pinning dependencies. I make the plot, and then I'm done.

The "use" is scientific reproducibility. A lot of people care about it, and tools should probably evolve to make it easier, not harder.

chrisjbillington commented 4 years ago

It's good if the same environment can survive

Why do you think that?

The alternative, whilst possible, is inconvenient. Creating a Python environment every time I want to make a plot significantly slows work, that's all.

"works with the latest version of the libraries" is really the only viable policy

Why do you think that?

Let's say two people send their code to me, and I want to use both in the same calculation. If they both pin their dependencies, I'll get a conflict. How can the three of us agree on what dependency versions we will work on supporting? Any set of dependencies is fine so long as we agree, and "the latest" is simply the solution most people instinctively gravitate toward since it means you do not need to explicitly state what dependency versions are required, and you can manage upgrades incrementally instead of all at once. This is mostly just what happens organically when people don't explicitly decide on how they're going to deal with dependency management. The question has not even occurred to many people who are writing and exchanging code productively.

Please don't spam the hundreds of people subscribing to this issue for updates on the feature with your opinions presented as fact.

Well, now that you're asking direct questions, I apologise that by answering them I'm violating this one :)

The "use" is scientific reproducibility. A lot of people care about it, and tools should probably evolve to make it easier, not harder.

The majority of scientific analysis/computing work is exploratory/preliminary and does not see the light of day in a publication. I (strongly) agree with you for the purpose of publishing code, but that's a small subset of code written or run. Most code in scientific circles is fairly ephemeral. If I want to put a plot in the lab logbook, I write equations, describe the methods, store a dataset, and then include the plot. The reproducibility there primarily lives in the equations and methods as described, not in the versions of libraries used. This of course is not perfect, but it is the balance struck by most, in my experience.

merwok commented 4 years ago

This isn’t a new discussion. Libraries should have open dependencies (like numpy == 1.18.* to get a known good version but allow bugfix updates); applications need pinned (exact) dependencies. Let’s keep this issue focused please!

WhyNotHugo commented 4 years ago

Let's say two people send their code to me, and I want to use both in the same calculation. If they both pin their dependencies, I'll get a conflict.

Their code will only run with certain versions of their dependencies. If they have incompatible versions:

If they have compatible versions, then it's fine.

Any set of dependencies is fine so long as we agree, and "the latest"

That's exactly what pinning is: click>=3.0,<4.0.

Ignoring dependency constraints only makes it possible to concurrently install libraries that won't actually work when co-installed.

You can read about the pros and cons of dependency resolution and pinning elsewhere; let's not stretch an off-topic discussion this long. This is going to happen, since it's what all other package managers out there do, and it's necessary to avoid creating broken installations.

cclauss commented 4 years ago

Please stop! Let’s keep this thread about actual progress by the team doing the work of adding a real dependency resolver to pip.

There are other venues for opinion pieces so please do not air that laundry here.

brainwane commented 4 years ago

Per #8511 we have now released pip 20.2. This release includes the beta of the next-generation dependency resolver. It is significantly stricter and more consistent when it receives incompatible instructions, and reduces support for certain kinds of constraints files, so some workarounds and workflows may break. Please test it with the --use-feature=2020-resolver flag. Please see our guide on how to test and migrate, and how to report issues. Please report bugs using this survey or by opening a new GitHub issue, not commenting on this one.

The new dependency resolver is off by default because it is not yet ready for everyday use.

We plan to make pip's next quarterly release, 20.3, in October 2020. We are preparing to change the default dependency resolution behavior and make the new resolver the default in pip 20.3.

Please spread the word by pointing to this blog post -- spread the word on Hacker News, Reddit, Twitter, Facebook, Dev.to, Telegram, relevant Stack Overflow answers, your favorite Slacks and Discords, etc. Most of the people this will affect do not keep up with Python-specific developer news. Help them get the heads-up before October, and help us get their bug reports.

thomasf commented 4 years ago

Please consider adding a minimal-version-selection argument to pip for this resolver; that means only considering the minimal versions specified by each dependency's version spec. Go modules use this, and it's the least intrusive version-spec system I have seen.

It enables having one pip requirements.txt file without lock files, and it always results in a reproducible dependency version tree.

If you want to force an upgrade for a sub-dependency, just add it to your requirements file or, I guess, a constraints file. If no package in the resolution has a minimal version set for one specific dependency, maybe that could generate a warning, because you probably want to add a constraint if that happens.

I guess exceeding a maximum version should still generate an error.

It's a little bit more effort for the project using a feature like this, but it also brings repeatable installs without additional lock files and without more programs just to manage those lock files.

It would be really nice to move off massively complex tools like pipenv and poetry, and be able to skip the concept of a lock file while still getting reproducible builds. I'm never interested in the latest possible version of a dependency; I'm always interested in the one I have tested my software with.

After a brief look this is probably best implemented by writing a separate resolver for resolvelib, right?
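For what it's worth, Go-style minimal version selection is simple enough to sketch. This is a toy over a made-up in-memory index; a real resolver would also have to handle maximum-version constraints and missing packages.

```python
# index: (package, version) -> {dependency: minimal required version}
index = {
    ("app", (1, 0)): {"lib": (1, 2)},
    ("lib", (1, 2)): {"util": (2, 0)},
    ("lib", (1, 5)): {"util": (2, 1)},   # never reached from app 1.0
    ("util", (2, 0)): {},
    ("util", (2, 1)): {},
}

def mvs(root):
    """Minimal version selection: for every reachable package, pick the
    maximum of the *minimal* versions anyone asked for -- never newer."""
    chosen = {}
    work = [root]
    while work:
        name, version = work.pop()
        if chosen.get(name, ()) >= version:
            continue  # an equal or newer minimum is already selected
        chosen[name] = version
        work.extend(index[(name, version)].items())
    return chosen

print(mvs(("app", (1, 0))))  # {'app': (1, 0), 'lib': (1, 2), 'util': (2, 0)}
```

Because the result depends only on the declared minimums, never on whatever happens to be newest on the index, the same inputs always reproduce the same tree, which is the repeatability property described above.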

brainwane commented 4 years ago

Thanks to everyone who tested pip 20.2 and provided bug reports and feedback, or who spread the word! We also made a video you can share.

We are aiming to release pip 20.3 about a week from now, on Wednesday or Thursday, Oct 28 or 29. We are preparing to change the default dependency resolution behavior and make the new resolver the default in pip 20.3.

For more on the rollout and how you can help, see https://github.com/pypa/pip/issues/6536#issuecomment-713038615 -- starting tomorrow, @di is gathering a volunteer first-response team to help reply to confused users.

brainwane commented 4 years ago

As I discussed in a comment elsewhere we decided to delay the release slightly, because of some CI problems cropping up and because of some external factors. pip 20.3b1 is available in case you want to try that out.

In today's team meeting we agreed that the 20.3 release will likely be tomorrow or Friday. You can follow #8936 for more.

We've also substantially improved the "what's changing" user guide so please take a fresh look and circulate it!

And the new resolver is already solving some people's issues, which is great!

brainwane commented 4 years ago

We have now resolved a finicky macOS Big Sur support issue and a headache-inducing infinite resolution issue #9011, which were stopping us from releasing. Per https://github.com/pypa/pip/issues/8936#issuecomment-735450632 the pip 20.3 release, in which the new pip resolver will be the default, will very very likely be tomorrow, Monday, 30 November.

pradyunsg commented 4 years ago

pip 20.3 has been released, and it has the new resolver by default! Here's our release announcement on the PSF blog: https://blog.python.org/2020/11/pip-20-3-release-new-resolver.html

pradyunsg commented 4 years ago

That felt goooood. :)

WhyNotHugo commented 3 years ago

Thanks for all the hard work! <3

flying-sheep commented 3 years ago

That means exactly what it says: it’s impossible to resolve this without contradictions. Figure out which package could loosen its restrictions, and file a request in its issue tracker.

Also, I think this is a good place to lock this conversation; people will continue to come in with stuff like this.