Ah, thanks @pradyunsg. I incorrectly concluded from this discussion that the resolver wouldn't make the 20.1.1 release at all.
Hi folks, to help the pip team move forward with the dependency resolver, we need your help.
The team needs to better understand the circumstances under which the new resolver fails, so we are asking pip users with complex dependencies to:
- Try the new resolver (use version 20.1, run --unstable-feature=resolver)
- Break it :P
- File an issue https://github.com/pypa/pip/issues/new?labels%5B%5D=K%3A+UX&labels%5B%5D=K%3A+crash&labels%5B%5D=C%3A+new+resolver&labels%5B%5D=C%3A+dependency+resolution&template=resolver-failure.md
You can find more information and more detailed instructions here: http://www.ei8fdb.org/thoughts/2020/05/test-pips-alpha-resolver-and-help-us-document-dependency-conflicts/
Thanks for your help.
Could there be a command-line argument like --record-testcase that causes pip to print a generated test case stub to include in new issues?
Maybe also just include that generated test case stub in the exception output (when/if there is an exception) by default?
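For illustration, such a stub might look something like this (a purely hypothetical sketch: the flag doesn't exist, and the stub structure and names here are invented):

```python
# Hypothetical output of a `--record-testcase` flag; nothing here is
# real pip API, it just sketches what an auto-generated stub could be.
import subprocess
import sys

# The requirement set captured at the moment resolution failed.
FAILED_REQUIREMENTS = ["pkga>=2.0", "pkgb"]  # invented example


def test_resolution_conflict():
    """Replaying the failing install should reproduce the resolver error."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install",
         "--unstable-feature=resolver", *FAILED_REQUIREMENTS],
        capture_output=True, text=True,
    )
    assert result.returncode != 0
    assert "Could not find a version" in result.stdout + result.stderr
```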
I notice I can still create broken dependencies over multiple pip commands. The following is refused, as expected:
$ pip install --unstable-feature=resolver ./pkg1 ./pkg2
Processing ./pkg1
Processing ./pkg2
ERROR: Could not find a version that satisfies the requirement numpy>=1.18 (from pkg1)
ERROR: Could not find a version that satisfies the requirement numpy<1.18 (from pkg2)
ERROR: No matching distribution found for numpy, numpy
But this works and creates an unsatisfied dependency in the environment:
$ pip install --unstable-feature=resolver ./pkg1 && pip install --unstable-feature=resolver ./pkg2
Processing ./pkg1
Collecting numpy==1.18.4
Using cached numpy-1.18.4-cp38-cp38-manylinux1_x86_64.whl (20.7 MB)
Building wheels for collected packages: pkg1
Building wheel for pkg1 (setup.py) ... done
Created wheel for pkg1: filename=pkg1-1.0-py3-none-any.whl size=976 sha256=caba6b56a8e3ef221b3ab2a54d410c06906fc1d18f8e39c727b1fd3390c2c1bb
Stored in directory: /tmp/pip-ephem-wheel-cache-05vqpq4b/wheels/e1/0d/17/e4faa4fadb62f9ddfde1b6401c5cb531b8935378a688292c4d
Successfully built pkg1
Installing collected packages: numpy, pkg1
Successfully installed numpy-1.18.4 pkg1-1.0
Processing ./pkg2
Collecting numpy==1.17.5
Using cached numpy-1.17.5-cp38-cp38-manylinux1_x86_64.whl (20.5 MB)
Building wheels for collected packages: pkg2
Building wheel for pkg2 (setup.py) ... done
Created wheel for pkg2: filename=pkg2-1.0-py3-none-any.whl size=975 sha256=a6aa8e994eb7fb961d91a54d90539eb16fd7dcdadb96f8ac21fe07c8319f4e24
Stored in directory: /tmp/pip-ephem-wheel-cache-zljxre1e/wheels/b7/20/64/f44831ca9644ec03d285cc4ce4a4e1ea170ca7b431d7161409
Successfully built pkg2
ERROR: pkg1 1.0 has requirement numpy>=1.18, but you'll have numpy 1.17.5 which is incompatible.
Installing collected packages: numpy, pkg2
Attempting uninstall: numpy
Found existing installation: numpy 1.18.4
Uninstalling numpy-1.18.4:
Successfully uninstalled numpy-1.18.4
Successfully installed numpy-1.17.5 pkg2-1.0
It prints an error, but I believe this is the same error that is printed without using the resolver.
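For reference, a pair of minimal test packages like these is enough to reproduce the transcripts above (a sketch assuming plain setuptools; not necessarily the exact packages used):

```python
# pkg1/setup.py: depends on a recent numpy
from setuptools import setup

setup(name="pkg1", version="1.0", install_requires=["numpy>=1.18"])
```

```python
# pkg2/setup.py: depends on an older numpy, conflicting with pkg1
from setuptools import setup

setup(name="pkg2", version="1.0", install_requires=["numpy<1.18"])
```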
Although I haven't found or constructed test packages to test it, this leads me to suspect that the resolver does not deal with the following situation: spam==1.0 is installed, which requires ham, which in turn requires cheese < 2.0. I then ask to install libfoo, which requires cheese >= 2.0. This conflicts with the installed ham, but in the meantime a new ham with support for cheese >= 2.0 has become available. Resolving this would mean upgrading ham to the latest version, even though ham is not a package I asked to install at any point; pip would only upgrade it because the new version is compatible with the requirements explicitly given to it earlier.
It seems like for pip to do this, it would need to keep the equivalent of a requirements.txt on disk, and when you go to install a set of requirements, it would essentially append them to the existing requirements.txt, then run pip install with the resolver on the whole lot, which may upgrade/downgrade/uninstall packages that were not part of the tree of dependencies of the packages specified in the latest install command. When you uninstall a package, pip would remove it from the stored requirements list, if it's there. And at any point you'd be able to get a list of (or uninstall) 'orphans': packages that were once installed as a dependency but are no longer required.
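A minimal sketch of that bookkeeping, assuming a simple JSON record of what the user explicitly asked for (the file location and helper names are invented; pip has no such feature today):

```python
# Sketch: remember what the user asked for, then re-resolve the union.
import json
import subprocess
import sys
from pathlib import Path

RECORD = Path.home() / ".pip-requested.json"  # hypothetical location


def load_requested() -> set:
    return set(json.loads(RECORD.read_text())) if RECORD.exists() else set()


def install(new_reqs):
    # Append the new requirements to the stored set and re-resolve the
    # whole lot; the resolver is then free to upgrade/downgrade packages
    # that earlier installs pulled in only as dependencies.
    requested = load_requested() | set(new_reqs)
    subprocess.check_call([sys.executable, "-m", "pip", "install",
                           "--unstable-feature=resolver", *sorted(requested)])
    RECORD.write_text(json.dumps(sorted(requested)))


def uninstall(req):
    requested = load_requested() - {req}
    RECORD.write_text(json.dumps(sorted(requested)))
    # Anything installed but no longer reachable from `requested` is an
    # "orphan" that could be listed or removed.
```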
Am I misunderstanding, and is pip intending to do this kind of work (basically, what a Linux distro's package manager, or conda, does)? Or is this out of scope for the current resolver?
Am I misunderstanding, and is pip intending to do this kind of work (basically, what a Linux distro's package manager, or conda, does)? Or is this out of scope for the current resolver?
This is under active discussion (sorry, I can't recall where the latest notes are - possibly spread all over many communication channels 🙁) but I think the general feeling is that this is out of scope for at least the initial release of the new resolver. The biggest concern is that because the old resolver doesn't enforce this (and there are significant groups of users who simply don't need that level of consistency) we may find that there are a lot of people with environments that are currently inconsistent, and for those people a pip that enforced consistency even with installed packages would be unable to do anything except uninstall stuff. So we'd need to work out how to handle those issues cleanly, which isn't something we want to block the release of the new resolver on.
@pradyunsg @uranusjr @brainwane Feel free to clarify if I've misrepresented our current thinking at all here.
If that level of environment management is something you want, then higher level tools like pipenv do provide this. The question here is how pip handles that situation (or whether it even should), though.
The biggest concern is that because the old resolver doesn't enforce this (and there are significant groups of users who simply don't need that level of consistency) we may find that there are a lot of people with environments that are currently inconsistent, and for those people a pip that enforced consistency even with installed packages would be unable to do anything except uninstall stuff.
It sounds to me that pip will need to have two dependency-solving mechanisms (i.e. the current one and the new one) for a transition period, while those users slowly make their environments consistent. This might take some time though, as with any breaking change. 😞
One thing I notice in discussions on this topic is an assumption that consistent = good. But this is not always the case. Aside from people who use Python only as a tool and simply do not care about environment consistency (nor the various ways people recommend to make environments separate and consistent), it is also a legitimate use case to deliberately break consistency because some package overly constrains a dependency (#8076), or during development when you are actively working to fix that constraint (#8307). pip will need to provide some way to accommodate them before being able to consider enforcing environment consistency at all times. The problem is even broader than just transitioning, since there are use cases where transition is impossible.
pip will need to provide some way to accommodate them before being able to consider enforcing environment consistency at all times.
... and to be explicit, the behaviour noted by @chrisjbillington is essentially "how pip provides this capability currently (in both the old and new resolvers)". So we can't change that until we have an alternative way to accommodate that need.
That is quite right, and for all those cases, simply doing no dependency resolution at all is enough, e.g. don't install or touch any dependencies. Ever.
You wouldn't need the current resolver for those either (I mean, they're clearly for usage in scenarios where the user knows what they're doing and is responsible for it).
I mean, they're clearly for usage in scenarios where the user knows what they're doing and is responsible for it
Not at all. Consider a new user, who's been doing some work with flask. They did pip install flask to make flask available in their system Python (assume on Windows, so we don't get into debates about distro package managers). That user has finished their work on flask, and now wants to do some data analysis using Jupyter. So they do pip install jupyter.
They have no interest in flask remaining usable. They are too new to know (or care) about virtual environments. They don't understand (or again, care about) dependency management. They just want Jupyter to work, so they can do their job. So they want pip to handle dependency resolution for them, but only to get Jupyter working (which as far as they are concerned, is what they asked pip to do).
Yes, they may in the longer term need to deal with more complex situations, and maybe at that point their previous approach will mean they have tidying up to do. But by that time, they probably have enough experience to handle that. And maybe they'll never even get to that point - using Python and a per-task set of packages may be all they ever need.
In my experience, such non-expert users are far more common than people who "know what they are doing" with packaging. And things currently "just work" for them, so we should be especially careful not to break their usage.
Basically anyone who can even formulate a question or opinion on this matter is already (in my view) an "expert" on the topic 😉
Can we please move this discussion over to #7744? Quoting myself from the past on this issue:
Lots of people are subscribed to this issue, and we want them to notice when we make an announcement here. We do not want to take the unusual step of locking this issue to collaborators, but we also want to try really hard to avoid notification floods. So please be mindful of that if you need to leave a comment on this thread. Thanks!
I'm curious about the context of this.
Doesn't poetry already offer a resolver? And conda?
Still, given pip's ubiquity, this is a great quality-of-life improvement for use cases that don't pin (you should really pin).
Poetry is a new tool with some non-standard things (like some version operators).
Conda is more similar to OS-level packaging tools than language-level installers.
Pip is the recommended tool, is installed by ensurepip and venv in the standard library, is possibly the most used tool, and needs a dependency resolver to do its job well. That’s the context! :slightly_smiling_face:
you should really pin
One reason pinning dependencies isn't viable is that not every Python environment is for a single application.
In scientific data analysis/data science and similar fields, the user is writing Python code to do many one-off analyses of different kinds, and may need to upgrade some library to get a feature they want to use. It's good if the same environment can survive having bits of it upgraded over time without breaking. If I want to fit a curve to some data and make a plot for a paper, there isn't really any use in pinning dependencies. I make the plot, and then I'm done. But I also don't want to create a new Python environment every time I make a new plot - this seems like overkill.
I also don't want to freeze my dependencies in time - if I revisit this particular analysis in the future I'll want to get it working with the latest libraries rather than install the older libraries - otherwise I won't be able to share my code with others and it won't be inter-operable with other code written with different library versions. "works with the latest version of the libraries" is really the only viable policy for sharing code with other researchers or re-using bits of code over time.
Conda and Linux OS package managers have provided for this kind of workflow so far, but pip is super close. Now that wheels have solved the problem of distributing binaries, we're very close to having one package manager to rule them all.
I'm not savvy with poetry so can't really comment there.
I know very well that Python environments aren't only for a single application. My system Python has many applications installed that all rely on it and the system Python libraries.
Yes, pinning is done by the developers of these applications.
Also, without pinning, your scripts will eventually break after some update too, since libraries may change their APIs.
It's good if the same environment can survive
Why do you think that?
"works with the latest version of the libraries" is really the only viable policy
Why do you think that?
Please don't spam the hundreds of people subscribing to this issue for updates on the feature with your opinions presented as fact.
If I want to fit a curve to some data and make a plot for a paper, there isn't really any use in pinning dependencies. I make the plot, and then I'm done.
The "use" is scientific reproducibility. A lot of people care about it, and tools should probably evolve to make it easier, not harder.
It's good if the same environment can survive
Why do you think that?
The alternative, whilst possible, is inconvenient. Creating a Python environment every time I want to make a plot significantly slows work, that's all.
"works with the latest version of the libraries" is really the only viable policy
Why do you think that?
Let's say two people send their code to me, and I want to use both in the same calculation. If they both pin their dependencies, I'll get a conflict. How can the three of us agree on what dependency versions we will work on supporting? Any set of dependencies is fine so long as we agree, and "the latest" is simply the solution most people instinctively gravitate toward since it means you do not need to explicitly state what dependency versions are required, and you can manage upgrades incrementally instead of all at once. This is mostly just what happens organically when people don't explicitly decide on how they're going to deal with dependency management. The question has not even occurred to many people who are writing and exchanging code productively.
Please don't spam the hundreds of people subscribing to this issue for updates on the feature with your opinions presented as fact.
Well, now that you're asking direct questions, I apologise that by answering them I'm violating this one :)
The "use" is scientific reproducibility. A lot of people care about it, and tools should probably evolve to make it easier, not harder.
The majority of scientific analysis/computing work is exploratory/preliminary and does not see the light of day in a publication. I (strongly) agree with you for the purpose of publishing code, but that's a small subset of code written or run. Most code in scientific circles is fairly ephemeral. If I want to put a plot in the lab logbook I write equations and describe the methods and store a dataset, and then include the plot. The reproducibility there primarily lives in the equations and methods as described, not the versions of libraries used. This of course is not perfect, but it is the balance struck by most in my experience.
This isn’t a new discussion. Libraries should have open dependencies (like numpy == 1.18.* to get a known good version but allow bugfix updates); applications need pinned (exact) dependencies. Let’s keep this issue focused please!
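To illustrate the distinction (a sketch only, not taken from any real project):

```python
# A *library* declares open, compatible-range dependencies in its
# setup.py so that it composes with other libraries:
from setuptools import setup

setup(
    name="somelib",          # hypothetical library
    version="0.1",
    install_requires=[
        "numpy == 1.18.*",   # known-good series, bugfix updates allowed
        "requests >= 2.20",  # open upper bound
    ],
)

# An *application*, by contrast, pins exact versions, e.g. in a
# requirements.txt produced by `pip freeze`:
#     numpy==1.18.4
#     requests==2.23.0
```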
Let's say two people send their code to me, and I want to use both in the same calculation. If they both pin their dependencies, I'll get a conflict.
Their code will only run with certain versions of their dependencies. If they have incompatible versions:
If they have compatible versions, then it's fine.
Any set of dependencies is fine so long as we agree, and "the latest"
That's exactly what pinning is: click>=3.0,<4.0.
Ignoring dependency constraints only lets you concurrently install libraries that won't actually work when co-installed.
You can read about the pros and cons of dependency resolution and pinning elsewhere; let's not stretch an off-topic discussion any further. This is going to happen, since it's what all other package managers out there do, and it's necessary to avoid creating broken installations.
Please stop! Let’s keep this thread about actual progress by the team doing the work of adding a real dependency resolver to pip.
There are other venues for opinion pieces so please do not air that laundry here.
Per #8511 we have now released pip 20.2. This release includes the beta of the next-generation dependency resolver. It is significantly stricter and more consistent when it receives incompatible instructions, and reduces support for certain kinds of constraints files, so some workarounds and workflows may break. Please test it with the --use-feature=2020-resolver flag. Please see our guide on how to test and migrate, and how to report issues. Please report bugs using this survey or by opening a new GitHub issue, not commenting on this one.
The new dependency resolver is off by default because it is not yet ready for everyday use.
We plan to make pip's next quarterly release, 20.3, in October 2020. We are preparing to change the default dependency resolution behavior and make the new resolver the default in pip 20.3.
Please spread the word by pointing to this blog post -- spread the word on Hacker News, Reddit, Twitter, Facebook, Dev.to, Telegram, relevant Stack Overflow answers, your favorite Slacks and Discords, etc. Most of the people this will affect do not keep up with Python-specific developer news. Help them get the heads-up before October, and help us get their bug reports.
Please consider adding a minimal-version-selection argument to pip for this resolver, meaning: only consider the minimal versions specified by each dependency's version spec. Go modules use this and it's the least intrusive version-spec system I have seen.
It enables having one pip requirements.txt file without lock files and always results in a reproducible version dependency tree.
If you want to force an upgrade of a sub-dependency, just add it to your requirements file or, I guess, a constraints file. If no package in the resolution sets a minimal version for one specific dependency, maybe that could generate a warning, because you probably want to add a constraint if that happens.
I guess exceeding a maximum version should still generate an error.
It's a little more effort for a project using a feature like this, but it also brings repeatable installs without additional lock files and without extra programs just to manage the lock file.
It would be really nice to move off massively complex tools like pipenv and poetry and be able to skip the concept of a lock file and still get reproducible builds. I'm never interested in the latest possible version of a dependency; I'm always interested in the one I have tested my software with.
After a brief look, this is probably best implemented by writing a separate resolver for resolvelib, right?
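To make the idea concrete, here is a rough sketch of minimal version selection over a set of specifiers, using the packaging library (illustrative only; not a resolvelib provider, and the function name is invented):

```python
# Go-style minimal version selection (MVS): for each dependency, take
# the *lowest* available version satisfying the union of all specs
# collected from the dependency tree.
from packaging.specifiers import SpecifierSet
from packaging.version import Version


def minimal_version(available, specs):
    """Return the smallest available version satisfying every spec."""
    combined = SpecifierSet(",".join(specs))
    for version in sorted(Version(v) for v in available):
        if version in combined:
            return version
    raise LookupError(f"no version satisfies {combined}")


# Two packages requiring ">=1.17" and ">=1.18,<2" of the same dependency:
# MVS deterministically picks 1.18.0, never the newest 1.19.0.
print(minimal_version(["1.17.5", "1.18.0", "1.18.4", "1.19.0"],
                      [">=1.17", ">=1.18,<2"]))  # -> 1.18.0
```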
Thanks to everyone who tested pip 20.2 and provided bug reports and feedback, or who spread the word! We also made a video you can share.
We are aiming to release pip 20.3 about a week from now, on Wednesday or Thursday, Oct 28 or 29. We are preparing to change the default dependency resolution behavior and make the new resolver the default in pip 20.3.
For more on the rollout and how you can help, see https://github.com/pypa/pip/issues/6536#issuecomment-713038615 -- starting tomorrow, @di is gathering a volunteer first-response team to help reply to confused users.
As I discussed in a comment elsewhere we decided to delay the release slightly, because of some CI problems cropping up and because of some external factors. pip 20.3b1 is available in case you want to try that out.
In today's team meeting we agreed that the 20.3 release will likely be tomorrow or Friday. You can follow #8936 for more.
We've also substantially improved the "what's changing" user guide so please take a fresh look and circulate it!
And the new resolver is already solving some people's issues, which is great!
We have now resolved a finicky macOS Big Sur support issue and a headache-inducing infinite resolution issue #9011, which were stopping us from releasing. Per https://github.com/pypa/pip/issues/8936#issuecomment-735450632 the pip 20.3 release, in which the new pip resolver will be the default, will very very likely be tomorrow, Monday, 30 November.
pip 20.3 has been released, and it has the new resolver by default! Here's our release announcement on the PSF blog: https://blog.python.org/2020/11/pip-20-3-release-new-resolver.html
That felt goooood. :)
Thanks for all the hard work! <3
That means exactly what it says: it’s impossible to resolve this without contradictions. Figure out which package could loosen its restrictions and bug them in their issue tracker about it.
Also, I think this is a good place to lock this conversation; people will continue to come in with stuff like this.
pip's dependency resolution algorithm is not a complete resolver. The current resolution logic has the following characteristics:
NOTE: In cases where the first-found dependency is not sufficient, specifying the constraints for the dependency at the top level can be used to make it work.
(2019-06-23)
This is being worked on by @pradyunsg, in continuation of his GSoC 2017 project. A substantial amount of code cleanup has been done, and is ongoing, to make it tractable to replace the current resolver in a reasonable manner. This work enabled pip >= 10 to warn when it is going to make an installation that breaks the dependency graph. (The installations are not aborted in such scenarios, for backwards compatibility.)
(2019-11-29)
A status update regarding this is available here.
(2022-12-16)
See the closing note for details.