pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License
9.46k stars 3.01k forks source link

pip needs a dependency resolver #988

Closed cboylan closed 3 years ago

cboylan commented 11 years ago

pip's dependency resolution algorithm is not a complete resolver. The current resolution logic has the following characteristics:

NOTE: In cases where the first found dependency is not sufficient, specifying the constraints for the dependency on the top level can be used to make it work.

pip install project "dependency>=1.5,<2.0"

(2019-06-23)

This is being worked on by @pradyunsg, in continuation of his GSoC 2017 project. A substantial amount of code cleanup has been done, and is ongoing, to make it tractable replace the current resolver, in a reasonable manner. This work enabled pip >= 10 to warn when it is going to make an installation that breaks the dependency graph. (The installations are not aborted in such scenarios, for backwards compatibility.)


(2019-11-29)

A status update regarding this is available here.


(2022-12-16)

See the closing note for details.

dstufft commented 11 years ago

I added this to 1.5 because I think it's a pretty big wart that should be sorted out.

qwcode commented 11 years ago

for now, the "pip solution" is to declare in your requirements file (or as an install argument directly) what specifier you want, and it overrides what happens down in the dependency tree.

i.e. in this case, put "foo>=1.0,<=2.0" in your requirements or add it as an install argument.

also, pip compiles everything it's going to install into a set first, then installs. it's not installing everything as it's discovered, and it's discovery order is not depth first. the order of discovery isn't the issue, but rather the lack of doing any resolution as it's compiling the set. Currently, first found wins.

qwcode commented 11 years ago

I keep meaning to add a "conflict resolution" section to the Internal logic section to the docs that covers this. practically speaking, pip's override logic is pretty nice IMO, and is all most people need most of the time.

emonty commented 11 years ago

I believe, based on discussions with @dstufft, that this may be related to the problems people will have if they try to upgrade from old distribute to new setuptools in the same set of software as other things. Or, more specifically, if you depend on a piece of software that depends on distribute with an unbounded high end, and you are currently running on a pre-merge distribute, what will happen is that the dependency discovery will put distribute into the list along with the rest of the things that software A depends on, then it will add distribute's dependency, setuptools, to the list, later. THEN, when it's installing, distribute will be upgraded, which removes the distribute provided setuptools code, then the next thing in the list is installed, which is not setuptools, and which fails because setuptools is not installed yet.

qwcode commented 11 years ago

@emonty the distribute to setuptools upgrade problem you speak of is described in #1064, but also here http://www.pip-installer.org/en/latest/cookbook.html#importerror-no-module-named-setuptools

but this is not related to pip's conflict resolution shortcoming issues described in this issue.

dracos commented 11 years ago

Just had this issue installing a project - on a system that already had python-dateutil 1.4 installed - that depended upon python-dateutil (no specific version) and django-tastypie 0.9.16. Even though django-tastypie has a python-dateutil dependency of >=1.5 (and not 2.0), and the parent has no specific version dependency on python-dateutil, python-dateutil remained at 1.4. This seems pretty fundamentally against what a package installer should be doing in such a circumstance :)

dstufft commented 11 years ago

FWIW I've been experimenting with making a real resolver for pip.

qwcode commented 11 years ago

@dracos there is a "solution" right now (short of a new resolver for pip; although that would be nice). Specifically declare what you want the python-dateutil requirement to be in a requirements file, or as a top-level pip install argument, and that will be honored.

http://www.pip-installer.org/en/latest/cookbook.html#requirements-files http://www.pip-installer.org/en/latest/logic.html#requirement-specifiers

pip install myproject datetutil>=1.5,<2.0

dracos commented 11 years ago

@qwcode Thanks, I know that and will have to do so - but this means anyone working on the project will have to manually check all dependencies. Say someone upgrades django-tastypie and it now requires a later version of python-dateutil, there's no way to detect this besides manual checking of each dependency upgrade/install. Oh well.

qwcode commented 11 years ago

@dracos understood. it's a pain point. but just want others who find this, to at least know, there is some kind of solution.

qwcode commented 11 years ago

@dstufft as you work on a new resolver, keep in mind that top level requirements (i.e. pip install arguments are requirements file entries) will still have to be considered overrides (or dominant). that's a pip feature. Also, to state the obvious, this change will be "backwards incompatible" in the sense that many installations will turn out differently.

dracos commented 11 years ago

I've made a script (that is basically a patch to pip's prepare_files(), but I didn't want to patch pip, I can't control pip everywhere I use it for example) - available at https://github.com/dracos/check-pip-dependencies - that notes multiple dependency requests for the same package and outputs a list of conflicts, that you can then manually resolve using the methods suggested above.

benoitbryon commented 10 years ago

This ticket looks like #174.

ghost commented 10 years ago

The pydata people at continuum analytics have built a packaging toolchain more suited to scientific packaging (That's good, it can get funky). Facing the same issues, they've implemented a dependency resolver using a SAT solver, see here. The paper mentioned there is readable and they've open sourced the wrapper package around the (open as well) SAT solver engine.

Basically, it's all there.

Ivoz commented 10 years ago

@y-p the only problem with that is that it relies on C code, rather than pure python. I presume that it would notionally be unacceptable to expect that everyone installing pip have a C compiler easily available on their system; thus you would have to provide a compiled pip for every system Python hopes to run on. That is a major ask (anyone compiling and distributing for python arm?).

It might be "all there", but it's in an unusable form for pip's general audience.

merwok commented 10 years ago

MIT-licensed pure Python: https://github.com/ActiveState/depgraph (I just know of it, never tested it on the field)

Ivoz commented 10 years ago

0install used to be based on python (now on OCaml), and also wrote a pure python solver, which can be found in their 1.16 release (also in sat.py); afaik it was py23 compatible.

Also some good notes on sat solving on their page, and an awesome email.

minisat paper; GRASP paper

Wilfred commented 10 years ago

pkglib has recently released a pure Python dependency solver, FWIW: https://github.com/ahlmss/pkglib/blob/master/pkglib/pkglib/setuptools/dependency.py#L596

dstufft commented 10 years ago

Thanks, I'll take a look at it when I take a look at the enthought one as well.

pfmoore commented 10 years ago

I've been reading some of the references. One thing I don't see in any of them is how version dependencies are managed - specifically, if package A depends on B (version >=2.0) how is that encoded. Is each version of a package treated as an entirely separate object for resolution purposes? Does anyone have any pointers to references on how this is handled?

dstufft commented 10 years ago

So for a raw SAT solver yes. Basically a SAT solver lets you solve a boolean equation, ideally efficiently.

So if you have foo>=2.0 you'd look and see that foo has 1.0, 2.0, 2.1, 2.2, and 2.3, and you'd translate that into an equation like foo_2_0 or foo_2_1, or foo_2_2 or foo_2_3 (It's actually more complicated than that, because it also needs an equation to ensure that only 1 foo_x_y can be true at a time). The SAT solver will then spit back at you which set of variables should be True/False to satisfy the equation.

The most basic of SAT solver can simply set all variables to False, and then randomly try setting a single variable to True (and recording which it's already tried) to bruteforce the equation. Speed ups occur when you add other things on top of that such as backtracking (instead of starting over from the beginning you undo the last choice and try a different choice), simplifying (Finding subsets of the problem that are the same and collapsing them into themselves), as well as other techniques.

You can also be smarter about how you choose which variables to try to set to True. In the case above we probably don't want to randomly select one because if the only version specifier is foo>=2.0 then we'll end up with a random foo each time, but instead we'd want to try the highest version of foo first, and then go backwards, or even better, try the currently installed version of foo first, and then resort to trying the highest versions.

Basically all the SAT solver papers are just defining better ways at guessing which variables and computing/storing the results to speed up what is essentially a brute force problem.

Ivoz commented 10 years ago

@pfmoore in addition to dstufft's reply, I would guess that reading my previous link http://0install.net/solver.html would tell you how this is done, it's a great article.

pfmoore commented 10 years ago

@Ivoz thanks I'd missed that one

piotr-dobrogost commented 9 years ago

https://github.com/nvie/pip-tools project has pip-compile command dealing with resolution of dependencies. At http://nvie.com/posts/better-package-management/ there's this statement:

We’ve created pip-compile to be smart with respect to resolving complex dependency trees

Might be worth to find out what algorithm is used there.

nils-werner commented 9 years ago

Just for sake of completeness, see the enthought/depsolver project.

JustinCappos commented 9 years ago

tl;dr: Automatic resolution of conflicting dependencies has two solutions which differ slightly in usability and performance. However, it makes sense to look and see if just doing simple dependency resolution is good enough.

Full post:

(I am attempting to summarize much of the discussion on the distutils mailing list with subject: "name of the dependency problem")

The way that I see it, there are three options: simple dependency resolution, backtracking, or SAT solving.

Most Linux package managers do not do backtracking or SAT solving. They simply perform dependency resolution and, in the case of an error, will print a message and halt. I believe they make this design choice due to two factors. First, the occurrence of conflicts is very low, so the added complexity isn't worth the headache. Second, oftentimes the automated handling of conflict may mask the fact that there is actually some underlying problem or issue with the underlying dependency specifications. In other words, when these dependency issues occur it is often because the existing dependency specifications are incomplete or inaccurate. This is a situation that the developers want to be aware of rather than having the issue solved for them in a potentially incorrect manner.

In situations where it is useful to have an automated way to resolve dependencies, backtracking and SAT solving are the two options I am aware of.

Backtracking dependency resolution, basically boils down to saving the state of the dependency resolver whenever you have multiple choices to resolve a dependency. Then if you reach a later point where there is a conflict, you can backtrack to the point where you made a choice and see if another option would resolve the conflict.

The other option is to turn the dependency resolution problem into a boolean equation and use a SAT solver to find a solution. The SAT solver acts like a black-box that finds a matching set of packages, possibly subject to some sub-constraint (such as installing the minimum number of packages).

Given these two candidate solutions, it is worth understanding their pros and cons.

[Speed]: If the number of conflicting constraints is zero or small, then the algorithms will have similar speed. If the number of conflicting constraints is very large, then a SAT solver will almost certainly be faster than backtracking. I do not honestly know where the cut-off would be, but would imagine ~3-5 conflicting delegations in a package would be where backtracking would start to be noticeably slower than a SAT solver. In the worst case, both the SAT solver and backtracking techniques will be exponential in their input because the problem is NP hard.

[Predictability]: One important issue for many users and developers is to understand why a set of dependencies were chosen. Backtracking dependency resolution algorithms will choose the most recent package version and, in cases where there are options about what to install, will prefer resolving a dependency using the order in which packages are listed when the dependency is specified. Thus the developer has a reasonable measure of control over how dependencies are resolved and it is easier for users to understand why dependencies are chosen.

The SAT solver finds a solution that resolves dependencies using in-built heuristics. This means that developers specify dependencies, but the resolver decides the choices to make to resolve them. In some cases, this may find solutions that can be counter-intuitive. (I will use a near-trivial example to try to make this as clear as possible. Real world examples are likely to be much larger and more complex.) For example, suppose that package A has a dependency that can be fulfilled by B versions 1.0 to 2.0 and a dependency on package C version 1.0 (which B 2.0 also depends on). Suppose that two systems install A and that those systems are identical except that one system already had C 1.0 installed and the other installed it as part of dependency resolution for A. It is possible that those systems would end up with different versions of B installed. This is because the SAT solver made different choices about what was optimal based upon the difference in initial state and already installed packages. This can be confusing to users because both systems end up with A and C installed, but have a different version of B.

So, to summarize, there are reasons to potentially choose either type of automatic dependency conflict resolution. Despite (re?)inventing backtracking dependency resolution when building Stork, I really do not have a strong opinion on what should be used. Both algorithms behave in a similar manner when there are no conflicts (the normal case). If there are many conflicts, SAT solvers are faster, but backtracking makes it easier for the developer to control how dependencies are resolved and makes the resolution process easier for the user to understand. There is no wrong answer here and the distinctions between them are not major.

The value of automatic dependency conflict resolution depends on how often it is needed and how hard dependencies are to resolve other ways. Detection could be performed at upload time for packages hosted on Warehouse, to warn the developer of the situations where their users will see conflicts. (This could be a feature regardless of whether a SAT solver, backtracking, or just simple dependency resolution is adopted.)

It is plausible that simply doing simple dependency resolution and raising an error when a conflict occurs (i.e. not choosing either scheme) could also be the most desirable outcome. Perhaps more data and discussion would be useful before deciding on a course of action?

guettli commented 9 years ago

Maybe I am missing something. Tell me if I am wrong.

Before the decision how to do the dependency resolution, the data structures and the transfer of them should be thought about.

How does a dependency resolution algorithm get the needed data?

I look at the warehouse code. Things get much more complicated since the dependency has an attribute "kind".

Related: https://github.com/pypa/warehouse/issues/491#issuecomment-93048647

How can a package depend on "foo installed with the 'build' kind of dependency"?

I think we should focus on data structures, their relations and the transfer of them. Algorithm can be very easy if the basics are done.

qwcode commented 9 years ago

@JustinCappos to be clear, can you explain "simple dependency resolution"

qwcode commented 9 years ago

short of a SAT solver or backtracking, it seems pip could improve itself significantly by simply trying to combine requirements (and failing if they can't be combined) as it parses them (in the current algorithm) vs just accepting the first one.

JustinCappos commented 9 years ago

What I'm calling simple dependency resolution is to just greedily take the first matching thing that fulfills the requirements. Usually, this involves prioritizing the highest versioned package first.

For example, suppose that you have the following packages w/ dependencies:

foo-2.0 -- depends on bar foo-1.0 bar-1.1 bar-1.0

If you ask to install foo, the dependency resolver will decide you likely want foo-2.0 (the latest version). When reading the metadata for foo-2.0, the dependency resolver will see the dependency on bar and resolve this with bar-1.1 (the latest) and then select this for installation it as well.

However, there may be cases where this will fail to install a package when there are conflicting delegations. Suppose we tweak the example above to look like this:

foo-2.0 -- depends on bar and zap-1.0 foo-1.0 bar-1.1 -- depends on zap-2.0 bar-1.0 zap-2.0 zap-1.0

Simple dependency resolution will go through the same steps to choose foo-2.0 and bar-1.1 as before. However, following this it will choose zap-1.0 (to satisfy bar-1.1) and zap-2.0 (to satisfy foo-2.0). The intent to install different versions of the same package is a conflict and, in the case of what I am calling simple dependency resolution, the system will print an error and terminate.

Note that this need not occur when two packages both specify a version number. It is enough that a single package does so. Suppose that dependency resolution is done depth first (so that bar-1.1's dependencies are resolved before the second dependency of foo-2.0). If that dependency of bar-1.1 simply specifies zap, then zap-2.0 will be chosen by the simple dependency resolver. This will also cause some resolvers to declare an error and exit.

On Sat, Apr 18, 2015 at 5:35 PM, Marcus Smith notifications@github.com wrote:

@JustinCappos https://github.com/JustinCappos to be clear, can you explain "simple dependency resolution"

— Reply to this email directly or view it on GitHub https://github.com/pypa/pip/issues/988#issuecomment-94202933.

qwcode commented 9 years ago

to be clear, I've seen 2 uses of "back-tracking" in the discussions.

the first, is what @JustinCappos mentions, which is back-tracking to try alternative versions when a conflict is discovered.

the second, is simply backtracking to re-walk dependencies, as additional constraints are discovered. For example, if you start with "A>=1", but then later discover the constraint "A<2", you may need to re-walk the dependencies of A, if the new constraint resulted in a new version of A.

the second type of "back-tracking" would be used in "simple dependency resolution", where you're still only reporting conflicts and not attempting the first type of "back-tracking" to fix the conflict.

JustinCappos commented 9 years ago

I'm not sure that these situations are completely disjoint. As I see it, the constraints depend on a specific version of a package because they come from that package's metadata. For example, it is B-1.1 that depends on A>=1, not B because B-1.0 may have no dependency on A at all.

On Tue, Apr 21, 2015 at 10:28 PM, Marcus Smith notifications@github.com wrote:

to be clear, I've seen 2 uses of "back-tracking" in the discussions.

the first, is what @JustinCappos https://github.com/JustinCappos mentions, which is back-tracking to try alternative versions when a conflict is discovered.

the second, is simply backtracking to re-walk dependencies, as additional constraints are discovered. For example, if you start with "A>=1", but then later discover the constraint "A<2", you may need to re-walk the dependencies of A, if the new constraint resulted in a new version of A.

the second type of "back-tracking" would be used in "simple dependency resolution", where you're still only reporting conflicts and not attempting the first type of "back-tracking" to fix the conflict.

— Reply to this email directly or view it on GitHub https://github.com/pypa/pip/issues/988#issuecomment-95001005.

qwcode commented 9 years ago

not saying their disjoint, just that "backtracking" (in the 2nd sense) can occur in a "simple dependency resolver". In both cases, it's fundamentally the same process, but just for different purposes.

dstufft commented 9 years ago

https://github.com/CocoaPods/Molinillo is something (written in Ruby) that perhaps we could look at for inspiration as well.

Ivoz commented 9 years ago

@JustinCappos if you read again 0install's summary I believe it demonstrates an easy way to solve the "get confusing/random choices out of a SAT solver" problem.

JustinCappos commented 9 years ago

Thanks for sending this. I'd seen the unoptimized version before from the OPIUM paper with the version number constraints, but hadn't seen this tweak. (An aside: it is interesting to me is that the rationale behind the tweak mostly seems to be to improve the performance, but that it had a side effect where it made the package selection behavior more predictable to the end user.)

To summarize, the modified algorithm works as follows:

Take the package you want to resolve and find the highest version number you can resolve using the SAT solver. Choose that package. Take the first dependency of that package and assume the package you already selected must be installed and find the highest version number you can resolve using the SAT solver. Choose that package. (Continue this process recursively in a depth first manner, solving for all packages.)

What is interesting to me here is that the 0install folks came up with a very similar solution to backtracking dependency resolution (which uses recursion and performs package selection in exactly the same manner). The difference is that the backtracking dependency resolution algorithm determines that packages cannot be resolved by trying candidates itself (and then backtracking), while the 0install solution leverages a SAT solver to find that candidates will not be resolvable (and thus avoiding them). As I understand it, both algorithms will come up with the exact same solution in every case.

So, with that in mind, I suppose it is worth looking at other factors. Performance and maintainability spring to mind. It would not surprise me to learn that the SAT solver optimization is faster at pruning than backtracking dependency resolution. I don't know if pip already requires a SAT solver. Also, I don't know what performance is likely / acceptable for either solution.

I have a student, Bing who is planning to look into the dependency resolution options over the summer. We can look at 0install's implementation as well and provide some numbers to help people to compare.

Does anyone still want to advocate for the OPIUM-style SAT solving? From the 0install post, it seems likely to be both slower and less understandable than what they implemented. So I would think that it would not be likely to be selected for inclusion. (If there is little interest in it, we will prioritize results for the other algorithms instead.)

On Wed, Apr 29, 2015 at 3:04 AM, Matt Iversen notifications@github.com wrote:

@JustinCappos https://github.com/JustinCappos if you read again 0install's summary http://0install.net/solver.html I believe it demonstrates an easy way to solve the "get confusing/random choices out of a SAT solver" problem.

— Reply to this email directly or view it on GitHub https://github.com/pypa/pip/issues/988#issuecomment-97328915.

dstufft commented 9 years ago

Honestly, I don't much care what the actual solution is as long as it will give us a predictable and "ideal" (where ideal == the latest acceptable version that matches all constraints) set of things to install. A constraint we have is that we won't know the full list of dependencies up front, ideally in the future more and more of them will be available via an API provided by Warehouse, but for a long time the vast bulk are going to require downloading the files from PyPI and inspecting them (possibly even executing some Python) to determine what the dependencies are.

This says to me that whatever system we use needs to be able to handle incrementally adding new constraints into it and ideally it should attempt to minimize "wrong" guesses because each attempt is high cost for us. This sounds a lot more like the backtracking solution that @JustinCappos mentioned than a traditional SAT solver, but this is really an area I don't have a ton of foundational knowledge in so I'm personally figuring out the "right" answers as I go along.

guettli commented 9 years ago

Me too. I think the resolver algorithm should be replaceable. For the first step a simple one is enough. Later, the details could be improved. Optimize later.

For me the big question is: where does the resolving happen? At the client or at the server?

And the next question, nobody could answer me up to now: How to handle the "kind" attribute of the dependcy. Related db schema of warehouse: https://github.com/pypa/warehouse/blob/54c67bab2b150acad1267d4d0a4b32426adaa62d/warehouse/legacy/tables.py#L471-L485

I think it would be good to keep it simple: remove the "kind" attribute. If a package "foo" has several kinds,create N packages from one git/hg repo: "foo-dev" "foo-with-fancy-feature" ...

Do RPM/DPKG have dependecies which has a "kind" attribute?

Somehow related: The open street map database schema evolved during the first years a lot. Guess what was done? It was simplified: Less tables, less attributes. Special cases were migrated to a simple and common graph structures.

dstufft commented 9 years ago

Resolving will happen client side, and the KIND is one of:

A lot of these don't make sense or they only make sense in a legacy sense, in reality you're only really dealing with something that says it requires certain things, something that says it provides a certain package, and something that says it obsoletes something.

rbtcollins commented 9 years ago

Ok so, this issue has gotten long and a bit academic. The tl;dr in my opinion is:

Thoughtful comments and kibbitzing on my PR (or an alternative PR that shows a better way) appreciated and solicited. Even just testing it out to see if it eats your dog would be helpful to me. Thanks!

qwcode commented 9 years ago

erroring and stopping isn't very useful.

I think users would prefer an error over what happens now, where conflicts are quietly ignored. With an error, a user could respond and add additional constraints to solve the conflict

just unioning can only handle the most simple of cases

a lot of the cases I've seen in the pip tracker would have been solved with a simple AND of the additional constraints. so I understand, can you explain the cases you care about?

rbtcollins commented 9 years ago

Resolving << erroring << today, agreed. But since I have a working resolver branch, I think debate is largely academic and uninteresting at this point. Unless someone wants to content that erroring << resolving << today.

Unioning fails on transitive situations.

Simple cases like the following work with unioning: A depends on B and C >=1.0. B latest release depends on C >= 1.2. Today you get 1.0 and its all broken.

Unioning fails on situations like this: A depends on B and C >=1.0, <2.0 (e.g. ~= 1.0). B latest release depends on C>=2.0. B's prior release depends on C with some specifier that permits a C <2.0.

glyph commented 9 years ago

@rbtcollins - I am a little confused by the direction of those less-than signs. It reads like you're saying that resolving is worse than erroring is worse than what we have today, but then everything else sounds like you're saying the opposite of that? Would you mind clarifying just a bit?

rbtcollins commented 9 years ago

Read as evolution not less than. ETIRED, not being all that clear :)

qwcode commented 9 years ago

A depends on B and C >=1.0, <2.0 (e.g. ~= 1.0). B latest release depends on C>=2.0. B's prior release depends on C with some specifier that permits a C <2.0

I wouldn't describe this as "Unioning fails". Unioning, as a thing, is not the problem. The "failure" is that a simple single-pass approach can result in a conflicting union. If using backtracking, it requires more passes to find a good union.

JustinCappos commented 9 years ago

I'm not sure that I would say that resolving is better than erroring is better than today.

A different way to look at "erroring" versus "resolving" is to consider what a conflict is. It is likely that a developer was confused when packaging software possibly because different developers did not synchronize their dependency requirements. They certainly didn't test the current combination of software described by the requirements on the repository. As a result, at best, the requirements as specified are untested, if not outright incorrect.

Now, the question is, what should be done about this issue? Many (most?) Linux distributions choose the "error" approach because they want a developer to be aware of the issue, look at it, and fix it. The alternative "resolve" solution tries to have the package manager decide what to do without user or developer interaction.

I'm not sure whether erroring is better or worse than resolving. To try to answer this question a student of mine is starting to look quantitatively at the problem and the steps that may make sense moving forward. We're very interested in checking out your PR and seeing how your solution will compare with others!

Thanks, Justin

On Thu, May 14, 2015 at 3:56 AM, rbtcollins notifications@github.com wrote:

Resolving << erroring << today, agreed. But since I have a working resolver branch, I think debate is largely academic and uninteresting at this point. Unless someone wants to content that erroring << resolving << today.

Unioning fails on transitive situations.

Simple cases like the following work with unioning: A depends on B and C

=1.0. B latest release depends on C >= 1.2. Today you get 1.0 and its all broken.

Unioning fails on situations like this: A depends on B and C >=1.0, <2.0 (e.g. ~= 1.0). B latest release depends on C>=2.0. B's prior release depends on C with some specifier that permits a C <2.0.

— Reply to this email directly or view it on GitHub https://github.com/pypa/pip/issues/988#issuecomment-101960819.

remram44 commented 9 years ago

Interesting resource: https://people.debian.org/~dburrows/model.pdf

Implementing the full backtracking algorithm shouldn't be that much work, this is a solved problem anyway; people implement much more complex algorithms in Python everyday. However I don't know how much metadata pip can get at this time. If getting the dependencies for a version of package B involves downloading the tarball and running the setup.py, backtracking to find a good solution suddenly becomes really slow.

On another point, if we're discussing doing this right, we probably need "conflict" specifiers as different libraries might provide the same interface (i.e. same top-level package name). This doesn't really add difficulty to the resolution algorithm, but packages would need a way to specify this.

Of course the pip/setuptools/distutils situation is not helping :cry:

rbtcollins commented 9 years ago

@JustinCappos w.r.t. distro packaging. apt-using distros (Debian, and Ubuntu etc) don't error - they resolve. I don't know what metric you're using for 'most' - # of distinct packaging systems, # of product OS's using a given system (so weighted by vendors) or # of users (so weighted by market success) - but I'm pretty sure apt has a massive dominance if you weight by users, and is still very large weighted by vendors.

@remram44 - Daniels paper is less relevant to pip than it may seem. Certainly the translation to SAT is in common, but distros aggressively trim the space - there's typically at most 5 versions of any one package being chosen amongst (missing, last-release-of-distro, last-release-of-distro-security, new-release, new-release-of-distro) - and only 3 versions for installation runs being done within a release. Whereas we have dozens or hundreds of versions. Not having conflicts is an advantage in many ways, though I do agree way may need them. We probably want to be a bit conservative about that though. My branch implements full backtracking with pseudo random probing (which IIRC has been shown to support getting out of bad brute-force scans efficiently), and linear best-first probing of a given dep. I'll be super enthusiastic to see folk putting forward different resolution algorithms: the resolver is 100 lines of code: the rest of my branch is refactoring to deal with the reality of having numerous versions which we need to cache on disk while we're choosing (because we can revisit a given version many times - hundreds or thousands easily).

grahamc commented 9 years ago

I think users would prefer an error over what happens now, where conflicts are quietly ignored. With an error, a user could respond and add additional constraints to solve the conflict

+1, I can't count the number of times invalid dependencies have been put together because an invalid situation was ignored.

SMesser commented 9 years ago

In the cases where a "fail-fast" approach is taken, there should be more detailed messages about the conflicting requirements. Such messages should list the conflicting requirements and possibly suggest workarounds to whatever extent possible. Current messages list the package whose installation failed, and only one of the requirements, e.g. "pkg_resources.ContextualVersionConflict: (requests 2.5.1 (/usr/local/lib/python2.7/dist-packages), Requirement.parse('requests<2.5.0,>=2.2.1'), set(['docker-py']))".

Clearly docker-py is involved, but the other, conflicting packages are not shown. As a developer, my reaction to this message is "Okay, I understand requests 2.5.1 is what is currently installed, and I understand that docker-py doesn't like requests 2.5.1, but I'm installing docker-py. Why won't pip install what it needs?"