Warn users about dependency conflicts when updating other packages

uranusjr commented 4 years ago

This is a mental note on a topic I realise needing a discussion while working on another issue.

Say we have package foo and bar with the following dependencies:

foo 1.0.0
    six<1.12

foo 2.0.0
    six>=1.12

bar 1.0.0
    six<1.12

bar 2.0.0
    six>=1.12

Given an environment with the followings installed:

foo 1.0.0
bar 1.0.0
six 1.11.0

and the user runs pip install --upgrade foo. What should we do? If we upgrade foo to 2.0.0, six needs to be upgraded as well (as an intrinsic requirement), but now it would conflict with bar. I can think of three possibile approaches:

Upgrade foo and six, and print an error/warning telling the user bar now has unsatisfied requirements.
Upgrade bar automatically to 2.0.0.
Telling the user everything is up-to-date, since the installed foo 1.0.0 is the latest version without conflicts.
Error out without modifying the environment, saying the upgrade would introduce incompatibilities.

Approach 1 is the simplest, but might be too difficult for the user to notice (especially on CI). This is probably not a good idea if we can avoid it.

Approach 2 looks like a good idea at first glance, but IMO may be confusing to the user. The dependency graph would be much less complex in more than one way in practice, and it would be difficult for the user to notice, or understand why a seemingly unrelated package got upgraded.

Approach 3 is “correct” in thoery, but is as unuseful to the user as pip’s famous “No matching distributions found for” error. There is clearly a newer version to upgrade to from the user’s perspective. Why is pip not finding it? Open GitHub and file a bug report.

Approach 4 is the most reasonable to me. In the above example, pip would emit something like six>=1.12 (required by foo) would cause incompatibility in bar (requires six<1.12). The downside is pip would need to do more work to interpret the resolution result (this does not fit into the resolution process IMO).

pfmoore commented 4 years ago

My initial thoughts (assuming we're talking about the new resolver);

Approach 1 is just wrong. The new resolver should (IMO) never install an incompatible set of packages. Printing a warning is just saying "I messed up". Printing an error and not reverting is unfriendly (it's what the current resolver does, so it's clearly wrong :wink:).

Approach 2 I agree is reasonable at first, but awkward in practice. I'd think of this as saying that we shouldn't go "up" the dependency tree (from dependency to parent) when upgrading. By saying "upgrade foo", the user is giving implicit permission to upgrade foo's dependencies, but it's not obvious that they have agreed to upgrade bar, even if that's needed to complete the upgrade of foo.

Approach 3 I agree is not the right thing to do. We're not doing what the user asked. At a minimum we should say "cannot upgrade without forcing upgrade of unrelated package bar".

Approach 4 makes sense, but it's more a case of giving up cleanly than being the right approach.

IMO, the best approach would be a combination of 2 and 3. We find that the resolution would require upgrading a package that isn't a dependency of anything the user requested on the command line, and say "upgrading would require also upgrading the following unrelated packages: bar - continue?" and then proceed if the user agrees, otherwise stop and say the system is as up to date as we can make it.

pradyunsg commented 4 years ago

Note that we don't have any wait-for-user-input behaviours, outside of anything we're not overriding in network/VCS code.

Introducing something like this would be slightly disruptive for certain users.

uranusjr commented 4 years ago

We’d also need to handle non-interactive situations if we implement a prompt. In that case IMO it’s better to error loudly (because that’s the main case for CI). So… a combination of 2, 3, and 4?

I wonder if we can use --upgrade-strategy for this. If that’s the case—Approach 2 sounds like always to me. What should only-if-needed do, and which approach should be default?

pradyunsg commented 4 years ago

Honestly, I suggest we treat 4 as the "fallback" option for us here -- unless we have reasonable precedence and consensus on some other option, let's go with that.

Can someone check what other package managers with "proper" resolvers do?

uranusjr commented 4 years ago

Most package managers don’t have this problem because they keep a “manifest” describing the user’s original intention (e.g. package.json, Cargo.toml, Gemfile). From my understanding they’d go approach 2 if the user allows b to be upgraded, otherwise approach 3.

xavfernandez commented 4 years ago

Most package managers don’t have this problem because they keep a “manifest” describing the user’s original intention (e.g. package.json, Cargo.toml, Gemfile).

For Python, we have https://www.python.org/dev/peps/pep-0376/#requested that could serve a similar purpose.

uranusjr commented 4 years ago

For Python, we have https://www.python.org/dev/peps/pep-0376/#requested that could serve a similar purpose.

TIL! Yes that sounds like a good idea. pip does not already implement it, does it? (I didn’t read the implementation, but couldn’t find this file in any of my existing environments.)

pradyunsg commented 4 years ago

There should be a tracking issue for this. pip does not implement this.

pfmoore commented 4 years ago

PEP 376 is something of an odd thing - it was the basis of the aborted "distutils2" rewrite, but it had a lot of good ideas, particularly in PEP 376. But even though the PEP as a whole was accepted, it was never all implemented. None of the pkgutil functions were ever implemented, for example.

IMO, it would be a good idea to go through PEP 376 and revise it to document what is implemented, and decide what of the rest should be (and retain that) or should be dropped (and do so). Maybe we could do this incrementally, I don't know? PEP 376 needs a certain critical mass of stuff that is actually going to be implemented to maintain credibility 🙁

uranusjr commented 4 years ago

And I’m not sure REQUESTED is the best place to store the information TBH (although storing it is definitely better than not). Let’s have more chat on this… (and other things in PEP 376 as a whole)

pfmoore commented 4 years ago

https://discuss.python.org/t/does-pep-376-need-a-review/3269

pfmoore commented 4 years ago

@pradyunsg @uranusjr One thing I just thought of. With the original example, of foo and bar, when the user types pip install foo, how will the resolver even know that bar exists?

The root set of requirements (from the command line) is foo. The dependencies that get discovered will include six, but why will the resolver ever add bar to the dependency graph? It will only do that if we seed the graph with "everything the user currently has installed" - and that's potentially a large set and could slow down resolve times substantially.

I think we need to consider the option that even with the new resolver, pip will only solve for requested requirements and their dependencies. This would mean that pip could produce a broken environment (in effect option 1 above, with the message coming from a follow-up automatic pip check, just as it does now).

I can see ways that we could let the resolver "see" what's on the index for requirements specified by the user and their dependencies, but for other installed packages just see the installed version. But that could be tricky to implement. Actually, it could be tricky to get the resolver to even ask for available candidates for bar above.

I guess what I'm saying is that if we assume a resolver that follows the dependency graph from the root set provided by the user, but only going in the direction "parent -> dependency" (because that's the only direction the metadata supports), it's not actually clear to me if we even can implement behaviours 2-4...

The point @uranusjr made in our meeting is extremely relevant here - pip does not have any sort of information about the global question "what does the user want to have in this environment" (the sort of information that other tools maintain via files like Pipfile or cargo.toml), other than in the form "this is a list of exact project/version pairs that are currently installed". That's a fundamental limitation of pip's design as a low-level tool, and we should be very careful about trying to design features that assume otherwise.

With that in mind, behaviour (1) becomes much more reasonable as simply being "upgrade what the user asked for".

uranusjr commented 4 years ago

I do have a feeling this is less a problem in practice since people having this kind of problems are likely already using a higher-level tool (e.g. Pipenv, Poetry, pip-tools).

The resolver does not really need to know bar exists in the simplistic example, only that there’s a six<1.12 contributed by someone (i.e. find_matches() should only return candidates that don’t cause conflicts with any currently installed packages). Conflcits are still possible if a new requirement is introduced deeper down the tree, and pip would need to choose whether to go behaviour 1, or error out after the resolution finishes (by checking ther result against installed packages again). This would be a strong indication the user should switch to a more capable setup, and I feel approach 1 would be acceptable (maybe with a suggestion message) at this point.

pradyunsg commented 4 years ago

To be abundantly clear, I think this is going to be one of the issues that can cause "oh the new resolver actually breaks my setup, leaving a sour taste in my mouth" situation for our users, during the new resolver's rollout.

I think the resolver should consider the existing packages in the environment (otherwise, there's very little benefit to this whole exercise for many users), and we should have our default "strategy" be:

if not installed:
- install whatever works "best" -> best == newest compatible version.
if already installed:
- if not directly depended on by the package:
  - "don't touch it": error out when it's incompatible with all the potential choices.
  - Q: do we care about telling the user "hey, pip picked an older version than it would've if you didn't have X in your environment", in cases where stuff worked out?
- if directly depended upon by the package:
  - prefer existing installed version when possible, and allowed to change the version.
  - TODO: do we allow downgrades and upgrades? do we want to treat those differently?

I imagine this would be the equivalent of the "only-if-needed" strategy that we have. For "eager", we can skip the "if directly depended upon" to skip the "prefer existing installed version" behavior.

uranusjr commented 4 years ago

if not installed:

install whatever works "best" -> best == newest compatible version.

I believe this would “simply work”[*] in practice. Assuming the user already has a “working” environment, a package being not installed means it’s not depended by any existing distributions. The resolver can choose any version it needs to depending on the newly requested packages.

[]: A working environment does not necessarily mean the environment does not contain any conflicts or broken dependencies, but there’s nothing wrong as far as the user concerns*. pip does not need to actively fix the environment unless the user specifies so.

if already installed:

if not directly depended on by the package:

"don't touch it": error out when it's incompatible with all the potential choices.

Q: do we care about telling the user "hey, pip picked an older version than it would've if you didn't have X in your environment", in cases where stuff worked out?

if directly depended upon by the package:

prefer existing installed version when possible, and allowed to change the version.

TODO: do we allow downgrades and upgrades? do we want to treat those differently?

With REQUESTED out of the consideration for the short term, I feel pip is not in the position to switch behaviour based on reverse dependency information. It is still very often wrong to automatically upgrade a package, even if it is depended by another. Django packages, for example, usually depend on django, but I’d be very annoyed if installing the latest django-debug-toolbar automatically upgrades my Django installation from 2.2 to 3.0 because djangorestframework also depends on django. The better behaviour is to never touch an already-installed package, and always error out if that does not work. It would also be extremely useful to tell the user to either upgrade or uninstall the package, and what other packages depend on the conflicting package; otherwise the user would be hard-pressed to decide what action is best.

It would be way too restrictive to always error out, of course. My feeling is this should be where --upgrade comes into play. Upgrading (or downgrading; any version-changing operations should be treated the same in this context IMO) can happen if the user supplies this flag, and --upgrade-strategy affects whether to prefer upgrading the world, or the smallest possible set.

This leaves only one question, whether we should error out, or simply warn if an incompatibility would occur (for a package in the environment, but not a dependency of the given packages in the install --upgrade call) after resolution. The checks would be exactly the same (pip already implements it at install time); the only difference would be whether we install or not. I feel the current behaviour is actually useful sometimes, but personally don’t really care either way.

pfmoore commented 4 years ago

My feeling is that this is something where any decision we make will have to be reviewed in the light of actual user feedback. So (a) we shouldn't get too concerned with working out the "perfect" answer (I think @uranusjr's summary above seems about right), and (b) it would be good if the UX work could get user feedback on this question. (Can someone ping Bernard on this issue? I can't remember or find his github username 🙁)

uranusjr commented 4 years ago

@ei8fdb ^

pradyunsg commented 4 years ago

Given the sheer number of users, it's possible that we don't surface folks who do care about the failure case here, as part of our UX study.

I personally feel that we should error out vs warn in most scenarios w/ the new resolver, since (1) it's stricter, and (2) it'll help eagerly surface any users who'd care about the stricter behavior here, and we can work with them to understand why the new/stricter behavior doesn't work for them.

Plus, we've said:

if you ask pip to install two packages with incompatible requirements, it will refuse (rather than installing a broken combination, like it does now).

If you do separate "pip install" runs and pip won't refuse to install a broken combination... That's gonna be tricky to explain. :P

uranusjr commented 4 years ago

Well you see we said “ask pip to install two packages” but here you’re only asking it to install one so it does not know about the other you installed previously…

Yeah I get what you mean. I don’t mind always error out, but we’ll need to offer some escape mechanism so a user can say “hey I actually don’t care about these breakages” without needing to uninstall-try again-uninstall stuff until pip feels satisfied. We can probably put this on hold, and get a better grasp to the problem after we actually ship the resolver.

pfmoore commented 4 years ago

without needing to uninstall-try again-uninstall stuff until pip feels satisfied

... which is particularly hard when there's dependencies, as pip doesn't have an "uninstall X and its dependencies" command.

We can probably put this on hold, and get a better grasp to the problem after we actually ship the resolver.

Agreed. It's not entirely clear to me yet what information we'll even have from the resolver to report on in this situation.

pfmoore commented 4 years ago

Not directly related to this question, but I wanted to put down my thoughts about upgrade strategies somewhere, and this seems like the best place. I'm ignoring the possibility for now that we might add new upgrade strategies, or modify the ones that exist.

Basically, the difference between "eager" and "only-if-needed" is all about how dependencies are handled. Base requirements specified on the command line are always upgraded ("because that's what the user asked for"). But technically, we don't upgrade requirements, we upgrade packages. So I take it that --upgrade means to take every package name given on the command line and tell the resolver "don't prefer installed versions". And "eager" says "extend that to dependencies as well".

So far, that's a relatively simple rule, and easy enough to apply to complex cases to decide what happens.

There is one additional rule which I think we should apply, which is (hopefully!) equally simple to reason about.

We never downgrade. Currently I don't think pip downgrades (short of --always-reinstall which is a whole different question that I prefer to ignore here). I think we should explicitly make that a rule of the new resolver too. [This can be implemented by just discarding candidates older than the installed version in find_matches]

The other complexities in the original scenario posted here are because the (new) resolver never even knows that bar exists, if it follows its "obvious" discovery route (root requirements -> their dependencies). That case needs further thinking, and I need to stop at this point, so I'll come back to it.

To be clear, I'm trying to come at all of this from a basis of "what simple, easy to reason about, rules can we apply, and what are their consequences?" This is as opposed to the way I've been thinking about this issue until now, which is more "what is the correct thing to do here, and how do we define the logic that gives us that correct answer?" I suspect the former approach will give more understandable behaviour for users, at the cost of maybe having some arguable behaviours - but "I can see why it does this even though I'd prefer it to do something else" seems like it might be a good compromise to me.

pradyunsg commented 4 years ago

That sounds good to me!

As long as we're satisfying "it will no longer install a combination of packages that is mutually inconsistent" promise for the resolver's behaviors, for the common cases, I'm a not-cranky kiddo.

pfmoore commented 4 years ago

Currently I don't think pip downgrades

... and I was wrong.

If we have click 7.1.1 installed and do "pip install mypackage" where mypackage depends on click 5.0 explicitly, pip will download click to 5.0.

I understand why, but it's certainly not what I initially expected.

pradyunsg commented 4 years ago

I understand why, but it's certainly not what I initially expected.

This sentence makes me happy. I'm not the only one who feels like this about things the older resolver does. :)

Also, ah well.

uranusjr commented 4 years ago

(Read pip source) “Oh that’s why. But WHY.”

pfmoore commented 4 years ago

It actually ties back to my confusion over --upgrade (not being "give pip permission to upgrade"). This feels like there should also be an --allow-downgrade option (but only if --upgrade was changed to actually be --allow-upgrade, otherwise it's just another inconsistency 🙁)

Sigh.

dstufft commented 4 years ago

For whatever it's worth, I kind of hate the --upgrade option, and I'm not really entirely sure why it exists or that it should exist. The default behavior of "install foo if it isn't already installed" has always been weird to me. Maybe it's a radical idea and not entirely related to this issue, but maybe it wouldn't be the worst thing to question whether that flag needs to exist at all or not.

As far as the actual topic at hand, currently pip only considers what you passed in this current invocation as the list of things you're trying to install. However conceptually it isn't super crazy to consider it more like pip install foo is more like adding foo to the list of things installed into this environment, and then running an, in concept, distinct step that passes the entire list of things to be installed into the resolver. This would basically mean that in concept, pip install foo bar could be thought of as pip add foo bar && pip resolve or even pip add foo && pip add bar && pip resolve.

This would be a bit of a paradigm shift I think, but I think it actually makes sense? At least it does to me. I feel like any solution that i doesn't involve looking at the entire environment as a whole is obviously wrong?

pfmoore commented 4 years ago

Without going all-in on the model @dstufft proposes, I think it would be possible to work as follows:

For any installed package foo X.Y that is not in the root set of requirements passed into the resolver, treat all of the dependencies of foo as constraints in the overall solve (so they must be satisfied). Also, unless the upgrade strategy is "eager", we could add "foo==X.Y" to the constraints, I think.

(That might actually make some solves quicker, by restricting the set of versions checked more...)

The downside here is that if the user already has a broken environment (however that may have happened) pip may end up in a situation where it can't fix things, because it can never find anything that satisfies the (conflicting) constraints that already exist. At that stage, though, I'd say the user probably needs to just uninstall stuff to fix their environment first.

uranusjr commented 4 years ago

At that stage, though, I'd say the user probably needs to just uninstall stuff to fix their environment first.

I feel this would be a problem for a lot of pip users. Many users just install packages globally (way too many IMO TBH), and it would be quite difficult for them to figure out what they can uninstall in the face of an environment conflict.

Maybe we should roll out pip autoclean and/or pip uninstall --deps [package] before the new resolver becomes the default, as a relatively easy way to clean the environment up. This would avoid peole being stuck between the two timelines.

(Interlinking to #6536 since this is a rollout consideration and not technical.)

pfmoore commented 4 years ago

I feel this would be a problem for a lot of pip users.

Quite probably. That would imply that we have to do approach (1) from the original post, though, doesn't it?

uranusjr commented 4 years ago

Approach 1) would be the only choice if we don’t build extra tooling to help users clean up the already installed packages. But we probably can (and want to?) build those tools. I think the dependencies-of-installed-packages-as-constraints would be fine as long as the tools land before we force everyone to adopt the new resolver.

pfmoore commented 4 years ago

Fair point, let's discuss further on #6536.

brainwane commented 4 years ago

To be abundantly clear, I think this is going to be one of the issues that can cause "oh the new resolver actually breaks my setup, leaving a sour taste in my mouth" situation for our users, during the new resolver's rollout.

@pfmoore @pradyunsg @uranusjr Do we need to resolve this before the 20.2b2, or do you think it is sufficient to put a "known bug" item about it in the "migrating to the new resolver" guide?

pfmoore commented 4 years ago

I think we need to decide what to do before the beta. We're mostly stalled by indecision here. Specifically, should the long-term behaviour be:

Pip considers all installed packages, and will not allow installation of an inconsistent set of versions to be present.
Pip only considers the packages being installed, and may break installed packages.

Once we've decided on the desired behaviour, I think that exploring the consequences can be documented as an "open issue" - and if we choose (1), I think it's OK to document that it's "not implemented yet" in the migration guide.

brainwane commented 4 years ago

We discusssed this in our meeting last week and agreed to go with approach #2: "Pip only considers the packages being installed, and may break installed packages." Or, in more words: Pip will only consider the packages being installed, and may break installed packages. It will not guarantee that your environment will be consistent all the time. If you install x and then y, it's possible for the y you get to be different than it would be if you had run "install x y" in a single command.

@pradyunsg is also interested in thinking about directing users who might want strict mode, so they can provide "I want that" inputs on that in the beta. This could involve the CLI printout, or something in documentation. I saw there was a TODO:

Pradyun to file a tracking issue, documenting this and what we're going to do here.

Pradyun, did you already do this?

brainwane commented 4 years ago

Referring to this in #8491.

nlhkabu commented 4 years ago

FYI, we have created a survey to collect user feedback on this issue: https://bit.ly/2ZqJijr

We can link to this from the error message - see #8546

nlhkabu commented 4 years ago

Cross posting this comment from @pfmoore from Zulip, so it does not get lost:

One follow-up that I think would be worth asking is are users OK with pip not being able to install at all if the user's environment is already inconsistent (however that may have happened). The context for this is the discussion following on from this comment on the tracker. I'm sort of OK with telling users they have to uninstall stuff until their environment is not-broken, but without giving them help in debugging what's wrong (and I don't know what help I could give them!), that seems a bit aggressive...

Also, we might need to be a bit more nuanced over "an error explaining that the upgrade would cause incompatibilities" as we need to consider that in some cases it might not be the upgrade causing incompatibilities, but rather the incompatibilities are already there and pip can't cope with them...

My next steps:

help @pradyunsg publish survey in warning message (see #8546)
review / publish / share results from survey
conduct follow up research on the points raised by Paul above

nlhkabu commented 3 years ago

@brainwane re: follow up.

is done
can now we done as we have sufficient data from the survey (673 responses to date) to make a recommendation.
can be addressed at the same time as 2, as we have lots of user comments from the survey that will help us address Paul's concerns

I will speak with the rest of the UX team to see how we can prioritise this within our workload.

nlhkabu commented 3 years ago

For the 20.3 release, we would like to add a warning to tell users that pip does not take into account currently installed packages when running --upgrade.

Example input: pip install --upgrade foo

Suggested output:

Warning: pip will upgrade foo without taking into consideration the other packages that you already have installed. This may cause dependency conflicts.

fyi @pradyunsg

Sidenote: our user research tells us that users want the way that this works to change - they want pip to take into account the current packages that are installed, and not run the upgrade if it will introduce dependency conflicts. I will open a separate ticket to request this.

jtamagnan commented 3 years ago

Warning: pip will upgrade foo without taking into consideration the other packages that you already have installed. This may cause dependency conflicts.

This error message adds a lot but it might be nice to add a bit more. What does it mean for pip to not take it into consideration? Will pip upgrade with no consideration of what it used to be? Will pip not consider them at all and not upgrade them?

An alternative might be:

Warning: pip will upgrade foo without modifying other packages that you already have installed. This may cause dependency conflicts.

brainwane commented 3 years ago

@jtamagnan I appreciate you wanting to help, but your suggested change is inaccurate, since currently pip may modify already- installed packages. pip may leave them alone and may upgrade them; at the time pip is emitting this message, it does not know whether it will end up doing the former, the latter, or both! If you can suggest a concise way to convey that, I'd love to hear it.

jtamagnan commented 3 years ago

My apologies for giving an alternative that is wrong on all accounts. I hope thought that my confusion might help to show how the original message might be misinterpreted. Let me try a few more:

Warning: pip will upgrade foo. This may lead to foo's dependencies being modified which may cause dependency conflicts.

Warning: pip will upgrade foo. This may lead to other packages being modified which may cause dependency conflicts.

If these don't work hopefully my failure to find a better message does not mean that no one else will try to find one.

brainwane commented 3 years ago

Thanks @jtamagnan! I know our user experience experts (in particular @nlhkabu and @ei8fdb) will take your suggestion into account as they iterate and improve the message.

nlhkabu commented 3 years ago

Thanks for your feedback @jtamagnan !

Maybe we can go with:

Warning: when upgrading foo, pip may also need to update its dependencies. As pip does not currently take into account the other packages that you already have installed, this may cause dependency conflicts.

pradyunsg commented 3 years ago

Do we want this message to be printed unconditionally? I feel like it'll serve as noise beyond the "first time you see it" and acts toward conditioning users to ignore warning messages from pip.

uranusjr commented 3 years ago

My instinct is we should only display this if pip actually upgraded a dependency.

pfmoore commented 3 years ago

Furthermore, won't people read this and think that pip is saying --upgrade is broken, resulting in them going on a (futile) hunt for a command that upgrades without "causing dependency conflicts"?

uranusjr commented 3 years ago

Tangentally related: I was thinking about this just earlier today. Say user has Django 3.0 already installed, and want to upgrade to 3.1, but does not want pip to upgrade any of Django 3.0’s dependencies.

The command that works in 20.3 would be pip uninstall -y django && pip install "django>3.0". This installs the latest Django that does not conflict with any currently installed distributions, or errors out if there’s no such version available.

Maybe we should put this somewhere as a suggested alternative to --upgrade.

pfmoore commented 3 years ago

Actually, the timing of this is possibly unfortunate. We are releasing the new resolver and want to give the message that it is stricter but that is good. And yet we are simultaneously adding a new warning that says that doing pip upgrade can cause conflicts (which people will not necessarily read as a reminder that it’s always been like that and may see as a new problem). I feel like people could find the overall impact confusing, and detract from the positive message about the new resolver.

pypa / pip

Warn users about dependency conflicts when updating other packages #7744