Add a resolver option to use the specified minimum version for a dependency

dhellmann commented 4 years ago

What's the problem this feature will solve?

I would like to be able to install my project using its "lower bounds" requirements and run the test suite to ensure that (a) I have those lower bounds specified properly and (b) the tests pass.

Describe the solution you'd like

A new command line option --prefer-minimum-versions would change the resolver behavior to choose the earliest version supported by a requirement specification. For example if versions 1.0 and 2.0 of package foo are available and the specification is foo>=1.0 then when the flag is used version 1.0 would be installed and when the flag is not used version 2.0 would be installed.
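A minimal sketch of the proposed selection rule, assuming plain numeric versions. The real resolver uses the `packaging` library's PEP 440 handling, so the `parse`, `satisfies` and `pick` helpers below are hypothetical stand-ins and not pip code.

```python
from __future__ import annotations

import operator

# Simplified stand-in for PEP 440 specifiers: (operator, version) pairs.
# Only plain numeric versions such as "1.0" or "2.0" are handled.
OPS = {
    ">=": operator.ge, ">": operator.gt,
    "<=": operator.le, "<": operator.lt,
    "==": operator.eq, "!=": operator.ne,
}


def parse(version: str) -> tuple[int, ...]:
    # "2.0" -> (2, 0). Unlike real PEP 440 versions, "1" and "1.0"
    # compare unequal here.
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, specs: list[tuple[str, str]]) -> bool:
    v = parse(version)
    return all(OPS[op](v, parse(bound)) for op, bound in specs)


def pick(available: list[str], specs: list[tuple[str, str]], prefer_minimum: bool) -> str:
    """Return the newest matching version, or the oldest when prefer_minimum is set."""
    candidates = [v for v in available if satisfies(v, specs)]
    if not candidates:
        raise LookupError("no available version satisfies the specification")
    chooser = min if prefer_minimum else max
    return chooser(candidates, key=parse)


# foo>=1.0 with versions 1.0 and 2.0 available, as in the example above.
print(pick(["1.0", "2.0"], [(">=", "1.0")], prefer_minimum=True))   # 1.0
print(pick(["1.0", "2.0"], [(">=", "1.0")], prefer_minimum=False))  # 2.0
```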

Large applications such as OpenStack have a lot of dependencies and verifying the accuracy of the complete set is complex. Providing a way to install the earliest set of packages expected to work would make this easier.

Alternative Solutions

The existing constraints file option does help, but building a valid constraints file is complicated.
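For comparison, here is a rough sketch of the constraints-file workaround: turn each top-level `name>=X` (or `name~=X`) requirement into a `name==X` pin. The `lower_bound_constraints` helper is hypothetical and only handles simple requirement lines. Extras, environment markers and URLs are out of scope.

```python
from __future__ import annotations

import re

# A project name followed by whatever specifier text comes after it.
# Lines starting with options such as "-r" do not match and are skipped.
REQ = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")


def lower_bound_constraints(requirements: list[str]) -> list[str]:
    """Turn 'name>=X' or 'name~=X' requirement lines into 'name==X' pins."""
    pins = []
    for line in requirements:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        match = REQ.match(line)
        if match is None:
            continue
        name, spec = match.groups()
        for clause in spec.split(","):
            clause = clause.strip()
            if clause.startswith((">=", "~=")):
                pins.append(f"{name}=={clause[2:].strip()}")
                break
        else:
            # No lower bound to pin: leave a comment so the gap is visible.
            pins.append(f"# {name}: no lower bound")
    return pins


if __name__ == "__main__":
    reqs = ["foo>=1.0,<3", "baz~=2.4", "bar  # unbounded"]
    print("\n".join(lower_bound_constraints(reqs)))
```

The output could be saved and passed with pip's existing `-c`/`--constraint` option, e.g. `pip install -c constraints.txt .`. This only pins the top-level requirements. Transitive dependencies are still left to the resolver, which is part of why building a complete, valid constraints file by hand is complicated.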

Additional context

There is a recent discussion about this need on the openstack-discuss mailing list.

I will work on the implementation.

uranusjr commented 4 years ago

I like this idea. Preferring a minimum version is a valid tactic in many scenarios, and is even the default in some package managers. I don’t think it’s a good idea to default to the lowest possible version, but it’s reasonable to have it as a configurable option.

The tricky part is how to expose the functionality to the user though. Having a separate --prefer-minimum-versions feels wrong to me, since the flag does not make sense in some cases. Maybe this should be included as a part of the --upgrade-strategy redesign process. For example, introduce a new --strategy flag as the replacement, and have this as one of the possible values.

dhellmann commented 4 years ago

I don't see the upgrade_strategy argument to the new resolver being used at all. Is that part of other work that someone else is doing?

It does seem to make sense to fold the behavior change into the strategy, as long as it isn't something that we would want to combine with other strategies. It looks like other strategies include only-if-needed, eager, and to-satisfy-only. I'm not sure what the distinction is between only-if-needed and to-satisfy-only. During an upgrade, I could see someone wanting to say the equivalent of "update if you have to, but move to the oldest possible version you can". How would someone express that if "prefer-minimum" is a separate strategy from those other options?

pfmoore commented 4 years ago

> I don't see the upgrade_strategy argument to the new resolver being used at all.

It isn't yet. But we expect to add that soon (subject to some questions over how well the existing strategies fit with how the new resolver works).

In terms of the new resolver, "eager" means "don't prioritise already-installed versions over other versions". And "only-if-needed" would prioritise already-installed versions. The "to-satisfy-only" option isn't really relevant as it's more of an "internal" state (its behaviour is a bit weird, so I won't confuse things by explaining here).

Minimum version would be easy enough to specify by preferring older versions over newer ones.

The big question, as I see it, is how to let the user specify their intent correctly. Suppose there's some dependency in the tree that doesn't specify a minimum version. Would you want to install version 0.0.1 (or whatever ancient version) in that case? And surely "upgrade to minimum version possible" is just "don't upgrade" - the currently installed version is pretty much by definition the minimum version allowed...

So I think that technically, this is relatively straightforward to implement, but we'd need help in designing a user interface, in terms of command line options, to allow the user to make meaningful requests, while not turning things into a complex mess that no-one can understand :-)

dhellmann commented 4 years ago

> I don't see the upgrade_strategy argument to the new resolver being used at all.

> It isn't yet. But we expect to add that soon (subject to some questions over how well the existing strategies fit with how the new resolver works).

OK. I was asking because I wasn't sure how to fit this in. I can add a --strategy option to replace the --upgrade-strategy option as @uranusjr suggested, and ensure the strategy is passed to the resolver as upgrade_strategy. After that, I'm less sure what to do. :-)

Is the plan to define some classes to represent the behaviors of the strategy so that the code can call methods instead of checking string literals in different places? Or do you think the strategies would be completely encompassed by the resolver itself, so the string literals would be fine? Either way, I expect we would need some changes in resolvelib, too. How much of the definition of the strategies should be owned by the library instead of pip itself?

If there's anything written down that I can look at to come up to speed, feel free to respond just with links. I have a little time this week so I'd like to help, if I can.

> In terms of the new resolver, "eager" means "don't prioritise already-installed versions over other versions". And "only-if-needed" would prioritise already-installed versions. The "to-satisfy-only" option isn't really relevant as it's more of an "internal" state (its behaviour is a bit weird, so I won't confuse things by explaining here).

> Minimum version would be easy enough to specify by preferring older versions over newer ones.

> The big question, as I see it, is how to let the user specify their intent correctly. Suppose there's some dependency in the tree that doesn't specify a minimum version. Would you want to install version 0.0.1 (or whatever ancient version) in that case? And surely "upgrade to minimum version possible" is just "don't upgrade" - the currently installed version is pretty much by definition the minimum version allowed...

I would say, yes, install 0.0.1. I consider not specifying a minimum version a bug in the packaging specs, and if 0.0.1 doesn't work then the tests run with this new flag set would expose the bug. I realize other folks may not have quite that strict an interpretation, though. :-) I guess saying that the strategy would install the "earliest version that can be found" in that case would at least be clear and easy to understand. Maybe that means a better name for the strategy is something like "earliest-compatible"?

> So I think that technically, this is relatively straightforward to implement, but we'd need help in designing a user interface, in terms of command line options, to allow the user to make meaningful requests, while not turning things into a complex mess that no-one can understand :-)

I agree, the implementation in #8086 was quite straightforward, and the harder part will be the UI and internal API changes.

dhellmann commented 4 years ago

I've joined #pypa-dev on freenode as dhellmann, in case anyone wants to chat about this with less latency. I can summarize anything said there here in the ticket for easier reference later.

dhellmann commented 4 years ago

> Is the plan to define some classes to represent the behaviors of the strategy so that the code can call methods instead of checking string literals in different places? Or do you think the strategies would be completely encompassed by the resolver itself, so the string literals would be fine? Either way, I expect we would need some changes in resolvelib, too. How much of the definition of the strategies should be owned by the library instead of pip itself?

As an example of what I mean here, I could see a Strategy class hierarchy defining a method get_preferred_candidate() to implement the PipProvider method get_preference() so the provider doesn't have to be aware of all of the strategies. The Strategy would also need to define a method like sort_candidates() to be used by resolvelib.Resolution._attempt_to_pin_criterion().

I'm sure other strategies would cause the API for Strategy to need to expand in other ways.
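A rough sketch of the kind of Strategy hierarchy described here. Everything below (`Strategy`, `PreferNewest`, `PreferMinimum`, `STRATEGIES`) is hypothetical: none of these classes exist in pip or resolvelib, and versions are simplified to integer tuples.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Dict, List, Tuple, Type

Version = Tuple[int, ...]


class Strategy(ABC):
    """Decides the order in which candidate versions are tried."""

    @abstractmethod
    def sort_candidates(self, versions: List[Version]) -> List[Version]:
        """Return the candidates with the most preferred version first."""


class PreferNewest(Strategy):
    def sort_candidates(self, versions: List[Version]) -> List[Version]:
        return sorted(versions, reverse=True)


class PreferMinimum(Strategy):
    def sort_candidates(self, versions: List[Version]) -> List[Version]:
        return sorted(versions)


# The command line value is looked up once here, instead of comparing
# string literals throughout the provider.
STRATEGIES: Dict[str, Type[Strategy]] = {
    "newest": PreferNewest,
    "prefer-minimum": PreferMinimum,
}

strategy = STRATEGIES["prefer-minimum"]()
print(strategy.sort_candidates([(2, 0), (1, 0), (1, 5)]))  # [(1, 0), (1, 5), (2, 0)]
```

Whether this extra layer of indirection pays for itself is exactly what the next reply questions.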

pfmoore commented 4 years ago

> I've joined #pypa-dev on freenode as dhellmann, in case anyone wants to chat about this with less latency.

We're discussing resolver things on Zulip rather than IRC.

> As an example of what I mean here, I could see a Strategy class hierarchy defining a method get_preferred_candidate() to implement the PipProvider method get_preference() so the provider doesn't have to be aware of all of the strategies.

The get_preference method isn't related to this. It's a "which thing should we check next" tuning knob to control the internal progress of the resolver. The method that matters here is find_matches (and specifically the order of the candidates it returns).
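To illustrate why the order of the returned candidates is what matters, here is a hypothetical `order_candidates` helper. It is not pip's provider code and not resolvelib's API; it only produces a preference-ordered list that a resolver would try front to back. The `prefer_installed` flag loosely mirrors the "only-if-needed" behaviour described earlier, `prefer_minimum` the proposed option, and versions are plain integer tuples.

```python
from __future__ import annotations

from typing import List, Optional, Tuple

Version = Tuple[int, ...]


def order_candidates(
    versions: List[Version],
    installed: Optional[Version] = None,
    *,
    prefer_installed: bool = False,
    prefer_minimum: bool = False,
) -> List[Version]:
    """Return candidates most-preferred first; the resolver tries them in order."""
    # Newest first by default, oldest first in "minimum" mode.
    ordered = sorted(versions, reverse=not prefer_minimum)
    # Optionally move the already-installed version to the front.
    if prefer_installed and installed in ordered:
        ordered.remove(installed)
        ordered.insert(0, installed)
    return ordered


available = [(1, 0), (1, 5), (2, 0)]
print(order_candidates(available))                                 # [(2, 0), (1, 5), (1, 0)]
print(order_candidates(available, prefer_minimum=True))            # [(1, 0), (1, 5), (2, 0)]
print(order_candidates(available, (1, 5), prefer_installed=True))  # [(1, 5), (2, 0), (1, 0)]
```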

I'm planning on looking at this myself tomorrow, as I've had upgrade strategies on my task list for a week or so now :-) At the moment, I'm a fairly strong -1 on strategy classes - I feel that they'd likely just be over-engineering at the moment. IMO we've already got probably more classes in the new resolver code than we really need...

But I've shut down my "working on pip" PC for the day now, so I'll refrain from going into any further detail just from memory.

dhellmann commented 4 years ago

> I've joined #pypa-dev on freenode as dhellmann, in case anyone wants to chat about this with less latency.

> We're discussing resolver things on Zulip rather than IRC.

Ah. I don't know what that is. The docs pointed me to IRC. How do I get to the right place in Zulip?

> As an example of what I mean here, I could see a Strategy class hierarchy defining a method get_preferred_candidate() to implement the PipProvider method get_preference() so the provider doesn't have to be aware of all of the strategies.

> The get_preference method isn't related to this. It's a "which thing should we check next" tuning knob to control the internal progress of the resolver. The method that matters here is find_matches (and specifically the order of the candidates it returns).

OK. That wasn't what I found when looking at the implementation I've already done, but I'll take a look at find_matches().

> I'm planning on looking at this myself tomorrow, as I've had upgrade strategies on my task list for a week or so now :-) At the moment, I'm a fairly strong -1 on strategy classes - I feel that they'd likely just be over-engineering at the moment. IMO we've already got probably more classes in the new resolver code than we really need...

OK, I can understand that. There do seem to be a lot of different parts working together and https://github.com/dhellmann/pip/commit/de6e70d3abcc5638777e4ce169b4061b9c75ac18 didn't come out particularly clean. :-)

dhellmann commented 4 years ago

> I've joined #pypa-dev on freenode as dhellmann, in case anyone wants to chat about this with less latency.

> We're discussing resolver things on Zulip rather than IRC.

> Ah. I don't know what that is. The docs pointed me to IRC. How do I get to the right place in Zulip?

Nevermind, found it.

pradyunsg commented 4 years ago

@dhellmann Glad to hear that you're willing to help out with the implementation. ^>^

To set expectations early, this is a feature request for adding new functionality to pip. As per pip's release cadence, the next release with new features would be in July (pip 20.2) so arguably, IMO there's no hurry toward implementing this.

Further, as you've discovered, implementing this feature will be significantly easier with the new resolver's architecture than with the old resolver. However, our priority currently is to get the new resolver to feature parity with the existing resolver and roll it out to become the default this year. IMO implementing new features related to dependency resolution is going to be significantly lower priority for us in the short term, while we work on replacing a core component of pip.

dhellmann commented 4 years ago

I understand the priorities, and am not in a particular hurry for a release. That said, I have more time to work on pip in the next few days than I’m likely to have later. So let’s see where we get with things as you have time, too.

Are any of the higher priority tasks things I might be able to help with?

pradyunsg commented 4 years ago

This is pointing in a different direction from dependency resolution, but https://github.com/pypa/pip/issues/4625 would be great to solve and would be a significant usability improvement for users (especially ones that are on Linux and using the system Python with sudo).

pohlt commented 2 years ago

I would be very interested in this feature. Any updates on its implementation status?

pfmoore commented 2 years ago

Basically, no-one is currently working on it, and the discussion in this thread is all there is. The biggest questions remain how to design a user-friendly interface for this, and make the behaviour intuitive for people (for example, I'm still not at all sure that if no lower bound is specified somewhere in the dependency tree, getting a version from 15 years ago and progressively working forward through the versions until you reach something that works, is a good user experience).

But nothing will happen unless someone is willing to do the design and implementation work, so any such discussions are pointless at the moment.

pohlt commented 2 years ago

Thanks for the update. There was an initial PR #8086 from @dhellmann which didn't get a lot of attention or feedback from the package owners. Busy times, I know. 😉

So without at least some commitment from the package owners, nobody will go down the same road and starve again, I guess. Well, at least I wouldn't.

If no lower bound is given, just try the 15 years old version and watch how everything goes up in flames. I don't think that's a likely scenario. Anyone who knows about the "minimum version" option and activates it, will be clever enough to know that a missing lower bound is bound to break. Or you could simply stop and tell the user to add a minimum version.

uranusjr commented 2 years ago

I think the discussion in this thread has already shown that the maintainers do not object to the idea at all. But before going forward with an actual implementation, the design decisions need to be discussed and resolved. An implementation without that design discussion is destined to wilt: it has no way to be accepted without some kind of consensus, and no reviewer would spend volunteer time reading code that is likely going to be thrown away. If you want to drive the feature forward, you need to consider the design issues raised in this thread and write up what you have in mind and, more importantly, why you feel that is the correct design for the problem at hand. Then you will get the "commitment" you are looking for.

pohlt commented 2 years ago

Could you please elaborate on what you don't like about #8086? To me, without any knowledge about the overall design philosophy of pip, it looks like a clean and minimal PR lacking test coverage.

pfmoore commented 2 years ago

I've already said that I dislike blindly getting the oldest version when there's no lower bound specified. I haven't had time to research the precise details of the sort of failure case I'm imagining, but consider a set of requirements that, somewhere 5 or 6 levels down in the dependency tree, says something like numpy != 1.21.1. And you're on Python 3.9 but have no C compiler. Then pip will try to build about 90 source releases of numpy versions that have no Python 3.9 wheels, before finding the oldest version with a wheel. That's going to be horribly slow - and because the dependency is way down in the tree, may not be easily fixable (or even identifiable) by the user.
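A back-of-the-envelope sketch of the cost described here. The release data is made up (100 releases, with compatible wheels only for the newest 10), not numpy's real history. The `builds_before_wheel` helper is hypothetical and simply assumes that every sdist-only candidate the resolver visits costs one source build before it can move on.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional, Tuple

Version = Tuple[int, ...]


def builds_before_wheel(
    candidates: Iterable[Version], has_wheel: Callable[[Version], bool]
) -> Tuple[Optional[Version], int]:
    """Walk candidates in order; count source builds until a wheel is found."""
    builds = 0
    for version in candidates:
        if has_wheel(version):
            return version, builds
        builds += 1  # no usable wheel: building the sdist is the only option
    return None, builds


# Hypothetical data: releases (1, 0) .. (1, 99); only (1, 90) onwards ship a
# wheel for the running Python. One release is excluded, mimicking a
# "!=" requirement with no lower bound.
releases = [(1, minor) for minor in range(100) if minor != 95]


def has_wheel(v: Version) -> bool:
    return v[1] >= 90


print(builds_before_wheel(sorted(releases), has_wheel))                # ((1, 90), 90)
print(builds_before_wheel(sorted(releases, reverse=True), has_wheel))  # ((1, 99), 0)
```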

I think any solution should deal with situations like this reasonably cleanly, but I don't know how that would work. We've had enough complaints that the standard resolve ordering results in long install times where the user can't work out what's taking the time (when it's relatively "obvious" to people with a lot of experience with the resolver) to make me think that this is not going to be as rare a situation as you hope it will be...

thomasf commented 2 years ago

Just emitting a warning message while building that a dependency lacks a specified minimum version is maybe enough? Or even making it a hard error, forcing the user to specify the minimum version themselves? I'm fine with the resolver just quitting with a hard error instead of trying too hard in this mode.

pohlt commented 2 years ago

As I have proposed above, pip could simply refuse to run (or stop execution) if no lower bound is given for any requirement and it is being run in the --minimum-version mode. Of course, a good warning/error message would be appreciated.
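A minimal sketch of the "refuse to run" check, assuming requirements have already been parsed into (operator, version) pairs. `has_lower_bound` and `require_lower_bounds` are hypothetical names; a real implementation would work on the `packaging` library's specifier objects, and would have to run as each transitive requirement is discovered rather than once up front.

```python
from __future__ import annotations

from typing import Dict, List, Tuple

Spec = Tuple[str, str]  # (operator, version), e.g. (">=", "1.0")

# Operators that put a floor under the allowed versions.
LOWER_BOUND_OPS = {">=", ">", "==", "~=", "==="}


def has_lower_bound(specs: List[Spec]) -> bool:
    return any(op in LOWER_BOUND_OPS for op, _ in specs)


def require_lower_bounds(requirements: Dict[str, List[Spec]]) -> None:
    """Fail fast, naming every requirement that has no lower bound."""
    missing = sorted(
        name for name, specs in requirements.items() if not has_lower_bound(specs)
    )
    if missing:
        raise ValueError(
            "minimum-version mode needs a lower bound for: " + ", ".join(missing)
        )


try:
    require_lower_bounds({"foo": [(">=", "1.0")], "numpy": [("!=", "1.21.1")]})
except ValueError as exc:
    print(exc)  # minimum-version mode needs a lower bound for: numpy
```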

For me, this requested feature actually is about trying to break things in the sense that I want to test my minimum requirements.

It could also make sense to restrict this "minimum version" rule to direct dependencies (i.e. not to sub-dependencies). This would also mitigate the problem that a sub-dependency without a lower bound would stop pip (see proposal above) or make it horribly slow.

uranusjr commented 2 years ago

I think outright refusing to run if any of the requirements lacks a lower bound is reasonable (not necessarily user-friendly, since that restriction will also apply to all transitive packages, but that's something we can build tooling around). The only thing I don't like is the --prefer-minimum-versions flag, also mentioned above.

pohlt commented 2 years ago

Ok, so let me summarize:

- The PR in general is fine.
- You don't like --prefer-minimum-versions. Proposals would be highly appreciated.
- Still missing:
  - Tests
  - Stop execution if lower bound missing

The more I think about it, the more I like the idea of applying this "minimum version" rule only to direct dependencies. If applied for all transitive packages, I'm also testing their reasonable choice for a lower bound, which is beyond the scope of my tests. What do you think?

thomasf commented 2 years ago

> The more I think about it, the more I like the idea of applying this "minimum version" rule only to direct dependencies. If applied for all transitive packages, I'm also testing their reasonable choice for a lower bound, which is beyond the scope of my tests. What do you think?

I am not sure I fully understand what this means.

My reason for wanting this feature would be to have a single requirements.txt for an application that installs with some form of predictability, given the same OS/environment, without additional tools and lock/freeze files (my reasons are outlined here: https://github.com/pypa/pip/issues/10207#issue-952767638).

I think having different versioning rules at different dependency depths would be surprising (which means not good) and probably hard to understand for some users. Dependency resolvers are hard for many users as it is, without intentionally making them even more complicated.

pfmoore commented 2 years ago

> The more I think about it, the more I like the idea of applying this "minimum version" rule only to direct dependencies.

I'm not sure what you mean by "direct" dependencies. Requirements stated on the command line and/or requirements file? Requirements declared in their dependency metadata? Both? This seems like a weird rule - it means that if you copy a requirement from deeper in the dependency tree and add it to the command line (something we occasionally advise people to do to address complex backtracking issues) that would radically change what gets installed. IMO, that's flat-out wrong (writing requirements in a different order may change performance, but shouldn't change the end result).

pohlt commented 2 years ago

I'm using the definitions here.

My user story: As a package developer, I want to make sure that the lower bounds of the packages I define as direct dependencies (in setup.py or pyproject.toml) make sense, i.e., that my package passes all the tests with the lowest version of dependent packages installed. [Side remark: I cannot directly influence transitive dependencies other than making them direct dependencies.] My golden rule for the selection of a lower bound for a direct dependency is "as low as possible, as high as necessary" to give users of my package the most flexibility. The requested feature would allow for an easy test of these lower bounds.

This leaves room for interpretation. For instance, it does not define which version for transitive dependencies should be chosen. The initial discussion and the PR were assuming that all packages (direct and transitive) should be the lowest possible version. What I am proposing now (and maybe it doesn't make sense) is to use the standard strategy (highest available version) for transitive dependencies, because I cannot directly fix the lower bounds for transitive dependencies.
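A hypothetical sketch of this "direct dependencies only" policy. It deliberately ignores that, in a real resolve, picking an old version of `foo` changes which versions of its own dependencies are allowed at all; `select_versions` and the package data are illustrative only.

```python
from __future__ import annotations

from typing import Dict, List, Set, Tuple

Version = Tuple[int, ...]


def select_versions(
    candidates: Dict[str, List[Version]], direct: Set[str]
) -> Dict[str, Version]:
    """Oldest allowed version for direct dependencies, newest for the rest.

    `candidates` maps each project name to versions that already satisfy
    every requirement on it.
    """
    return {
        name: min(versions) if name in direct else max(versions)
        for name, versions in candidates.items()
    }


candidates = {"foo": [(1, 0), (2, 0)], "bar": [(0, 1), (0, 9)]}
print(select_versions(candidates, {"foo"}))         # {'foo': (1, 0), 'bar': (0, 9)}
print(select_versions(candidates, {"foo", "bar"}))  # {'foo': (1, 0), 'bar': (0, 1)}
```

The second call shows the effect pfmoore raised earlier: adding `bar` to the set of direct requirements flips which version of `bar` gets installed, even though nothing else changed.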

thomasf commented 2 years ago

> I cannot directly influence transitive dependencies other than making them direct dependencies

You can use a constraints files ( https://pip.pypa.io/en/stable/user_guide/#constraints-files ) which probably are what I would want to use to control upgrades of transitive dependencies in general when minimal version selection is used in a project.

pohlt commented 2 years ago

What I meant with "influence": If a direct dependency of my package messed up their lower bounds or didn't define lower bounds at all, I cannot raise their lower bounds (literally changing their setup.py). I can make my tests work again by using constraints files, as you mentioned, but pip could also just use the regular strategy for transitive packages.

This is off-topic, but if pip had a well-documented API, it could be rather simple to influence its inner workings (like version selection strategy). Maybe there is and I just couldn't find it.

pfmoore commented 2 years ago

> This is off-topic, but if pip had a well-documented API

I sort of agree (but not in the way you mean, I suspect 😉) The use case you seem to be describing sounds like you want pretty fine control over a lot of what pip is doing, to make sure you test what you're trying to test. That's a scenario that's not well served by a massive monolithic program like pip. What you really need (IMO) is a set of smaller tools and/or libraries that let you compose the mechanism you want from individual well-tested pieces.

Basically, all of the standards work we've been doing for years is intended to try to enable that sort of separation of concerns. It's far from complete, but many of the parts are in place. What hasn't happened yet, is for an ecosystem of libraries to get developed around that (it's happening, with things like packaging, build and installer, but there are still key parts that no-one has addressed yet, or which only exist in "proof of concept" form). But there's no way pip is ever going to be that sort of library - it was designed as a monolithic application, and the internals are not suitable for exposing as a library (and even if they were, we don't have the manpower to even consider putting everything else on hold for long enough to rewrite everything as reusable APIs).

So unfortunately, until more people start writing the pieces needed to build something like pip from reusable components, you won't be able to influence the resolve/install process to the level you want to. (And yes, as such libraries come into existence, pip will likely switch to using them rather than having to maintain our own implementations).

pohlt commented 2 years ago

It deeply concerns me that with all the funding the PSF gets, the PyPA still seems to be understaffed/underfunded. Packaging is such a central concern for any programming language that it should not rest mainly on the shoulders of volunteers like you. I don't know whether, or how much, the PSF is supporting the PyPA, so my concerns might be completely unjustified.

If I understood you correctly, you think "my" (actually raised by Doug) use case is too specific to make it into pip. Is that correct? No hard feelings, I'm just trying to avoid spending even more time on a lost cause.

pradyunsg commented 2 years ago

> It deeply concerns me that with all the funding the PSF gets, the PyPA still seems to be understaffed/underfunded. Packaging is such a central concern for any programming language that it should not rest mainly on the shoulders of volunteers like you. I don't know whether, or how much, the PSF is supporting the PyPA, so my concerns might be completely unjustified.

Well, as far as I know, more $$$ from the PSF has gone toward directly funding packaging-related projects than CPython itself -- mostly through the PSF acting as a fiscal entity for targeted grants and the work done by the Packaging-WG of the PSF.

From my understanding, the problem isn't that the PSF won't direct funds toward packaging projects. Rather, it is that the PSF simply doesn't receive much funding that it can direct toward software development (especially given the impact that the PSF's other investments, like the grants programme, are able to achieve). This is something that many people are working toward improving and I'm optimistic that things will get better over time. In fact, right now, there are two roles funded via targeted sponsorship: a Developer-in-Residence for CPython and a Packaging Project Manager (the latter is sponsored by my employer). :)

The PSF's sponsorship programme is one way to support the PSF's ongoing endeavours on these fronts. If you're focused on packaging, there's the Packaging-WG of the PSF who are more than happy to talk about funding for Python Packaging improvements. :)

pfmoore commented 2 years ago

@pradyunsg has commented on the funding side of things.

> If I understood you correctly, you think "my" (actually raised by Doug) use case is too specific to make it into pip. Is that correct? No hard feelings, I'm just trying to avoid spending even more time on a lost cause.

I don't think the use case is too specific. I just think it needs a bunch of work to agree details and move it to completion. The pip maintainers are unlikely to do this work ourselves (limited bandwidth, plus it simply doesn't scratch an itch that we have). So it needs someone like yourself from the community to take that on.

You seem to feel that the pushback you're getting is intended to dissuade you from continuing. It's not - far from it, this is precisely the debate that needs to happen and be resolved if the proposal is to move forward. The biggest problem with getting community contributions for pip is that contributors are optimistic that "it's simple to fix", and then get scared off when difficult questions, or use cases that they don't themselves care about, come up and they don't want to address them. That's perfectly fine, of course - everyone is offering time and energy freely here - but more proposals stall because of that than for any other reason, in my experience. Maintaining a project used by millions of people is hard, and often frustrating. But when something comes together, it's really rewarding, too 🙂

pohlt commented 2 years ago

Thanks for the motivation speech. 😀 To be honest, the discussion sometimes felt a little bit like dissuasion and I get it: feature creep is a threat to any project.

Coming back to my list:

- The PR in general is fine.
- You don't like --prefer-minimum-versions. Proposals would be highly appreciated. 📣
- Still missing:
  - Tests
  - Stop execution if lower bound missing

Would you agree with this list? And if not, what is broken / missing?

pfmoore commented 2 years ago

I don't dislike --prefer-minimum-versions, so I don't feel the need to offer an alternative proposal 🙂 If you just mean the name of the option, I'm personally not too interested in bikeshedding.

Also under "still missing", a decision on whether we do "prefer minimum" for the first level of dependencies, vs the original proposal of doing it for everything - do we pick one (and leave the other use cases unsupported), or do we offer both options, or what?

Otherwise, your list is about right. I'm sure more details will come out as you start implementing things...

thomasf commented 2 years ago

I would find it really weird to have a flag named --prefer-minimum-versions, or something similar, that does not always prefer minimum versions. To me that would be a flag that lies about its function.

Maybe an optional depth value --prefer-minimum-versions=[depth] (or a separate flag) where depth can be specified so that --prefer-minimum-versions=1 means what @pohlt wants to do? This way you can run your project tests at depth 1-5 of minimal versions if you like even if that might require constraints files.

I understand that this might be something you want for testing but it feels counter intuitive and complicated to have dependency resolving working differently at different levels. If you want to test lower bounds the lower bounds should IMO actually be specified. I get that it is useful though.
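A sketch of how a `--prefer-minimum-versions=[depth]` option might compute depths. It assumes, hypothetically, that the requirements handed to pip sit at depth 1 and that a package's depth is its shortest distance from those roots. The graph and the helper names are made up for illustration.

```python
from __future__ import annotations

from collections import deque
from typing import Dict, List


def dependency_depths(graph: Dict[str, List[str]], roots: List[str]) -> Dict[str, int]:
    """Breadth-first search: roots are depth 1, their dependencies depth 2, etc."""
    depths = {root: 1 for root in roots}
    queue = deque(roots)
    while queue:
        name = queue.popleft()
        for dep in graph.get(name, []):
            if dep not in depths:  # first visit is the shortest path in BFS
                depths[dep] = depths[name] + 1
                queue.append(dep)
    return depths


def prefers_minimum(name: str, depths: Dict[str, int], max_depth: int) -> bool:
    return depths[name] <= max_depth


graph = {"myproj": ["foo"], "foo": ["bar"]}
print(dependency_depths(graph, ["foo"]))     # {'foo': 1, 'bar': 2}
print(dependency_depths(graph, ["myproj"]))  # {'myproj': 1, 'foo': 2, 'bar': 3}
```

The two calls show the caveat uranusjr raises below: listing `foo` in a requirements file puts it at depth 1, while installing the project itself with `pip install .` pushes `foo` to depth 2, so the same depth limit selects different versions.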

layday commented 2 years ago

Wasn't the concern that an unbounded transitive dependency could trigger local builds from sdists from time immemorial (just to extract the package metadata)? Why would we want to predicate this on depth, rather than bound...edness, i.e. --prefer-minimum-versions={any,bounded}?

If I'm testing my package for compatibility with older versions I absolutely do want to test that against the oldest transitive dependency installable, and having to specify something like depth=99 would not be intuitive.
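One way to read the `--prefer-minimum-versions={any,bounded}` idea, as a hypothetical sketch. In `bounded` mode the oldest version is preferred only for requirements that actually declare a lower bound. Unbounded requirements fall back to newest-first, which avoids walking back through ancient sdists. Specifiers are simplified to (operator, version) pairs, and `pick_version` is not an existing pip function.

```python
from __future__ import annotations

from typing import List, Tuple

Version = Tuple[int, ...]
Spec = Tuple[str, str]

# Operators that put a floor under the allowed versions.
LOWER_BOUND_OPS = {">=", ">", "==", "~=", "==="}


def pick_version(versions: List[Version], specs: List[Spec], mode: str) -> Version:
    """`versions` must already satisfy `specs`; `mode` is "any" or "bounded"."""
    if mode not in ("any", "bounded"):
        raise ValueError(f"unknown mode: {mode!r}")
    bounded = any(op in LOWER_BOUND_OPS for op, _ in specs)
    if mode == "any" or bounded:
        return min(versions)
    return max(versions)


versions = [(1, 0), (1, 21, 0), (1, 22, 0)]
print(pick_version(versions, [("!=", "1.21.1")], "any"))      # (1, 0)
print(pick_version(versions, [("!=", "1.21.1")], "bounded"))  # (1, 22, 0)
print(pick_version([(1, 21, 0), (1, 22, 0)], [(">=", "1.21")], "bounded"))  # (1, 21, 0)
```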

thomasf commented 2 years ago

> Wasn't the concern that an unbounded transitive dependency could trigger local builds from sdists from time immemorial (just to extract the package metadata)? Why would we want to predicate this on depth, rather than bound...edness, i.e. --prefer-minimum-versions={any,bounded}?

> If I'm testing my package for compatibility with older versions I absolutely do want to test that against the oldest transitive dependency installable, and having to specify something like depth=99 would not be intuitive.

It makes sense from the point of view of a library that wants to test itself against someone who is not using --prefer-minimum-version at all, because those users would get the maximum versions allowed by their own install_requires constraints. Even though I find it superior, --prefer-minimum-version will probably never be the way most people choose to install their project dependencies.

layday commented 2 years ago

It's not a question of how people choose to install their dependencies, it's a question of vestigial, incompatible lower bounds. If I pip install -U foo, which depends (transitively) on bar, I won't be updating bar if foo doesn't require a higher version of bar than I currently have installed. But foo's tests will pass because in CI the latest version of bar is being installed. I don't think a depth of ~~zero~~ one is of much use to anyone.

uranusjr commented 2 years ago

I don't think bounding the minimum version selection logic to any depth is reasonable. The depth of a dependency is not especially meaningful. It is a good generic indicator of some edge cases, but big behavioural changes like this shouldn't depend on it. This is especially true in Python packaging and pip: as an example, if you put dependencies in a requirements.txt, pip install -r requirements.txt --prefer-minimum-version=1 does what you expect, but if you put them in a setup.py, pip install . --prefer-minimum-version=1 would suddenly start selecting the maximum versions because now only the package in the current directory is at level 1. This behaviour would surprise a lot of users.

Another reason why limiting the selection logic to the first level (or any level) would likely not work in practice is that a lot of depended-upon packages don't necessarily have the correct lower bound specified. The rationale of having a minimum version selection logic is, from my understanding, an intention to answer the question "how far back in history can my code actually run?", and having a transitive dependency incorrectly specified is as problematic as a direct dependency, since it causes incorrect behaviour in your code just the same. Of course, the dependencies you actually directly use in your code should be responsible for specifying a lower bound correctly, but alas, that is simply not the case in practice, and ultimately a dependant must be responsible for the aggregation of all its direct and transitive dependencies, at any level.

henryiii commented 2 years ago

Of course, the dependencies you actually directly use in your code should be responsible for specifying a lower bound correctly

I think this has to be the case, and you should not be adding transitive dependencies to your requirements. If you depend on something and it lies about its lower requirement bounds, that's their bug, and if they can't/won't fix it, you should consider looking into a better library (same thing if they add unreasonable dependency upper bounds).

I don't think a library is responsible for its transitive dependencies. For example, say you depended on setuptools_scm. You found you needed some minimum (say 3.2) and that it didn't limit toml, so you add a toml>0.3 dependency in your library, even though you never use toml yourself. Then, when setuptools_scm updated to version 6 (or 6.2), it dropped the toml dependency and added a tomli dependency. You now require both, and you might not even be able to tell you don't need toml, since you've lost the information that it was only required for a dependency and wasn't a real dependency of yours.

uranusjr commented 2 years ago

A library should not put those additional requirements in its package metadata, but I was not talking about that. I'm talking about finding out how far back the dependencies the code runs against can go, and that answer is needed for transitive dependencies just as much as for direct ones. That code may not even be a library; an application should pin lower bounds for its dependencies as well, so the environment can be properly upgraded when it is deployed, and it's meaningless to say "the dependency should fix its requirements" in this situation. The application needs to run now, and the requirements specified by the application apply only to this particular deployment, so the only, and perfectly sensible, solution is to add that lower bound to a transitive dependency.

pohlt commented 2 years ago

tl;dr

"lowest version for all packages" might be the gold standard test, but is likely to fail
"lowest version only for direct dependencies" could be a valuable intermediate step towards this gold standard.
I don't see a use case for anything in between (i.e. `level=5`).
It seems "direct dependency" is not as clearly defined as I thought. @pfmoore, could you please elaborate?

(Much Too) Long Version

IMHO, both tests (using lowest version only for direct dependencies and lowest version for all dependencies) have their respective use case.

Lowest version for all dependencies

The gold standard for my package's tests would be to install the lowest version for all (i.e. also transitive) dependencies. I suspect that a lot of packages don't test their lower bounds, so it is very likely that the tests of my package will fail just because a (potentially transitive) package has broken (or missing) lower bounds.

If my tests fail in this scenario, I see three options:

1. Increase the lower bounds of my direct dependencies (which one?) until I pass the tests. This may not work, because even the latest version of one dependency may have broken lower bounds, and my stricter version limits would increase the probability of a version conflict.
2. Make the broken transitive dependency (which one?) a direct dependency and pin it to a higher version. A bad idea for several reasons which have already been discussed.
3. Ask the maintainers of the defective (transitive) dependency (which one?) to fix the lower bounds for all of their potentially old releases. This is unlikely to happen.

All three options involve a lot of investigation into which (transitive) dependency is the root cause of my failing tests. And all three options are either a bad idea or unlikely to work.

Lowest version only for direct dependencies (and regular strategy for transitive packages)

This would avoid the challenges mentioned above but is, of course, a much weaker test, because some user might have installed old versions in her/his venv which are not updated as required because of broken lower bounds. Still I think this would be a valuable (intermediate) step until lower bounds checking is a widely adopted test for most packages.
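The "direct dependencies only" variant can be expressed as a toy selection rule. `pick` is a hypothetical helper that assumes plain dotted-integer versions and a single lower bound per package: it takes the lowest satisfying version for names in the direct set and the highest satisfying version otherwise.

```python
from __future__ import annotations


def parse(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def pick(name: str, candidates: list[str], lower: str, direct: set[str]) -> str:
    """Lowest satisfying version for direct dependencies, highest otherwise."""
    ok = [v for v in candidates if parse(v) >= parse(lower)]
    if not ok:
        raise LookupError(f"no version of {name} satisfies >={lower}")
    return min(ok, key=parse) if name in direct else max(ok, key=parse)


bar_releases = ["1.0", "2.0", "3.0"]  # hypothetical releases
print(pick("bar", bar_releases, "1.0", direct={"foo"}))         # 3.0
print(pick("bar", bar_releases, "1.0", direct={"foo", "bar"}))  # 1.0
```

The second call shows a side effect of this design: promoting `bar` to a direct dependency changes which version it receives, even if nothing else about the build changed.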

AOB

I don't see a use case for anything in between both discussed extremes, like specifying a numeric level.
Naming things is hard, so I would postpone this discussion until we agree on the actual functionality.
@pfmoore made a comment that the definition of "direct dependency" is not as clear as I assumed it would be. For me, a direct dependency is any Requires-Dist package in the METADATA of my package. Too naïve?
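Under this definition, the direct dependencies are the `Requires-Dist` fields of the package's `METADATA` file, which uses an email-header style format that the standard library's `email.parser` can read. Below is a minimal sketch with a made-up `METADATA` text. The check for `extra` markers is a crude assumption, and optional extras are one example of where "direct" becomes less clear-cut. For an installed distribution, `importlib.metadata.requires()` returns the same kind of list of requirement strings.

```python
from __future__ import annotations

from email.parser import Parser

METADATA = """\
Metadata-Version: 2.1
Name: mypackage
Version: 0.1.0
Requires-Dist: requests>=2.20
Requires-Dist: click>=7.0
Requires-Dist: pytest>=6; extra == "test"
"""


def direct_dependencies(metadata_text: str, include_extras: bool = False) -> list[str]:
    """Return the Requires-Dist entries from a METADATA document."""
    message = Parser().parsestr(metadata_text)
    requirements = message.get_all("Requires-Dist") or []
    if not include_extras:
        # Crude assumption: any marker mentioning "extra" belongs to an optional extra.
        requirements = [r for r in requirements if "extra" not in r.partition(";")[2]]
    return requirements


print(direct_dependencies(METADATA))  # ['requests>=2.20', 'click>=7.0']
```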

Asday commented 2 years ago

I suspect that a lot of packages don't test their lower bounds, so it is very likely that the tests of my package will fail just because a (potentially transitive) package has broken (or missing) lower bounds.

That is not pip's problem, in fact one could argue it's been caused by pip's behaviour of not selecting minimal versions, causing lower bound breakages to go unnoticed.

If my tests fail in this scenario, I see three options:

Combination of 2 and 3. You keep 2 in requirements-shame.txt while your issue for 3 is open with the maintainers. If they're too slow about it, fork it and fix it yourself.

This is unlikely to happen.

This remains not pip's problem.

All three options involve a lot of investigation into which (transitive) dependency is the root cause of my failing tests

Assuming your dependencies have good test suites and you're not relying on undocumented behaviour, (and that applies recursively), it's not going to be that difficult at all. If they don't, you have a bigger problem - your build only works coincidentally.

And all three options are either a bad idea or unlikely to work.

Once more, it's not pip's problem. We need to ourselves make sure our build works not by coincidence, and also hold our library maintainers to the high standards of "please write a test suite" and "please make sure the test suite passes".

Still I think this would be a valuable (intermediate) step until lower bounds checking is a widely adopted test for most packages.

I would like to disagree as it involves intuition on pip's behalf, and magic. Say a transitive dependency in my project becomes a direct dependency. Now the rules for its dependency version resolution have changed, even though as the programmer I never intended to change the build.

I think a valuable intermediate step would be to have min sat as an option but not on by default, and because I'm terrible I'd personally also like a deprecation schedule whereby the option would turn on by default in, say, two years, and then cease to be optional two years after that.

We managed to go from Python 2 to 3, which was a huge change which didn't immediately benefit everyone. I'm sure we can manage to file PRs against packages who misbehave against the lower bounds of their dependencies.

pohlt commented 2 years ago

... not pip's problem ...

Well, all of this is not pip's problem and I'm not blaming pip for anything. 🤔

pip is offering a service to users ("install Python packages") and I'm supporting the idea that this service could be optionally modified to support package developers with testing the lower bounds of their (direct) dependencies.

I would like to disagree as it involves intuition on pip's behalf, and magic. Say a transitive dependency in my project becomes a direct dependency. Now the rules for its dependency version resolution have changed, even though as the programmer I never intended to change the build.

Changing from a transitive dependency to a direct dependency is a deliberate action of the package developer and typically, she has good reasons for that. I don't see how this change requires intuition or involves magic on pip's or anyone's side.

I think a valuable intermediate step would be to have min sat as an option but not on by default, and because I'm terrible I'd personally also like a deprecation schedule whereby the option would turn on by default in, say, two years, and then cease to be optional two years after that.

If you are suggesting that "min sat" should be an option (as in "optional and not default") and stay optional, I couldn't agree more and this has been the original proposal since April 2020. But I might be misinterpreting this paragraph.

pradyunsg commented 2 years ago

I've labelled this issue as a "deferred PR".

This label is essentially for indicating that further discussion related to this issue should be deferred until someone comes around to make a PR. This does not mean that the said PR would be accepted as-is -- that decision has been deferred until the PR is made.

pohlt commented 2 years ago

@uranusjr wrote:

But going forward with an actual implementation, it needs to be first discussed to resolve the design decisions. An implementation without that design discussion is destined to wilt...

I think we are still in the design decision phase (or even in the "does this make any sense" phase) and I agree with @uranusjr that an additional PR at this stage has a low probability of being accepted in the end.

Classical Catch-22.

pfmoore commented 2 years ago

Classical Catch-22.

It is, but as a volunteer project, we try to minimise the drain on our (very limited) developer time by not having extensive design decisions on topics where there's no guarantee that anyone is motivated enough to actually implement the feature once a design has been finalised. The "deferred PR" label indicates this - until someone has enough motivation to submit a PR implementing at least a basic design, we treat the feature request as "maybe nice to have, but not likely to be worth investing more time into".

It's not blocking progress, just asking someone to confirm (by submitting a PR) that they intend to work on implementation before we sink too much time into design discussions. An initial PR doesn't have to be perfect, or even complete, and it can be reworked if the chosen design turns out to be wrong.

davegaeddert commented 2 years ago

Hey everybody, I'd like to help push this forward if possible. I had already done some work before reading and re-reading this discussion several times, and thought I would at least share what I had. It turned out to be pretty similar to the original #8086, but the implementation is a little different (doesn't change anything in _vendor, sorting happens at a different time/place, and I tried to add a few tests): https://github.com/davegaeddert/pip/pull/1/files

(I'm not at all familiar with the codebase, so I wouldn't be surprised if things aren't exactly where you would want them ultimately.)

I understand that the design decisions and implications are the harder and more important part, so I'll at least share my current opinion...

It feels like the simplest/best solution is to apply the logic to all dependencies (direct and indirect/transitive), at least as a starting point. It's predictable and I think has the desired outcome (even if that means failing, slow installs, etc.). Finding the issues is part of the point, at least for me.
Seems like the biggest concern with that is what happens when there is no lower bound specified (it could go way back in history and/or be likely to not work). There are two use cases as I understand it:
- Applications: they can address missing bounds by adding direct dependencies or constraints (advanced feature, but IMO this "prefer min" is a non-standard "advanced" feature too, at least at outset, so that can be a caveat/solution)
- Packages: fixes are harder, but they've at least learned that there are environments where users could have a broken set of dependencies which was the point. They can work with other packages to add lower bounds, use a different package / change their own dependencies, etc. I'd hope that this would be less of a problem over time if people had the tooling to test this, but without the option to test your lower bounds that might never happen (another Catch-22)...
- I could maybe see a warning/confirmation step listing where lower bounds are missing (thereby also telling you what can be fixed), but my initial reaction is that it shouldn't actually restrict you from running it.

I can do some more work and move my PR to this repo if any of this is helpful, just let me know. Totally understand if not! Sorry to lob in "yet another comment" on this, but I think it'd be an awesome feature to have available.

uranusjr commented 2 years ago

I’m wondering, would it be plausible if we require all dependencies to have a lower bound in the prefer-minimum mode, and emit a hard error if not? The situation is always fixable by the end user (i.e. the person running pip install) by adding an additional requirement that provides the lower bound, so it won’t prevent any use cases from being resolvable.
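A possible shape for a lower-bound check, which could back either a hard error or a warning/prompt, is sketched below. It is a simplified assumption, not pip behaviour: a requirement counts as lower-bounded if any clause uses `>`, `>=`, `==`, `===` or `~=`; environment markers are ignored; and `MissingLowerBound` and the `strict` flag are hypothetical names.

```python
from __future__ import annotations

import re

# name, optional [extras], then the specifier clauses
_REQUIREMENT = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(\[[^\]]*\])?\s*(.*)$")
_LOWER_BOUND_OPS = (">", "==", "~=")  # also matches ">=" and "==="


class MissingLowerBound(Exception):
    """Hypothetical error raised in a strict prefer-minimum mode."""


def missing_lower_bounds(requirements: list[str], strict: bool = False) -> list[str]:
    """Return names of requirements with no lower bound; raise if strict."""
    missing = []
    for requirement in requirements:
        spec_part = requirement.split(";", 1)[0]  # ignore environment markers
        match = _REQUIREMENT.match(spec_part)
        if match is None:
            raise ValueError(f"cannot parse requirement: {requirement!r}")
        name, _extras, specifiers = match.groups()
        clauses = [c.strip() for c in specifiers.split(",") if c.strip()]
        if not any(c.startswith(_LOWER_BOUND_OPS) for c in clauses):
            missing.append(name)
    if strict and missing:
        raise MissingLowerBound("no lower bound for: " + ", ".join(missing))
    return missing


reqs = ["requests>=2.20", "toml", "click<9", "attrs ~= 21.0"]
print(missing_lower_bounds(reqs))  # ['toml', 'click']
```

With `strict=True` the same input raises `MissingLowerBound`; with `strict=False` the returned list could feed a warning or a confirmation prompt instead.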

Starting with applying the strategy to all dependencies makes sense to me, but in which case we probably should design the switch to some sort of a named option instead of a --prefer-min boolean flag (say --strategy=prefer-min) so additional strategies can be added later.
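A named option is easy to extend with more values later. The sketch below uses the standard library's `optparse`, which pip builds its command-line options on, with a hypothetical `--strategy` option. The values `prefer-max` and `prefer-min` are illustrative only and are not real pip flags.

```python
from __future__ import annotations

from optparse import OptionParser

STRATEGIES = ["prefer-max", "prefer-min"]  # illustrative names only


def build_parser() -> OptionParser:
    parser = OptionParser(prog="install-sketch")
    parser.add_option(
        "--strategy",
        dest="strategy",
        type="choice",
        choices=STRATEGIES,
        default="prefer-max",
        help="order in which candidate versions are tried",
    )
    return parser


def version_key(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def candidate_order(versions: list[str], strategy: str) -> list[str]:
    """Newest first for prefer-max, oldest first for prefer-min."""
    return sorted(versions, key=version_key, reverse=(strategy == "prefer-max"))


options, _args = build_parser().parse_args(["--strategy=prefer-min"])
print(options.strategy)                                          # prefer-min
print(candidate_order(["2.0", "1.0", "3.0"], options.strategy))  # ['1.0', '2.0', '3.0']
```

Adding another strategy later means adding a value to `choices` rather than introducing yet another boolean flag.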

davegaeddert commented 2 years ago

Thanks @uranusjr. The reason I didn't like the hard error was because in the "package author" use case, it sounded like there is some question as to what the best resolution is (i.e. is moving a transitive dep to a direct one w/ a lower bound the right fix?). What would you say is the fix for a library/package author that would be the "suggested" solution?

Depending on the answer to that, it seems like you still have to give them a way to actually run it and see if the missing bound is really a potential problem for downstream users (I can think of personal use cases where it wouldn't be an issue). Anyway, with that in mind I tried adding a prompt for a missing bound. This could be turned into a hard error if that's the route people want it to go: https://github.com/davegaeddert/pip/pull/1/files#diff-64990ce29c78c39f1164f70593aba9c8f39bdf7cf499e2749b09a6da5ce2901fR328-R337 (the implementation of the prompt would need a couple more tweaks, but I'm not going to mess with it further if people don't like the idea)

I also went ahead and switched it to a string option. I used --strategy for now, but at least to me, "strategy" is a little vague especially when right next to "upgrade strategy". Made me wonder if something like "version selection" would be more useful if it's really intended just for that? So it could be --select-versions=min instead of --strategy=prefer-min?

Asday commented 2 years ago

If determining that you need a higher minimum version for a library is going to cause a problem for downstream users, they have a problem already, and they need to fix it. So long as you're correctly finding the lowest version you can support properly, that's your only responsibility.
