pypa / packaging-problems

An issue tracker for the problems in packaging

Add support for opportunistic dependencies #214

Open · pganssle opened 5 years ago

pganssle commented 5 years ago

Recently the issue pypa/setuptools#1599 was raised, asking whether it is possible to specify "soft-failing" extra dependencies. It is not currently possible, but I think the request is reasonable, and it would help with some of the dependency-resolution problems that have been worrying me in dateutil.

The original use case from @Harvie was for a package that works better with opencv but can operate without it. Their main concern was that opencv is not available on ARM. It's possible that this can be handled via environment markers, but even if that works, it is essentially a hack around what they actually want to express: "if you can install this, you should, but if you can't, that doesn't need to block the installation of my package". With environment markers you are hard-coding that "ARM doesn't need opencv", when in reality ARM would benefit just as much from opencv as anyone else, and if opencv were to release an ARM-compatible package, you'd want your users to pick that up.

I have a similar use case in dateutil - I would like to write a compiled backend, but dateutil is very widely used, and I am not confident that I can make releases for every platform. Ideally, I would declare an optional dependency on a backend so that people are opportunistically upgraded as their platform becomes available. Obviously there are workarounds in this case since I control both packages, but as with pypa/setuptools#1139, it would be much better if we had a way to explicitly specify the nature of the dependencies.

Possibly the easiest thing to do would be to implement this as an environment marker, maybe something like soft_dependency? e.g.:

opencv >= 1.2.0; soft_dependency

We may also need to get into the possibility of fallback dependencies, like:

open_cv >= 1.2.0; soft_dependency
cv_compat >= 2.1.0; substitutes_for=open_cv
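To make this concrete, here is a sketch of how such markers might appear in a setup.py. Both soft_dependency and substitutes_for are hypothetical, not part of any current specification, and the package names are illustrative:

from setuptools import setup

setup(
    name="mypkg",
    install_requires=[
        # Hard requirement: installation fails if this cannot be satisfied.
        "six >= 1.5",
        # Hypothetical marker: install if possible, skip silently otherwise.
        "opencv >= 1.2.0; soft_dependency",
        # Hypothetical fallback: only used when open_cv cannot be installed.
        "cv_compat >= 2.1.0; substitutes_for=open_cv",
    ],
)
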
njsmith commented 5 years ago

This would only solve the problem because opencv-python doesn't upload their sdists to PyPI, right? That's not something we want to encourage, is it?


Prior art note: Debian has Depends:, Recommends: and Suggests:. The differences are:

If X depends on Y, that's mandatory and enforced: if Y can't be installed then X can't either, and if Y is removed then X must be removed too.

If X recommends Y, then by default installing X will trigger the installation of Y, but you can toggle this with a config option, or remove Y afterwards.

If X suggests Y, then nothing happens by default, but I guess the data might be shown in package management UIs, like "if you liked X, check out Y" or something?

There's also Enhances:, which is identical to Suggests: except that the field lives in Y's metadata instead of X's.

Full details: https://www.debian.org/doc/debian-policy/ch-relationships.html#binary-dependencies-depends-recommends-suggests-enhances-pre-depends

pganssle commented 5 years ago

@njsmith Thanks for the prior art note, that's a very good taxonomy for it.

This would only solve the problem because opencv-python doesn't upload their sdists to PyPI, right?

That may be, but I think the use case is still valid. One can imagine a dependency that cannot be built on a given platform because it has no native support on that platform, in which case an sdist would do you no good.

One can also imagine that this could be useful in corporate or other locked-down environments where it's not possible to use certain packages for licensing reasons, or because they have not yet gone through compliance review. In that event, you could safely block the package in a caching proxy, and anything with a "recommends" dependency on it would simply fall back to the default case of "not installed".

Another possible use case (and one I haven't really thought through yet) would be in resolving cycles or package conflicts. You could, for example, do something like this:

somepackage; recommends>2.0
somepackage>1.0

The resolver would check whether anything else requires somepackage<=2.0, and if not, it would opportunistically upgrade somepackage to the latest version. This lets you express that your software uses some feature of the later versions, but has a workaround in place, avoiding an over-tight version pin.

pfmoore commented 5 years ago

Agreed, I'm not happy with encouraging binary-only uploads. Maybe an option would be a variant of the --only-binary flag that said "Only install binaries for this requirement, but ignore the requirement if it's not available as a binary". So maybe pip install opencv --optional-binary opencv?

That would put the decision in the user's hands rather than the packager's, which seems more reasonable to me.

If the package wants to only have binaries for certain platforms, they could always upload a pure-python "dummy" wheel that would be installed when there's no platform-specific binary. That dummy wheel could simply include a flag mypkg.accelerators_enabled = False, which the calling code could test for. It's a workaround, sure, but it's a potentially viable option.
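For illustration, calling code might probe such a dummy wheel like this; mypkg_speedups and accelerators_enabled are hypothetical names standing in for the flag described above:

import mypkg_speedups  # either the real compiled wheel or the pure-Python dummy

def parse(data):
    # The dummy wheel sets accelerators_enabled = False; real wheels set it True.
    if getattr(mypkg_speedups, "accelerators_enabled", False):
        return mypkg_speedups.parse(data)  # compiled fast path
    return _parse_pure_python(data)  # portable fallback

def _parse_pure_python(data):
    ...  # pure-Python implementation, always available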

pganssle commented 5 years ago

One thing to note is that right now the use cases are all or mostly about resolving "illusory conflicts" that are created by packages being forced to express stronger requirements than they actually have.

I think that by allowing packages to express dependencies in a more nuanced way, you could also allow consumers to express their preferences about dependency installation more easily. For example, if we adopted both Recommends and Suggests, you could expose that to consumers by adding --no-recommended-deps and --suggested-deps flags to pip. The former would install only the bare minimum set of required dependencies; the latter would install all optional dependencies. The default would be that you get required and recommended dependencies, and you can either opt out of recommended or opt in to suggested dependencies.
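For illustration, usage might then look like this (both flags are hypothetical; neither exists in pip):

pip install mypkg                        # required + recommended dependencies (default)
pip install --no-recommended-deps mypkg  # required dependencies only
pip install --suggested-deps mypkg       # required + recommended + suggested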

I think we probably need to spend some time thinking about how much complexity we actually want to expose into the dependency resolution system, but I know I've been chafing at the inability to express the various fallback mechanisms I've designed in my packages that would allow people to easily opt for a different balance of features to "installation weight", as it were.

pganssle commented 5 years ago

Agreed, I'm not happy with encouraging binary-only uploads.

I think the problem of binary compatibility is a bit of a red herring. There are other reasons why you may have incompatibilities - for example, one of your "recommended" dependencies may be slow to add support for Python 3.7, or may only be available for Python 2 or for Python 3 while you are supporting both. This would free you up to forge ahead and let your recommended dependencies catch up at their own pace.

That would put the decision in the user's hands rather than the packager's, which seems more reasonable to me.

Adding Recommends metadata is fundamentally the job of the packager, because they are the ones who know the difference between dependencies that are required and ones that merely enhance the experience of using the software. dateutil is essentially useless without six, but I am planning on spinning off the dateutil.zoneinfo module into its own package because dateutil basically works just fine without dateutil.zoneinfo. I would probably make dateutil.zoneinfo a Recommends dependency in this new scheme - it's installed by default but not essential.

pfmoore commented 5 years ago

I think the problem of binary compatibility is a bit of a red herring.

It is and it isn't. If all you're suggesting is that an installer try building from sdist and ignores any error in the build, then that's probably OK (I say "probably" because there are all sorts of caveats over the practicalities of cleaning up after a failed build that would need reviewing and possibly addressing). Also, I'm not convinced that having a load of build errors, then a successful install, is a particularly nice UX (nor is hiding the build errors - what if the errors were unexpected and the user thought the dependency would install?)

But you then go on to say "dependencies may be slow to add support for Python 3.7, or may only be available on Python 2 or 3", and I don't know how you expect that to work in practice (given that you're saying that not uploading sdists is not the mechanism you're intending). So your use case is a bit muddled here.

Their main concern was that opencv is not available on ARM

I'm not sure this is Python related. From a quick Google search it seems that opencv is a C library (with Python bindings)? So it's not possible to express a dependency on opencv in package metadata. Again, it's not entirely clear how what you're proposing would work in practice.

I have a similar use case in dateutil - I would like to write a compiled backend

That's definitely a case where I'd expect to have a universal backend that basically does nothing, and platform-specific backends that have the speedup code (if only because of the same problem of "we don't want to encourage binary-only packages"). The core dateutil code then checks the actually installed backend module to see if it's the dummy or not before calling the speedups.

Adding Recommends metadata is fundamentally the job of the packager, because they are the ones who know the difference between dependencies that are required and ones that merely enhance the experience of using the software

Point taken - although what I assume you mean by "Recommends data" feels somewhat different from what you were proposing originally. Maybe I'm misunderstanding, though, and the two cases are more similar than I imagine.

As far as "recommends" metadata is concerned, AIUI that's typically done using extras right now ("install foo[fancy_graphics] if you want extra graphical capabilities"). IMO that gives the user more control than a "recommends" type of dependency, but at the cost of having to discover the available extras, and having extra choices to make. And "recommends" dependencies are installed (if possible) by default rather than being opt-in, which I agree is good in some cases.

Overall, I'm not against the idea here in principle, but I think it needs to be a lot more clearly specified before it's viable as a proposal.

pganssle commented 5 years ago

But you then go on to say "dependencies may be slow to add support for Python 3.7, or may only be available on Python 2 or 3", and I don't know how you expect that to work in practice (given that you're saying that not uploading sdists is not the mechanism you're intending). So your use case is a bit muddled here.

The most obvious mechanism is a package that has python_requires="<3.0" or python_requires=">=3.0". If this is a "nice to have" dependency, it should not preclude Python 2/3 support. I think it's unlikely that many packages will put an upper limit on their Python version support, but we need to figure out a way to retroactively fix the metadata of released versions anyway - that's a separate issue. Still, there are many things that will not work on Python 3.7 because they use async (now a reserved keyword) as an identifier somewhere in their library. Ideally this would fail at install time in some way that the "recommended" resolver could detect, but for now I think it's very much reasonable for a package to distinguish between what is and is not actually required.

I'm not sure this is Python related. From a quick Google search it seems that opencv is a C library (with Python bindings)?

I recommend ignoring the specific opencv example if you find it confusing and instead attempt to construct a steel man argument. I am not interested in debating or trying to figure out the specifics of anything related to OpenCV dependencies. Just imagine any situation where an upstream package has actual support for platforms overlapping (but not completely) with your user base - how do you tell pip that it should install this thing if possible?

Point taken - although what I assume you mean by "Recommends data" feels somewhat different from what you were proposing originally. Maybe I'm misunderstanding, though, and the two cases are more similar than I imagine.

I cannot read your mind, but the core concept of "recommends" has not changed. It is a set of dependencies that a package uses and would like installed by default, but that is not necessarily required. It essentially means "Please install this, but if for any reason you can't, that's fine, using this library without such and such a dependency is still mostly supported".

As far as "recommends" metadata is concerned, AIUI that's typically done using extras right now ("install foo[fancy_graphics] if you want extra graphical capabilities"). IMO that gives the user more control than a "recommends" type of dependency, but at the cost of having to discover the available extras, and having extra choices to make. And "recommends" dependencies are installed (if possible) by default rather than being opt-in, which I agree is good in some cases.

This does not solve the problem, because both this and pypa/setuptools#1139 are slightly different flavors of "opt out" dependencies. Having the ability to opt out of un-required dependencies is in no way less control than having the ability to opt in to them, and in fact it will lead people to just make everything a hard dependency rather than bother with extras.

I envisioned that people would do things like this:

...
install_requires=[
    'somepkg; recommends',
],
extras_require={
    'somepkg': ['somepkg'],
}

Such that mypkg has somepkg as a default-on dependency, but mypkg[somepkg] has somepkg as a hard dependency. This way downstream consumers of mypkg can declare that they need the features of mypkg enabled by somepkg. Depending on whether we ever get something like "negative dependencies", you could also have a no-somepkg extra, to specify that you explicitly do not need the features enabled by somepkg.

My general principle when designing interfaces is that the default should be the thing most people want, but you should provide "escape hatches" for people who want something different. That is defeated by not having a way to express that some dependency of my package is something most people will want, but that some people may not, and that not having it is a supported workflow. Right now the only options are that dependencies can be required or they can be opt-in. I want some dependencies to be opt-out.

Overall, I'm not against the idea here in principle, but I think it needs to be a lot more clearly specified before it's viable as a proposal.

Which is generally what issues like this are for. I hope that I have established that this is a real kind of dependency that we currently have no way to declare, and we can now move on to designing what support for such a thing would look like.

The question of how best to do dependency resolution without the dependency-specification syntax becoming a full-fledged language with a package manager of its own is a tricky one. It is likely that we cannot consider this proposal in isolation, and we may need a meeting to discuss it, or a small working group that designs a proposal; I suspect it will be hard to design using only GH issues and/or mailing lists. I was hoping to start documenting the various dependency-specification issues that come up here and elsewhere (again, see pypa/setuptools#1139 as one example), so that we have the data needed to come up with a solution that takes into account the various problems that have been cropping up.

njsmith commented 5 years ago

@pfmoore This is the opencv package that we're talking about on pypi, that has only wheels and no sdist: https://pypi.org/project/opencv-python/

@pganssle Allowing packages to describe different profiles of features vs. installation weight is exactly what extras do, right? We describe different options just fine; the problem is that pip always defaults to installing the most pared-down profile, which isn't a great default.

So the question is exactly, under what circumstances should pip install these not-exactly-required packages by default, and how do we let users and packagers control that?

The simplest mechanism would be like Debian, where packagers can say "install this by default", but the end-user can override it, and apt can automatically ignore it if necessary, but like @pfmoore says, it's really not clear how pip should handle sdists in that case. Downloading and trying to build a huge package that will always fail is not a great user experience. (The user knows it will fail, the packager knows it will fail, it just looks like pip is wasting their time for no reason, everyone concludes that the packaging maintainers are idiots.)

You could have a "binary-only dependency" ("only install this if you can find a wheel"), but that feels really weird to me -- normally we try to decouple "what we need" from "how we get it".

I have a similar use case in dateutil - I would like to write a compiled backend, but dateutil is very widely used, and I am not confident that I can make releases for every platform. Ideally, I would declare an optional dependency on a backend so that people are opportunistically upgraded as their platform becomes available.

This is a pretty common situation, yeah. The way people normally handle it currently is that they include the backend code inside their regular sdist, and when setup.py is run they check whether they can build the backend or not. So on platforms where you uploaded a wheel, everyone gets the precompiled code, and if pip has to fall back on an sdist, then it automatically DTRT. Which doesn't mean we can't do better, just putting that out there.
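A minimal sketch of that pattern, assuming a single optional C extension; catching build errors in a build_ext subclass like this is the approach used by packages such as simplejson and MarkupSafe:

from setuptools import setup, Extension
from setuptools.command.build_ext import build_ext
from distutils.errors import CCompilerError, DistutilsExecError, DistutilsPlatformError

class optional_build_ext(build_ext):
    """Try to build the compiled backend; fall back to pure Python on failure."""

    def run(self):
        try:
            build_ext.run(self)
        except DistutilsPlatformError:
            pass  # no usable compiler on this platform: skip all extensions

    def build_extension(self, ext):
        try:
            build_ext.build_extension(self, ext)
        except (CCompilerError, DistutilsExecError, DistutilsPlatformError):
            pass  # this extension failed to build: ship without it

setup(
    name="mypkg",
    ext_modules=[Extension("mypkg._speedups", ["mypkg/_speedups.c"])],
    cmdclass={"build_ext": optional_build_ext},
)

The runtime code then checks whether mypkg._speedups is importable and falls back to the pure-Python implementation if it is not.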

I am planning on spinning off the dateutil.zoneinfo module into its own package because dateutil basically works just fine without dateutil.zoneinfo. I would probably make dateutil.zoneinfo a Recommends dependency in this new scheme - it's installed by default but not essential.

Isn't this the case where people usually just... have two packages? If someone needs timezone functionality, they depend on the package that provides timezone functionality, if they don't, they don't?

There is a challenge in splitting a package in two without breaking users -- that's not something we have great tools for right now. The usual way Debian handles something like this is to create two new packages (dateutil-core and dateutil-zoneinfo, say), and then make 'dateutil' into a trivial package that just depends on those two new packages. I guess that's something we already support? I'm not sure if there's any way to do better.

I recommend ignoring the specific opencv example if you find it confusing and instead attempt to construct a steel man argument. I am not interested in debating or trying to figure out the specifics of anything related to OpenCV dependencies. Just imagine any situation where an upstream package has actual support for platforms overlapping (but not completely) with your user base - how do you tell pip that it should install this thing if possible?

The problem is, packaging is extremely complex and has endless edge cases and unusual situations. We can't make decisions on the basis of "can we imagine a situation where this might be useful". If we want our system to be useful to actual people in common situations, we need to actually look at those cases, to make sure that what we add solves real problems.

pganssle commented 5 years ago

The problem is, packaging is extremely complex and has endless edge cases and unusual situations. We can't make decisions on the basis of "can we imagine a situation where this might be useful". If we want our system to be useful to actual people in common situations, we need to actually look at those cases, to make sure that what we add solves real problems.

Most of these problems never surface, precisely because responsible maintainers come up with weird workarounds. In the original thread I provided 4 separate workarounds, and did not suggest that the original reporter somehow attempt to convince OpenCV to release an sdist. Again, using the steel-man principle would be helpful here: take it as a given that this is a problem, and start looking for other things that this would solve. Here's one possible example reported at setuptools just 2 hours ago.

And in general these issues are all symptoms of a larger problem, which is that dependency declarations are not sufficiently expressive. Here's another example: what if you want to depend on Tensorflow, but want the ability to fall back to pytorch or some alternative library if you need Python 3.7 support or something of that nature? Sure, we "don't want to encourage binary-only uploads", but people are not going to stop using Tensorflow, and Google is not going to play nice with the rest of the world just because we make it hard for people to declare their dependencies on Tensorflow correctly. What is much more likely to happen is that people will not provide wheels, or do some other nonsense, because dependency declarations are needlessly restrictive and they can always just write a setup.py that does a bunch of runtime dependency checking.

Isn't this the case where people usually just... have two packages? If someone needs timezone functionality, they depend on the package that provides timezone functionality, if they don't, they don't?

No, this is something different entirely, but the specifics don't matter. It is very much an opportunistic dependency: the zoneinfo package doesn't really provide any time zone functionality on its own, and in fact requires the dateutil package, but dateutil can provide nearly everything that the zoneinfo package provides on most systems.

Ideally I would have something like this:

pip install python-dateutil               # installs dateutil.zoneinfo only on Windows
pip install python-dateutil[zoneinfo]     # requires zoneinfo on all platforms
pip install python-dateutil[no-zoneinfo]  # does not require zoneinfo

I'm not even entirely sure that Recommends metadata would fix my problem, but combined with a satisfactory resolution to pypa/setuptools#1139, it would get me very close.

pganssle commented 5 years ago

I am not interested in further justifying the use of this functionality at this point. Anyone else can feel free to try to make the case for it if there are further objections. I think it is very obvious that there is a problem here, and we all know what at least one of the problems is - it is not possible for packages to declare that some dependency is not required but it should be installed by default. If we can agree on that, then we can start to focus on the solution.

The major realistic problems I see are:

  1. We do not have sufficiently robust metadata about what is and is not supported on what platforms. The de facto standard has been "if you can't install it from an sdist, it's not supported", but "try to install the sdist and ignore errors" may not exactly get us the behavior we want (plus it will be needlessly wasteful).
  2. How to handle the unfortunately all-too-common situation where installation from an sdist will indeed succeed on a newly-released version of Python, but it shouldn't because actually using the unsupported package would fail. We may need to resolve this by making PyPI metadata mutable in at least some limited way.
  3. How to handle "fallback dependencies", where any of A, B or C is required. Similarly, how about the situation where (A, (B, C) or D) is required.

There are probably several other "big questions" to be answered, and we definitely need to consider the other dependency-declaration problems people have as part of a more general solution.

There are other "small questions" to be addressed as well, like how exactly this information gets encoded - is it a new syntax? Is it one or more environment markers? I would probably also classify some of the "how will pip behave" questions as "small" in the sense that you just have to pick a behavior and run with it.

ncoghlan commented 5 years ago

Weak dependencies are definitely useful. RPM-based distros adopted a model similar to the Debian one a few years back: http://rpm.org/user_doc/dependencies.html (see the "Weak dependencies" section at the end)

However, they also pretty much require a proper resolver to handle, since they create many more potential sets of compatible packages given an initial package listing (if you try to install an optional dependency and one of its mandatory dependencies is unavailable or otherwise fails to install, you need a resolver that's clever enough to back out and break off that entire sub-branch). It isn't a coincidence that Fedora et al didn't get weak dependencies until after the original yum implementation was superseded by dnf (which is backed by libsolv).

These kinds of "use if available" links also introduce additional complexity into version pinning, since you need to decide how to handle them at both pinning time (pip-compile, pipenv lock, etc), and at installation time (pip-sync, pipenv sync, etc). (The simplest option is to ignore them at installation time, and instead rely solely on whether or not they were resolved at pinning time, but even that option involves documenting that that's the intended behaviour, and making sure that tools actually end up working that way)

Harvie commented 5 years ago

@ncoghlan What if you just call pip install for each weak dependency after (or before) installing the primary package? If it fails, it fails; if it installs, it installs. The point is that the main manually selected package is always installed. I see no problem...
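A rough sketch of that approach (WEAK_DEPS is an illustrative list, not real metadata):

import subprocess
import sys

WEAK_DEPS = ["opencv-python >= 1.2.0"]  # illustrative weak dependencies

for dep in WEAK_DEPS:
    # One pip invocation per weak dependency; a failure does not abort the rest.
    result = subprocess.run([sys.executable, "-m", "pip", "install", dep])
    if result.returncode != 0:
        print("weak dependency %r could not be installed; continuing" % dep)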

ahartikainen commented 5 years ago

This would be a useful thing for many scientific packages that have functionality supported by an external (C/C++/Fortran with Python wrappers) library that is not the core of the package. It would also enable them to be installed on platforms that cannot install the binary dependencies.

brainwane commented 5 years ago

If I understand correctly, this issue may depend on the new pip resolver (https://github.com/pypa/pip/issues/988).

The Python Software Foundation's Packaging Working Group has secured funding to help finish the new dependency resolver, and is seeking two contract developers to aid the existing maintainers for several months. Folks in this thread: Please take a look at the request for proposals and, if you're interested, apply by 22 November 2019. And please spread the word to freelance developers and consulting firms.

adigitoleo commented 3 years ago

I just opened #432 but I now see that this thread is discussing very similar ideas. Let me know if that issue should be closed in favour of this one.