pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Cannot install torchvision 0.8.2 with conda-forge without defaults channel #3264

Closed: zalza closed this issue 2 years ago

zalza commented 3 years ago

🐛 Bug

I'm using only the conda-forge channel, installed via Miniforge/Mambaforge, because the recent change to the Anaconda TOS prohibits commercial use of the defaults channel for companies with 200 or more employees. With conda-forge only, installing torchvision=0.8.2 fails because it requires jpeg<=9b, which does not exist in the conda-forge channel. This requirement was updated in commit 2f40a483d73018ae6e1488a484c5927f2b309969 in the release/0.8.0 branch.

To Reproduce

The OS I'm using is Ubuntu 20.04.1.

$ conda create -n test torchvision=0.8.2 -c pytorch

Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: |
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.31=0
  - torchvision=0.8.2 -> cudatoolkit[version='>=11.0,<11.1'] -> __glibc[version='>=2.17,<3.0.a0']

Your installed version is: 2.31

The error message from conda does not show the actual problem. mamba from Mambaforge reports the conflict more accurately:

$ mamba create -n test python=3.8 torchvision=0.8.2 pillow=8 -c pytorch

Looking for: ['python=3.8', 'torchvision=0.8.2', 'pillow=8']

conda-forge/linux-64     Using cache
conda-forge/noarch       Using cache
pytorch/noarch           [====================] (00m:00s) No change
pytorch/linux-64         [====================] (00m:00s) No change
Encountered problems while solving.
Problem: package torchvision-0.8.2-py38_cu110 requires jpeg <=9b, but none of the providers can be installed
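
When this has to be checked in a script (for example in a CI job), the solver's Problem line can be picked apart mechanically. The following is a minimal sketch that assumes the message keeps the exact wording shown in the mamba output above; `parse_mamba_problem` is a hypothetical helper, not part of mamba.

```python
import re

# Assumed message shape, copied from the mamba output above. Other mamba
# versions may word this differently, in which case nothing is matched.
PROBLEM_RE = re.compile(
    r"Problem: package (?P<package>\S+) requires (?P<requirement>.+?), "
    r"but none of the providers can be installed"
)


def parse_mamba_problem(output):
    """Return (package, requirement) pairs, one per solver problem line."""
    return [
        (match.group("package"), match.group("requirement"))
        for match in PROBLEM_RE.finditer(output)
    ]


sample = (
    "Encountered problems while solving.\n"
    "Problem: package torchvision-0.8.2-py38_cu110 requires jpeg <=9b, "
    "but none of the providers can be installed\n"
)
print(parse_mamba_problem(sample))
```

The extracted requirement (here `jpeg <=9b`) is the package to look up next with `conda search`.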

We can see that jpeg<=9b does not exist in the conda-forge channel.

$ conda search jpeg -c defaults

Loading channels: done
# Name                       Version           Build  Channel
jpeg                              8d      h516909a_0  conda-forge
jpeg                              9b      h024ee3a_2  pkgs/main
jpeg                              9b      h376031c_1  pkgs/main
jpeg                              9b      habf39ab_1  pkgs/main
jpeg                              9c   h14c3975_1001  conda-forge
jpeg                              9d      h36c2ea0_0  conda-forge
jpeg                              9d      h516909a_0  conda-forge
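
To make the listing easier to read, here is a small sketch that groups the builds above by channel and keeps only the versions that satisfy `jpeg<=9b`. It compares version strings as plain text, which happens to order 8d, 9b, 9c and 9d correctly but is an assumption and not conda's real version-matching logic; `versions_by_channel` is a hypothetical helper.

```python
# Rows copied from the `conda search jpeg -c defaults` output above.
SEARCH_ROWS = """\
jpeg  8d  h516909a_0     conda-forge
jpeg  9b  h024ee3a_2     pkgs/main
jpeg  9b  h376031c_1     pkgs/main
jpeg  9b  habf39ab_1     pkgs/main
jpeg  9c  h14c3975_1001  conda-forge
jpeg  9d  h36c2ea0_0     conda-forge
jpeg  9d  h516909a_0     conda-forge
"""


def versions_by_channel(text, max_version):
    """Map each channel to the sorted versions that are <= max_version.

    Assumption: versions are compared as plain strings. That is enough for
    8d/9b/9c/9d but is not how conda itself orders versions.
    """
    found = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4 or parts[0].startswith("#"):
            continue  # skip headers and anything that is not a package row
        _name, version, _build, channel = parts
        if version <= max_version:
            found.setdefault(channel, set()).add(version)
    return {channel: sorted(versions) for channel, versions in found.items()}


print(versions_by_channel(SEARCH_ROWS, "9b"))
```

In this listing, 9b itself appears only under pkgs/main, and the only conda-forge build at or below 9b is 8d.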

Additional context

Issue #3207 has the same cause.

cc @seemethere

fmassa commented 3 years ago

If I understand this correctly, the issue is with jpeg

@andfoy can you have a look?

h-vetinari commented 3 years ago

Hey all

The conda-forge support for torchvision is determined by the state of the so-called feedstock, where the packages are built.

It hasn't been updated in a long time, and I actually started an attempt a few days ago to do so: https://github.com/conda-forge/torchvision-feedstock/pull/12

I actually came to the issue tracker here to figure out a flaky but persistent test-failure that I have on python 3.7 (also appeared on 3.8)

=================================== FAILURES ===================================
__________________ ModelTester.test_maskrcnn_resnet50_fpn_cpu __________________

self = <test_models.ModelTester testMethod=test_maskrcnn_resnet50_fpn_cpu>
model_name = 'maskrcnn_resnet50_fpn', dev = 'cpu'

    def do_test(self, model_name=model_name, dev=dev):
>       self._test_detection_model(model_name, dev)

test/test_models.py:410: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test/test_models.py:189: in _test_detection_model
    self.assertEqual(scripted_out[0]["boxes"], out[0]["boxes"])
test/common_utils.py:244: in assertEqual
    assertTensorsEqual(x, y)
test/common_utils.py:214: in assertTensorsEqual
    self.assertLessEqual(max_err, tolerance, message)
E   AssertionError: tensor(63.0367, grad_fn=<MaxBackward1>) not less than or equal to tensor(0.0030, grad_fn=<AddBackward0>) :

There's more work to be done even after that PR (see some points in the OP), but once that gets unblocked/merged, I'll be able to move more quickly.

By the way, if any of the maintainers here want to join the curation of the conda-forge recipe, that would be most welcome.

h-vetinari commented 3 years ago

PS. I realize that the pytorch channel has its own recipe. However, there are other packages in conda-forge that want to depend on torchvision (e.g. allennlp 2.0.0), and conda-forge effectively only allows depending on packages within conda-forge.

In that way, fixing the problems in the current conda-forge recipe could then also help the pytorch channel break the dependency on main.

bw4sz commented 3 years ago

Bump here. @h-vetinari What's the desired way to add a package to conda-forge that depends on torch and torchvision? It's not obvious that we can add packages from the official pytorch channel to the conda-forge meta.yaml in the feedstock. There seems to be some anger here: https://twitter.com/jeremyphoward/status/1178351261608861701?lang=en, but it's not clear there is an actual solution. Is conda-forge basically dead to projects using torch? I am the developer here:

https://deepforest.readthedocs.io/

farleylai commented 3 years ago

To sum up, this issue unfortunately needs multiple parties to solve. If torchvision does not follow or respect the dependencies exposed by conda-forge soon, similar dependency conflicts are likely to get worse over time. The alternative is for conda-forge to rebuild most of its packages to depend on jpeg<=9b, but that seems even more far-fetched.

h-vetinari commented 3 years ago

Bump here. @h-vetinari What's the desired way to add a package to conda-forge that depends on torch and torchvision? It's not obvious that we can add packages from the official pytorch channel to the conda-forge meta.yaml in the feedstock.

It's pretty simple: submit a pull request to https://github.com/conda-forge/staged-recipes that adds the package (& debug any issues that come up in CI there), and simply specify pytorch & torchvision as dependencies.

You're right though that packages in conda-forge cannot depend on any other channel - this is because conda-forge has no control over the used compilers and other ABI-relevant pieces that are a key component of reliable packaging. This is a hard restriction.

The key problem in this space is that pytorch is such a heavy build that it times out with the standard agents that Azure provides (6h). That makes the pytorch-recipe very hard to evolve, because it basically depends on a core-member to do things manually & locally, rather than the usual everyone-can-contribute-a-PR scheme. If this bottleneck were to go away (which is in the works...), there'd be essentially no issues in keeping pytorch builds current & across all platforms (currently missing builds for pytorch 1.9 & windows builds generally for example)

There seems to be some anger here: https://twitter.com/jeremyphoward/status/1178351261608861701?lang=en, but it's not clear there is an actual solution. Is conda-forge basically dead to projects using torch? I am the developer here:

https://deepforest.readthedocs.io/

That rant is heavily out of date. I helped reactivate that feedstock, and the packages are pretty current and have GPU support (well, 0.10.0 seems to depend on the as-yet-unbuilt pytorch 1.9, so I'm waiting for that).

Conda-forge is not dead for pytorch packages, in fact more and more is migrating there (from my POV). As soon as there's a dedicated build queue that allows building packages like pytorch, there should be no fundamental issues anymore. Of course, if you're affected by this situation in a commercial context, it would be very helpful to support that effort in conda-forge. The linked tweet is quite an outlier IMO, as Nvidia and others are heavily betting on conda-forge for the distribution of many of their GPU packages. Despite being volunteer-driven and with basically no funding (and some corresponding limitations like CI timeouts), it is the gold standard in python packaging by a long shot (personal opinion), and this is being recognised & reflected more and more.

h-vetinari commented 3 years ago

In that sense, conda-forge feedstocks should have pushed most of its updates to the default channel through the official feedstocks.

Conda-forge does not control anaconda channels - Anaconda does... The easiest solution might be just having pytorch and conda-forge in your channels, without defaults. Generally, conda-forge even recommends conda config --set channel_priority strict, which means that if a package is available in several channels, it will always be taken from the highest priority one, even if a newer package is available somewhere else (this again has to do with ensuring ABI-consistency, etc.)
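
As a rough illustration of that channel setup, the sketch below assembles a `conda create` command line that lists only the pytorch and conda-forge channels and enables strict priority for that one call. `build_create_command` is a hypothetical helper; the flags it emits (`--override-channels`, `-c`, `--strict-channel-priority`) are regular conda options, and putting pytorch ahead of conda-forge is an assumption about the desired priority order. It only builds the command, it does not run it, and it does not by itself remove the jpeg pin discussed above.

```python
import shlex


def build_create_command(env_name, specs, channels):
    """Build a `conda create` argv that uses only the given channels.

    Channels are passed in priority order (highest first). The
    --override-channels flag keeps defaults and .condarc channels out.
    """
    cmd = ["conda", "create", "-n", env_name, "--override-channels"]
    for channel in channels:
        cmd += ["-c", channel]
    # Per-invocation equivalent of `channel_priority strict`.
    cmd.append("--strict-channel-priority")
    cmd += list(specs)
    return cmd


cmd = build_create_command(
    "test",                      # environment name from the report above
    ["torchvision=0.8.2"],       # package spec from the report above
    ["pytorch", "conda-forge"],  # assumed order: pytorch first
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run` if the install should be driven from Python.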

farleylai commented 3 years ago

The point is not to have control over the official anaconda channels but a systematic way to notify the default channels of well accepted community updates given the social responsibility as likely the largest community driven channel. Otherwise, depending on a non-official channel remains a concern because of unexpected complexities (https://github.com/pytorch/vision/issues/2291). On the other hand, while specifying the strict channel priority may work, it is still a headache for end users to sort out the implicit dependencies across channels when they do not come from the default one.

h-vetinari commented 3 years ago

[...] but a systematic way to notify the default channels of well accepted community updates given the social responsibility as likely the largest community driven channel

I don't know the internals of the Anaconda setup, but I'd be very surprised if this is not (at least partially) automated - they have all the pieces at hand, not least: they're sponsoring all the hosting & petabytes of traffic. The bottleneck is likely that Anaconda does its own curation & signing and so there will need to be human sign-off for any package change, which isn't a picnic across thousands of packages.

I would be careful with words like "social responsibility", the integration work & infrastructure necessary to do what [ana]conda[-forge] does is massive, and effectively provided for free to the community at large. You're most welcome to contribute to the packaging effort on the conda-forge side and help improve this experience.

farleylai commented 3 years ago

It is a blast and carefree if everyone works solely on their own channel for dependencies and distributions. However, a repo channel is not meant to create new packages but to host packages that were potentially created elsewhere. This inevitably involves social interaction with other communities to maximize its impact, and that is where the responsibility comes from. If possible, minimal modification should be preferred, so that a package is redistributed as its official build. NVIDIA CUDA packages tend to be redistributed as is through conda-forge; is it really necessary to rebuild pytorch/torchvision/etc. from scratch instead of similar repackaging?

h-vetinari commented 3 years ago

is it really necessary to rebuild pytorch/torchvision/etc. from scratch instead of similar repackaging?

For the most part, anaconda will treat conda-forge as the "upstream" packaging recipe, but pytorch et al. have historically been separate (due to the difficulties of building it). I imagine there'd be some non-trivial work on their side to validate & adopt the conda-forge builds, which most likely has so far just been a question of not being a high enough priority. Perhaps an exploratory ping might help for awareness, at least 🙃 @jezdez @chenghlee

Masterxilo commented 3 years ago

I am supposed to set up a docker container containing some software that manages its environment via conda, and I lost a day trying to figure out how to debug a conflict like this "UnsatisfiableError: The following specifications were found to be incompatible with your system:" error...

I switched to using mamba instead of conda for installing everything (except for installing mamba itself) and it just works.

Go figure...

fmassa commented 3 years ago

Bumping the priority of this issue so that we can get it fixed soon

h-vetinari commented 3 years ago

Can you retry the installation? There have been a lot of new torchvision builds on conda-forge recently, none of which should require any packages from the default channel