pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

[mamba] Installing cpuonly PyTorch and torchvision from conda channel should not require cudatoolkit #5458

Closed motjuste closed 2 years ago

motjuste commented 2 years ago

🐛 Describe the bug

When installing the CPU-only variant of PyTorch 1.10 with conda, following the official guidelines from the website, the package cudatoolkit is also installed. The cudatoolkit package is a large download (about 1 GB) and takes up even more space once installed. The size is a concern, especially when building Docker images.

Basic reproduction steps:

  1. Use conda for a dry-run install of cpuonly pytorch=1.10 and torchvision.
  2. Check the output for whether cudatoolkit would have been installed.

The script below saves the dry-run's output to a file and uses grep to check if cudatoolkit was indeed identified as a dependency.

# assuming conda is installed
conda create -n dry python=3.8
conda install -q -n dry \
    -c pytorch -c defaults -c conda-forge \
    cpuonly pytorch=1.10 torchvision \
    --dry-run > dry-run.txt
grep cudatoolkit dry-run.txt
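For automated builds, the grep check above can be wrapped in a small guard so a CI pipeline or Docker build fails early. This is a hypothetical helper of my own, not part of the report; it only assumes a dry-run transcript like dry-run.txt exists:

```shell
# Hypothetical guard: fail the build early if the solver's dry-run
# transcript shows that cudatoolkit would be installed.
check_no_cudatoolkit() {
    # $1: path to a file containing the dry-run output
    if grep -q 'cudatoolkit' "$1"; then
        echo "ERROR: cudatoolkit would be installed (see $1)" >&2
        return 1
    fi
}
```

Used as check_no_cudatoolkit dry-run.txt || exit 1 after the dry-run step.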

Versions

PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Enterprise
GCC version: Could not collect
Clang version: 13.0.1
CMake version: version 3.22.2
Libc version: N/A

Python version: 3.8.12 (default, Oct 12 2021, 03:01:40) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19042-SP0
Is CUDA available: N/A
CUDA runtime version: 11.2.67
GPU models and configuration: GPU 0: NVIDIA GeForce 940MX
Nvidia driver version: 472.88
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] Could not collect
[conda] Could not collect
IvanYashchuk commented 2 years ago

I have just tried the specified steps and the problem does not reproduce on my end. Listing the packages resolved from the pytorch channel (by inspecting the content of dry-run.txt) also shows that the correct CPU variants were selected:

The following NEW packages will be INSTALLED:

  cpuonly            pytorch/noarch::cpuonly-2.0-0
  ffmpeg             pytorch/linux-64::ffmpeg-4.3-hf484d3e_0
  pytorch            pytorch/linux-64::pytorch-1.10.2-py3.8_cpu_0
  pytorch-mutex      pytorch/noarch::pytorch-mutex-1.0-cpu
  torchvision        pytorch/linux-64::torchvision-0.11.3-py38_cpu
IvanYashchuk commented 2 years ago

I do see the problem when trying to install with mamba instead of conda:

mamba create -n test -c pytorch torchvision cpuonly
pytorch                1.10.2  py3.9_cpu_0     pytorch/linux-64        86 MB
pytorch-mutex             1.0  cpu             pytorch/noarch           3 KB
torchvision            0.11.3  py39_cu113      pytorch/linux-64         9 MB

The PyTorch variant is correctly cpu, but mamba picks up the CUDA variant of torchvision. Even though the recipe specifies the constraint on pytorch-mutex (https://github.com/pytorch/vision/blob/c6b447b740b9153bab185b2da39bd321c5b619b1/packaging/torchvision/meta.yaml#L32), the packages uploaded to the pytorch channel do not have this dependency.

pmeier commented 2 years ago

cc @malfet

motjuste commented 2 years ago

That's a great catch. I had forgotten that I had aliased conda to mamba on my machine, and indeed the problem does not occur when I use the real conda.

Nevertheless, I stumbled upon something else by accident.

For some background: one can set channel_priority to strict in the conda configuration by running conda config --set channel_priority strict, and check the current value with conda config --show channel_priority. The other option, which is the default, is flexible.

On Linux, when the channel priority is strict, the command below actually fails:

# prioritize the `defaults` channel over the `pytorch` channel
mamba install -q -n test --dry-run \
    -c defaults -c pytorch \
    cpuonly pytorch=1.10 torchvision

with the error:

package 'pytorch-1.10.2-cuda112py39h4de5995_1' is excluded by strict repo priority

# Linux   + strict   = FAIL
# Linux   + flexible = installs cudatoolkit
# Windows + strict   = installs cudatoolkit
# Windows + flexible = installs cudatoolkit

conda also shows curious behaviour with strict channel priority:

# prioritize the `defaults` channel over the `pytorch` channel
conda install -q -n test --dry-run \
    -c defaults -c pytorch \
    cpuonly pytorch=1.10 torchvision

# Linux   + strict   = FAIL
# Linux   + flexible = installs cpu version
# Windows + strict   = FAIL
# Windows + flexible = installs cpu version

I know that the command in the documentation prioritizes the pytorch channel over defaults, but I hope this observation gives you some ideas.

The following two mamba commands always install cudatoolkit, on both Windows and Linux, irrespective of channel priority, whereas the conda equivalents of the two commands never install cudatoolkit in either scenario.

mamba install -q -n test --dry-run \
    -c pytorch -c defaults \
    cpuonly pytorch=1.10 torchvision

mamba install -q -n test --dry-run \
    -c pytorch \
    cpuonly pytorch=1.10 torchvision
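Until the packaging is fixed, one possible workaround (my own assumption, not an official recommendation) is to pin torchvision's build string so the solver can only select a CPU build. Conda/mamba match specs accept the form name=version=build, with fnmatch-style globs allowed in the build field:

```shell
# Hypothetical workaround: constrain torchvision to builds whose build
# string contains "cpu" (e.g. py39_cpu), so the cu113 variant is excluded.
spec='torchvision=*=*cpu*'

# Guarded so the snippet is a no-op on machines without mamba.
if command -v mamba >/dev/null 2>&1; then
    mamba install -q -n test --dry-run -c pytorch cpuonly pytorch=1.10 "$spec"
fi
```

This only sidesteps the solver's choice; it does not address the missing pytorch-mutex dependency in the uploaded packages.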
seemethere commented 2 years ago

Removing this from the Dev Infra backlog, since this relates to mamba, and mamba is not an officially supported package manager for PyTorch projects as of now.

rgommers commented 2 years ago

This seems to have been fixed in the 0.12.0 release made 8 days ago. All conda packages now have a dependency on pytorch-mutex 1.0 (click the (i) icon on https://anaconda.org/pytorch/torchvision/files to verify):


@motjuste does it work for you now?

motjuste commented 2 years ago

Yes, with torchvision 0.12.0 it works as expected: no cudatoolkit is installed, and only the cpu variants are installed when asking for cpuonly. Unfortunately, one does have to update to pytorch 1.11.0, but at least for me that does not seem to be a breaking change. Thanks a lot for your work!