Open barrh opened 5 years ago
Good question. I think that with conda everything should work out fine, given that we have separated out the CUDA version into a different package.
I don't know how to solve the pip issue though. @soumith do you have an idea?
@barrh can you say what EXACT commands you are using in each case? It's not clear to me what you have tried, so getting clarity on that will help me understand what's going on.
@soumith The following installation scenarios will fail to run on a CUDA 10 system:
1. pip install https://download.pytorch.org/whl/cu100/torchvision-0.3.0-cp36-cp36m-linux_x86_64.whl; pip install https://download.pytorch.org/whl/cu100/torch-1.1.0-cp36-cp36m-linux_x86_64.whl
2. pip install https://download.pytorch.org/whl/cu100/torch-1.1.0-cp36-cp36m-linux_x86_64.whl torchvision; pip install --force https://download.pytorch.org/whl/cu100/torchvision-0.3.0-cp36-cp36m-linux_x86_64.whl
Edit: 2nd example.
hmmm, yea there isn't an easy pip fix for it, but there is a way to fix this. I opened an issue on PyTorch to address it, but it will take a bit of effort from my side: https://github.com/pytorch/pytorch/issues/19990
I ran into this issue over the weekend. It does have some consequences, as it causes some of the example PyTorch models (like the fast style transfer code sample) to fail to train. Unfortunately, I don't have the exact error message at hand, but it was a runtime error related to the CUBLAS library. It triggers on the features.bmm call in the gram_matrix function of the utils.py file (line 25).
However, I did find a temporary workaround. The official PyTorch site says to install torch first, then torchvision. If you reverse the install commands, then the incorrect version of torch will be installed with torchvision. Running the torch install command then overwrites the wrong torch version with the correct one. I haven't tested it thoroughly, but it does cause torch.version.cuda to report the correct version number (10.0.130) and the fast style transfer code starts to train.
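For anyone who wants to confirm the workaround took effect, here is a minimal sketch of the torch.version.cuda check described above. The helper names (reported_cuda_version, same_major_minor) are made up for illustration, and the "10.0" target is just the value from this thread; adjust it to whatever your system runs.

```python
import importlib


def reported_cuda_version():
    """Return torch.version.cuda, or None if torch is absent or CPU-only."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return None
    return torch.version.cuda


def same_major_minor(reported, expected):
    """Compare only the major.minor part, e.g. '10.0.130' vs '10.0'."""
    if reported is None:
        return False
    return reported.split(".")[:2] == expected.split(".")[:2]


if __name__ == "__main__":
    found = reported_cuda_version()
    print("torch.version.cuda =", found)
    # "10.0" is the CUDA version discussed in this thread (an assumption
    # about your setup, not something the script detects on its own).
    print("matches CUDA 10.0:", same_major_minor(found, "10.0"))
```

This only tells you which CUDA version the installed torch was built against; it does not check that torchvision was built against the same one.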
I just ran into (mostly) the same problem, although the versions are different since it's been three years. When I used the command given in PyTorch's instructions for Linux (Ubuntu 22.10), Stable (1.12.1), and CUDA 11.6:
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
it installed the right build of torchvision (0.13.1+cu116) for CUDA 11.6, but a torch build for the older CUDA 10.2 (1.12.1+cu102). @tyrian411's solution of re-installing torch by itself fixed it. Perhaps the documentation should be annotated in case people run into this again.
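The "+cuXXX" suffix in those version strings can be read from the installed package metadata without importing anything, which makes a mismatch easy to spot. A rough sketch, assuming Python 3.8+ for importlib.metadata; local_tag and installed_tag are hypothetical helper names, not part of pip or PyTorch. Note that a wheel may carry no suffix at all, in which case this returns None and tells you nothing about its CUDA build.

```python
from importlib import metadata


def local_tag(version):
    """Return the part after '+' in a version string, or None if absent."""
    _, sep, tag = version.partition("+")
    return tag if sep else None


def installed_tag(package):
    """Local tag of an installed package; None if not installed or untagged."""
    try:
        return local_tag(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # If the two tags printed here differ (e.g. cu102 vs cu116), the
    # packages came from different CUDA builds.
    for name in ("torch", "torchvision"):
        print(name, installed_tag(name))
```

With the torch version reported above, local_tag("1.12.1+cu102") gives "cu102", which does not match the cu116 index requested on the command line.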
torchvision and PyTorch must be built against the same environment. Currently, running pip install on the CUDA 10 build of torchvision (when torch is not already installed) implicitly installs the default PyTorch build, which, as of today, is built against CUDA 9. The mismatch raises an error at run time.
e.g.
pip install torchvision[cuda10_package] torch[cuda10_package]
will work, but:
pip install torchvision[cuda10_package]; pip install torch[cuda10_package]
or similarly,
pip install torch[cuda10_package] torchvision; pip install --force torchvision[cuda10_package]
will not (order matters).
Can the requirements of each build be specified in such a way that pip will automatically look for the matching build of PyTorch? Alternatively, is it possible for pip to independently decide the correct build based on environment variables?