atalman opened this issue 7 months ago
We will do that soon. Tracked internally at b/323302699 .
Hello @mayankmalik-colab, we just released version 2.2.1: pypi.org/project/torch/2.2.1/#files. Please use this instead of the 2.2.0 version.
Hello @atalman, I was trying to use the torch-2.2.1 wheel, but it installs CUDA dependencies as well, which is not the case with the current torch-2.1.0 wheel we use. It is important that we don't install CUDA dependencies, as those interfere with other frameworks like JAX. Can you point me to torch-2.2.1 wheels that don't install CUDA dependencies?
Example:
torch-2.1.0 wheel
!pip show torch
Name: torch
Version: 2.1.0+cu121
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [packages@pytorch.org](mailto:packages@pytorch.org)
License: BSD-3
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, fsspec, jinja2, networkx, sympy, triton, typing-extensions
Required-by: fastai, torchaudio, torchdata, torchtext, torchvision
torch-2.2.1 wheel
!pip show torch
Name: torch
Version: 2.2.1+cu121
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [packages@pytorch.org](mailto:packages@pytorch.org)
License: BSD-3
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: fastai, torchaudio, torchdata, torchtext, torchvision
From what I see, we switched from the big wheel model in 2.1.0, where all the CUDA dependencies are bundled inside PyTorch, to the small wheel model in 2.2.x, where CUDA dependencies come from PyPI (if you are using pip). You can see that the 2.1.0 cu121 wheel is 2 GB+ while the 2.2.1 cu121 wheel is around 700 MB.
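As a rough illustration (my own sketch, not an official check), one way to tell the two wheel layouts apart is to look for CUDA shared libraries bundled under torch/lib:

```python
# Sketch: the 2.1.0 "big wheel" ships CUDA shared libraries inside
# torch/lib, while the 2.2.x "small wheel" pulls them in as separate
# nvidia-* packages from PyPI. Guarded so it also runs without torch.
import glob
import importlib.util
import os

bundled = []
if importlib.util.find_spec("torch") is not None:
    import torch
    torch_lib = os.path.join(os.path.dirname(torch.__file__), "lib")
    bundled = glob.glob(os.path.join(torch_lib, "libcu*"))
print("CUDA libraries found inside torch/lib:", len(bundled))
```

On a big-wheel install this count should be non-zero; on a small-wheel install the CUDA libraries live under site-packages/nvidia instead.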
I have two thoughts:
pip install --no-dependencies is a thing
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu works if CUDA capability is not needed, but that's probably not what you are looking for

hi @mayankmalik-colab Yes, in torch 2.1.0 we used to package the CUDA dependencies in the torch lib folder, and we also provided a version that installed torch + CUDA dependencies via pip. Since 2.2.0 we have switched to small wheels that install the CUDA dependencies via pip.
If you preinstall all the CUDA dependencies, then doing a pip install should install only torch and any other missing dependencies, like this:
pip3 install --pre torch --index-url https://download.pytorch.org/whl/cu121 --upgrade
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://download.pytorch.org/whl/cu121
Requirement already satisfied: torch in /home/atalman/.local/lib/python3.8/site-packages (2.3.0.dev20240227+cu121)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /home/atalman/.local/lib/python3.8/site-packages (from torch) (12.1.105)
Requirement already satisfied: typing-extensions>=4.8.0 in /home/atalman/.local/lib/python3.8/site-packages (from torch) (4.9.0)
Requirement already satisfied: fsspec in /usr/local/lib/python3.8/dist-packages (from torch) (2023.1.0)
Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /home/atalman/.local/lib/python3.8/site-packages (from torch) (10.3.2.106)
Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in /home/atalman/.local/lib/python3.8/site-packages (from torch) (8.9.2.26)
Requirement already satisfied: pytorch-triton==3.0.0+901819d2b6 in /home/atalman/.local/lib/python3.8/site-packages (from torch) (3.0.0+901819d2b6)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from torch) (3.1.2)
Requirement already satisfied: nvidia-nccl-cu12==2.19.3 in /home/atalman/.local/lib/python3.8/site-packages (from torch) (2.19.3)
Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /home/atalman/.local/lib/python3.8/site-packages (from torch) (12.1.3.1)
Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /home/atalman/.local/lib/python3.8/site-packages (from torch) (11.0.2.54)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /home/atalman/.local/lib/python3.8/site-packages (from torch) (12.1.105)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /home/atalman/.local/lib/python3.8/site-packages (from torch) (12.1.105)
Requirement already satisfied: filelock in /usr/local/lib/python3.8/dist-packages (from torch) (3.9.0)
Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /home/atalman/.local/lib/python3.8/site-packages (from torch) (11.4.5.107)
Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /home/atalman/.local/lib/python3.8/site-packages (from torch) (12.1.105)
Requirement already satisfied: networkx in /home/atalman/.local/lib/python3.8/site-packages (from torch) (3.1)
Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /home/atalman/.local/lib/python3.8/site-packages (from torch) (12.1.0.106)
Requirement already satisfied: sympy in /home/atalman/.local/lib/python3.8/site-packages (from torch) (1.12)
Requirement already satisfied: nvidia-nvjitlink-cu12 in /home/atalman/.local/lib/python3.8/site-packages (from nvidia-cusolver-cu12==11.4.5.107->torch) (12.3.101)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->torch) (2.1.2)
Requirement already satisfied: mpmath>=0.19 in /home/atalman/.local/lib/python3.8/site-packages (from sympy->torch) (1.3.0)
I was trying to use the torch-2.2.1 wheel, but it installs CUDA dependencies as well, which is not the case with the current torch-2.1.0 wheel we use. It is important that we don't install CUDA dependencies, as those interfere with other frameworks like JAX.
@mayankmalik-colab Just want to clarify that those CUDA dependencies are not system-wide, i.e. they are installed in the Python site-packages folder, and unless one sets LD_PRELOAD in a very specific way they should not be visible to other packages. The 2.1.0 wheel was quite a similar story, i.e. those dependencies were dynamic libraries bundled with the wheel, and the respective libraries were loaded into the process's address space only when torch was imported.
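To make that concrete, here is a small sketch (my own, using only the standard library) showing where any pip-installed CUDA packages actually live:

```python
# Sketch backing the point above: pip-installed nvidia-* CUDA packages
# live inside the Python site-packages directory, not in a system-wide
# location, so they are scoped to the interpreter environment.
import sysconfig
from importlib import metadata

site = sysconfig.get_paths()["purelib"]
cuda_pkgs = sorted(
    name for d in metadata.distributions()
    if (name := d.metadata["Name"]) and name.startswith("nvidia-")
)
print("site-packages:", site)
print("pip-installed CUDA packages:", cuda_pkgs or "none")
```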
I wonder if you have a test you run that used to work with torch-2.1.0 and JAX but fails with torch-2.2.1.
For reference, this is the issue we used to track deprecating the large wheels: https://github.com/pytorch/pytorch/issues/113972
Is this issue related to https://github.com/googlecolab/colabtools/issues/4345 ?
@mayankmalik-colab Could you please post more information about the conflict you are seeing. We are interested in the following:
- What version of JAX and how it is installed
- How torch is installed
So that we can try to reproduce this conflict in our environment. Are the installation scripts in OSS? Could you post a link or share some of the scripts with us?
Would torch-2.2.1 work with CUDA 12.4 without building from source, or are there specific kernels that wouldn't work?
Tried to install the latest JAX and torch in a Python 3.10 virtualenv. Installed jax first, then pytorch, to force pip to pull CUDA 12.4 (which jax depends on), and got this error:
$ pip install -U "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
$ pip install torch==2.2.1
$ python -c "import torch"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/google/home/kiuk/.pyenv/versions/venv310/lib/python3.10/site-packages/torch/__init__.py", line 237, in <module>
from torch._C import * # noqa: F403
ImportError: /usr/local/google/home/kiuk/.pyenv/versions/venv310/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12
Looking at the error more closely, it seemed to be due to the versions of cusparse and nvjitlink not matching up. It looks like JAX doesn't depend on cusparse, so cusparse-12.1.0.106 was installed with torch, but nvjitlink was left at 12.4.99 (from the jax install).
So I figured I'd just upgrade to cusparse-12.3.0.142 and see what happens, and I was able to get past the error above. That said, I realize I'm not actually hitting any sparse kernels, so it's hard to say whether this actually works, and in any case it'll be better to build pytorch from source.
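A quick way to spot that skew (a sketch of mine; only the package names come from the ImportError above) is to read the installed wheel versions:

```python
# Sketch of the version check implied above: read the installed
# cusparse and nvjitlink wheel versions so a skew like 12.1.x next to
# 12.4.x is easy to spot before it surfaces as an undefined symbol.
from importlib import metadata

def wheel_version(name):
    """Return the installed version of a distribution, or None."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

for pkg in ("nvidia-cusparse-cu12", "nvidia-nvjitlink-cu12"):
    print(pkg, "->", wheel_version(pkg) or "not installed")
```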
Here's the list of nvidia libs and versions I had in my virtual environment to get past the initial errors from both jax and torch and successfully create CUDA tensors with each lib (again, I haven't actually run any ops, so it's hard to conclude that this works):
nvidia-cublas-cu12 12.4.2.65
nvidia-cuda-cupti-cu12 12.4.99
nvidia-cuda-nvcc-cu12 12.4.99
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.4.99
nvidia-cudnn-cu12 8.9.7.29
nvidia-cufft-cu12 11.2.0.44
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.6.0.99
nvidia-cusparse-cu12 12.3.0.142
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.4.99
nvidia-nvtx-cu12 12.1.105
triton 2.2.0
jax 0.4.25
jaxlib 0.4.25+cuda12.cudnn89
torch 2.2.1
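To actually touch a sparse kernel, something like this minimal sketch could serve as a smoke test (my own code; the shapes and the CPU fallback are arbitrary choices, and it degrades gracefully when torch is absent):

```python
# Minimal sparse smoke test in the spirit of the caveat above: a tiny
# sparse-CSR x dense matmul, on GPU when one is visible (which is what
# would exercise cusparse), otherwise on CPU.
import importlib.util

total = None
if importlib.util.find_spec("torch") is not None:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.eye(3, device=device).to_sparse_csr()  # sparse identity
    b = torch.ones(3, 3, device=device)
    total = (a @ b).sum().item()  # identity @ ones -> sum of 9 ones
    print("sparse-dense matmul sum:", total)
else:
    print("torch not installed; skipping")
```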
FWIW, internally in google3 we are building torch-2.2.1 with CUDA 12.3 (but we use clang, not nvcc, to compile CUDA).
cc @malfet , @atalman
@atalman @malfet I got stuck on some other work, so I couldn't reply earlier.
Anyways, check the pointers below:
pip install torch==2.1.0 (and not the wheel directly) does install CUDA dependencies, though. I am not sure why there is a difference. Anyways, this will not be of any concern with newer versions in the future.
Next, if you run !pip install torch==2.2.1 and then
import jax
print(jax.default_backend())
you would see cpu as the output, along with the warning: WARNING:jax._src.xla_bridge:CUDA backend failed to initialize: Found CUDA version 12010, but JAX was built against version 12020, which is newer. The copy of CUDA that is installed must be at least as new as the version against which JAX was built. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
If you instead run
!pip install torch==2.2.1
!pip uninstall -y nvidia-nvtx-cu12 nvidia-nvjitlink-cu12 nvidia-nccl-cu12 nvidia-curand-cu12 nvidia-cufft-cu12 nvidia-cuda-runtime-cu12 nvidia-cuda-nvrtc-cu12 nvidia-cuda-cupti-cu12 nvidia-cublas-cu12 nvidia-cusparse-cu12 nvidia-cudnn-cu12 nvidia-cusolver-cu12
and then
import jax
print(jax.default_backend())
it returns gpu. Also,
import torch
print(torch.cuda.is_available())
returns True. (We already have CUDA 12.2 installed via APT.) I ran a bunch of torch code and was able to access the GPU just fine.
I was wondering if there is a way NOT to install the CUDA dependencies while installing torch, or whether we could uninstall those CUDA dependencies in our script? Any thoughts or a permanent solution?
I was wondering if there is a way NOT to install the CUDA dependencies while installing torch
!pip install --no-deps
is an answer to your question.
or whether we could uninstall those CUDA dependencies in our script?
If it passes some smoke tests, I don't see a problem; CUDA 12.2 should be binary compatible with 12.1, so if torch finds all the libraries it would most likely work (it would be nice to run some smoke tests though; I can provide you with a small list).
Any thoughts or a permanent solution?
Are you building JAX from source or installing it from pip for the colab container? If the former, why not build PyTorch from source as well? If the latter, how does JAX find the CUDA libraries it depends on?
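A first smoke check along those lines might look like this sketch (mine, not the list malfet offers; it skips cleanly when torch or a GPU is missing):

```python
# Sketch of the kind of smoke check suggested above: report the CUDA
# runtime torch was built against (so it can be compared with the
# system CUDA, e.g. 12.1 vs 12.2) and run one op on the GPU if present.
import importlib.util

built_against = None
if importlib.util.find_spec("torch") is not None:
    import torch
    built_against = torch.version.cuda  # None on CPU-only builds
    print("torch built against CUDA:", built_against)
    if torch.cuda.is_available():
        x = torch.randn(4, 4, device="cuda")
        print("GPU matmul shape:", tuple((x @ x).shape))
else:
    print("torch not installed; skipping")
```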
@malfet
!pip install --no-deps would not work, as we still want the other dependencies: filelock, fsspec, jinja2, networkx, sympy, triton, typing-extensions. However, I don't think we can partially select dependencies, so I tried removing the CUDA dependencies after installation, though it's not ideal.
@malfet @atalman - we upgraded torch and the other related packages. However, we had to remove the CUDA-related dependencies downloaded along with torch, so torch is using the system CUDA for now. We tested a few basic things and it seems to be working fine, but if you would like to test anything, feel free to do so.
@mayankmalik-colab Could you describe how the removal of the CUDA-related dependencies was done?
@mayankmalik-colab Could you describe how the removal of the CUDA-related dependencies was done?
python3 -m pip uninstall -y \
nvidia-cublas-cu12 \
nvidia-cuda-cupti-cu12 \
nvidia-cuda-nvrtc-cu12 \
nvidia-cuda-runtime-cu12 \
nvidia-cudnn-cu12 \
nvidia-cufft-cu12 \
nvidia-curand-cu12 \
nvidia-cusolver-cu12 \
nvidia-cusparse-cu12 \
nvidia-nccl-cu12 \
nvidia-nvjitlink-cu12 \
nvidia-nvtx-cu12
Thanks @kiukchung. Yes, that's how we did it. I know this is not ideal, but we had to do it for now.
Here are some basic tests to check whether basic functionality is there: https://github.com/pytorch/builder/blob/main/test/smoke_test/smoke_test.py with
MATRIX_GPU_ARCH_VERSION=12.1
MATRIX_GPU_ARCH_TYPE=cuda
On the PyTorch side we would need to:
Validation of the fixes can be found at https://github.com/pytorch/pytorch/issues/123296
Hello,
We released
pytorch v2.2.0, torchvision v0.17.0, torchaudio v2.2.0
The wheel installation instructions are:
pytorch
Install command for CUDA 12.1 environment:
Project link: https://pypi.org/project/torch/2.2.0/#files
torchvision
Install command for CUDA 12.1 environment:
https://pypi.org/project/torchvision/0.17.0/#files
torchaudio
Install command for CUDA 12.1 environment:
https://pypi.org/project/torchaudio/2.2.0/#files
Other notes: if you require wheels for Python 3.8, 3.9, 3.10, 3.11 or 3.12, we support the CPU, CUDA 11.8 and CUDA 12.1 compute platforms. You can find the links here: download.pytorch.org/whl/torch_stable.html
We're looking forward to having it updated in Colab.
Thanks very much.
cc'ing @colaboratory-team @mayankmalik-colab @malfet @seemethere
Similar to https://github.com/googlecolab/colabtools/issues/4039