neuro-inc / neuro-base-environment

Base docker image used in Neuro Platform Template, deployed on DockerHub as neuromation/base
Apache License 2.0

PyTorch does not work properly with CUDA on v22.12.0 #568

Closed andriihomiak closed 1 year ago

andriihomiak commented 1 year ago

Running CUDA code with torch is broken on v22.12.0, presumably due to an incompatible CUDA version (11.8 instead of 11.6):

>>> import torch
>>> torch.cuda.is_available()
/opt/conda/lib/python3.9/site-packages/torch/cuda/__init__.py:88: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
False

Behavior on v22.8.0 (CUDA 11.2):

>>> import torch
>>> torch.cuda.is_available()
True

andriihomiak commented 1 year ago

Fixed by upgrading CUDA on the nodes.
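
For anyone hitting the same warning, here is a minimal sketch for checking whether the CUDA version in the image and the driver on the node disagree. It is not part of the image: the helper name `cuda_diagnostics` is hypothetical, and it assumes `nvidia-smi` is on `PATH` inside the container. It only reports the CUDA version torch was built against, whether torch can see a GPU, and the driver version the node exposes. Judging compatibility from those values is left to the reader.

```python
import importlib.util
import shutil
import subprocess


def cuda_diagnostics():
    """Collect version info relevant to a CUDA / driver mismatch (hypothetical helper)."""
    info = {"torch_cuda_build": None, "cuda_available": None, "driver_version": None}

    # CUDA version torch was built against (None for CPU-only builds).
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch_cuda_build"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()

    # Driver version reported by the node, if nvidia-smi is available.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=False,
        )
        if result.returncode == 0 and result.stdout.strip():
            # One line per GPU; the first is enough for a version check.
            info["driver_version"] = result.stdout.strip().splitlines()[0]

    return info


if __name__ == "__main__":
    for key, value in cuda_diagnostics().items():
        print(f"{key}: {value}")
```

Running this on both v22.8.0 and v22.12.0 on the same node should show whether only the image's CUDA version changed while the driver stayed the same.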