Closed rmojgani closed 1 year ago
probably have to install korali on NVIDIA container?
Hi, are you working in the docker, or did you install Korali yourself?
If you run Python code that uses the GPU, it should work out of the box (unless the GPU isn't available inside the docker).
I use the docker you guys maintain. You are right, the problem is that the GPU is not available inside the docker. Below is a screenshot, left is in the docker started in the local machine, right is the local machine
is this what I should do ? https://www.howtogeek.com/devops/how-to-use-an-nvidia-gpu-with-docker-containers/
ok, I got it :) thanks
ok cool you solved it :-)
thanks, this is just for my own future reference:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)&&
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - &&
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
Test:
docker run -it --gpus all nvidia/cuda:11.4.0-base-ubuntu20.04 nvidia-smi
sudo docker run --rm -it --gpus all cselab/korali
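The nvidia-smi test above can also be scripted from inside the container. This is only a sketch: the helper name gpu_visible is made up for illustration, and it assumes that finding nvidia-smi on the PATH and getting a GPU listing from `nvidia-smi -L` means the container was started with GPU access.

```python
import shutil
import subprocess


def gpu_visible(timeout=10.0):
    """Return True if nvidia-smi is installed and lists at least one GPU.

    Hypothetical helper, not part of Korali or of the NVIDIA tooling.
    """
    # Without the binary, the NVIDIA runtime is most likely not mounted.
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        # `nvidia-smi -L` prints one line per GPU, e.g. "GPU 0: <name> (UUID: ...)".
        result = subprocess.run(
            ["nvidia-smi", "-L"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU " in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints False inside the container but nvidia-smi works on the host, the container was probably started without `--gpus all`.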
Install nvcc:
apt update
apt install nvidia-cuda-toolkit
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
Test jax:
from jax.lib import xla_bridge
print(xla_bridge.get_backend().platform)
import jax.numpy as np
x=np.linspace(0,10,10)
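A slightly more defensive version of the JAX test, as a sketch: it uses jax.default_backend() and jax.devices() from the public API instead of xla_bridge, and returns None when JAX is not installed. The function name jax_platform is just an illustration.

```python
def jax_platform():
    """Return the JAX backend name ("cpu", "gpu", ...) or None if JAX is missing.

    Hypothetical helper for illustration only.
    """
    try:
        import jax
        import jax.numpy as jnp
    except ImportError:
        return None

    # Run a tiny computation so the backend is actually initialised.
    x = jnp.linspace(0, 10, 10)
    print("devices:", jax.devices())
    print("sum of x:", float(x.sum()))
    return jax.default_backend()


print("JAX backend:", jax_platform())
```

If the backend comes back as "cpu" on a machine with a GPU, JAX did not pick up CUDA, which usually points to a jaxlib build without CUDA support or a CUDA/cuDNN version mismatch.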
Then I had to install CUDA and cuDNN with the correct versions and build JAX (lots of errors, but keep building; every time there are fewer errors, and finally it works!)
here is a dump of useful material
wget https://developer.download.nvidia.com/compute/cuda/11.6.0/local_installers/cuda_11.6.0_510.39.01_linux.run
sh cuda_11.6.0_510.39.01_linux.run
Driver: Not Selected Toolkit: Installed in /usr/local/cuda-11.6/
Please make sure that
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.6/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 510.00 is required for CUDA 11.6 functionality to work.
To install the driver using this installer, run the following command, replacing
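The driver requirement from the warning (at least 510.00 for CUDA 11.6) can be checked against the installed driver. A minimal sketch, assuming nvidia-smi is available and that driver versions are dot-separated integers; parse_version, driver_ok and installed_driver are names made up for this example.

```python
import shutil
import subprocess


def parse_version(text):
    """Turn "510.39.01" into (510, 39, 1) for tuple comparison."""
    return tuple(int(part) for part in text.strip().split(".") if part.isdigit())


def driver_ok(installed, required="510.00"):
    """True if the installed driver version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


def installed_driver():
    """Ask nvidia-smi for the driver version; None if it cannot be queried."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    lines = out.stdout.strip().splitlines()
    return lines[0].strip() if out.returncode == 0 and lines else None


version = installed_driver()
if version is None:
    print("could not query the driver version")
else:
    print(version, "is new enough for CUDA 11.6:", driver_ok(version))
```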
$ sudo apt install gcc-10 g++-10
$ export CC=/usr/bin/gcc-10
$ export CXX=/usr/bin/g++-10
$ export CUDA_ROOT=/usr/local/cuda
$ ln -s /usr/bin/gcc-10 $CUDA_ROOT/bin/gcc
$ ln -s /usr/bin/g++-10 $CUDA_ROOT/bin/g++
(Build Instant-NGP as described)
@wadaniel To get it working, I also had to install CUDA and cuDNN; now the docker image is 25GB! Here is a link on Docker Hub:
Korali + CUDA + cuDNN (I will make it private soon)
My question is, is there a chance these interfere with other packages? oneDNN?
cool interesting. woa so big?! thanks for the link.
No, I don't think there will be any problems. Or what do you mean by interfere? We use oneDNN inside Korali for the NN implementation, but there shouldn't be any interference.
Yeah, I am quite annoyed with the size.
I just wanted it to work; maybe a minimal installation is also possible.
or what do you mean by interfere?
Not sure, a memory leak? Something in software engineering that I am not aware of.
onednn is CPU, right?
No, there should be no interference, not that I am aware of. oneDNN runs on both CPU and GPU; it is now part of oneAPI: https://github.com/oneapi-src/oneDNN. It is the backend of e.g. TensorFlow and PyTorch.
So is there a flag or something like that to force Korali to use "oneAPI" on CPU or GPU?
yes, during build you must set meson -DoneDNN True
otherwise it uses "our" NN implementation, based on the Eigen library for matrix-matrix multiplications, which is not that optimized
I haven't built Korali for my work yet; I was using the docker image.
@wadaniel Have you tried installing JAX on the Korali docker? Does it cause issues connecting to the GPU?