rmojgani / RLonKorali


Jax on Korali #7

Closed rmojgani closed 1 year ago

rmojgani commented 1 year ago

@wadaniel Have you tried installing JAX in the Korali Docker image? Does it cause issues connecting to the GPU?

rmojgani commented 1 year ago

Do I probably have to install Korali in an NVIDIA container?

wadaniel commented 1 year ago

Hi, are you working inside the Docker container, or did you install Korali yourself?

wadaniel commented 1 year ago

If you run Python code that uses the GPU, this should work out of the box (unless the GPU is not available inside the Docker container).

rmojgani commented 1 year ago

I use the Docker image you guys maintain. You are right, the problem is that the GPU is not available inside the container. Below is a screenshot: left is inside the container started on the local machine, right is the local machine itself.

[image]

rmojgani commented 1 year ago

Is this what I should do? https://www.howtogeek.com/devops/how-to-use-an-nvidia-gpu-with-docker-containers/

rmojgani commented 1 year ago

OK, I got it :) Thanks.

wadaniel commented 1 year ago

ok cool you solved it :-)

rmojgani commented 1 year ago

Thanks, this is just for my own future reference:

distribution=$(. /etc/os-release; echo $ID$VERSION_ID) \
  && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
  && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update
sudo apt-get install -y nvidia-docker2

sudo systemctl restart docker

Test:
docker run -it --gpus all nvidia/cuda:11.4.0-base-ubuntu20.04 nvidia-smi

sudo docker run --rm -it --gpus all cselab/korali
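Once inside the container, a quick way to confirm that the GPU was actually passed through is to check for nvidia-smi from Python. This is only a minimal sketch: the helper name gpu_visible is made up for illustration, and it assumes that a working setup exposes nvidia-smi on the PATH and that "nvidia-smi -L" lists at least one GPU.

```python
import shutil
import subprocess


def gpu_visible(cmd="nvidia-smi"):
    """Return True if `cmd` is on PATH and lists at least one GPU.

    Hypothetical helper for illustration; not part of Korali or JAX.
    """
    path = shutil.which(cmd)
    if path is None:
        # Binary not found: the NVIDIA runtime is probably not mounted
        # (for example the container was started without --gpus all).
        return False
    try:
        # `nvidia-smi -L` prints one line per GPU, e.g. "GPU 0: ...".
        result = subprocess.run(
            [path, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    print("GPU visible:", gpu_visible())
```

If this prints False inside the container but nvidia-smi works on the host, the container was most likely started without the NVIDIA runtime.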

Install nvcc:
apt update
apt install nvidia-cuda-toolkit

pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Test jax:
from jax.lib import xla_bridge
print(xla_bridge.get_backend().platform)
import jax.numpy as np
x = np.linspace(0, 10, 10)
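A slightly more defensive version of the same test, as a sketch: it uses the public jax.default_backend() and jax.devices() calls instead of jax.lib.xla_bridge, and it reports a missing JAX installation instead of crashing. The function name jax_backend_report is an assumption made for this example.

```python
def jax_backend_report():
    """Return a small dict describing the backend JAX picked.

    Hypothetical helper for illustration. If JAX is not importable,
    the dict says so instead of raising.
    """
    try:
        import jax
    except ImportError as exc:
        return {"installed": False, "error": str(exc)}
    return {
        "installed": True,
        # Name of the default platform, e.g. "cpu" or "gpu".
        "backend": jax.default_backend(),
        # String form of every device JAX can see.
        "devices": [str(d) for d in jax.devices()],
    }


if __name__ == "__main__":
    print(jax_backend_report())
```

If the reported backend is "cpu" even though nvidia-smi works in the container, the installed jaxlib was probably built without CUDA support or cannot find matching CUDA/cuDNN libraries.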

rmojgani commented 1 year ago

Then I had to install CUDA and cuDNN with the correct versions and build JAX (lots of errors, but keep building: every time there are fewer errors, and finally it works!).

Here is a dump of useful material:

https://medium.com/analytics-vidhya/install-cuda-11-2-cudnn-8-1-0-and-python-3-9-on-rtx3090-for-deep-learning-fcf96c95f7a1

https://developer.nvidia.com/cuda-11-6-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=20.04&target_type=runfile_local

wget https://developer.download.nvidia.com/compute/cuda/11.6.0/local_installers/cuda_11.6.0_510.39.01_linux.run
sh cuda_11.6.0_510.39.01_linux.run

vim

export PATH=/usr/local/cuda-11.6/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.6/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
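To verify that these exports took effect in the current shell, something like the following sketch can help. The helper name cuda_env_report and its environ parameter are made up for illustration; the default path /usr/local/cuda-11.6 is taken from the exports above.

```python
import os


def cuda_env_report(cuda_root="/usr/local/cuda-11.6", environ=None):
    """Check whether the CUDA directories are present in the environment.

    Hypothetical helper for illustration. `environ` defaults to the real
    process environment; a plain dict can be passed for testing.
    """
    env = os.environ if environ is None else environ
    # These are Linux paths, so split on ":" explicitly.
    path_dirs = env.get("PATH", "").split(":")
    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(":")
    return {
        "bin_on_path": cuda_root + "/bin" in path_dirs,
        "lib64_on_ld_library_path": cuda_root + "/lib64" in lib_dirs,
        "cuda_home": env.get("CUDA_HOME"),
    }


if __name__ == "__main__":
    print(cuda_env_report())
```

Both boolean entries should be True in a shell where the exports were applied, and cuda_home should show /usr/local/cuda.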

Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.6/

Please make sure that

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.6/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 510.00 is required for CUDA 11.6 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
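The warning matters because the toolkit was installed without a driver: the driver on the host has to be at least 510.00 for CUDA 11.6. A rough way to compare version strings is sketched below. The function names are made up for illustration, and the installed version string has to be supplied by hand (for example, read off the nvidia-smi output).

```python
def parse_version(text):
    """Turn a dotted version such as '510.39.01' into (510, 39, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_sufficient(installed, minimum="510.00"):
    """Return True if `installed` is at least `minimum`.

    Hypothetical helper for illustration. The default minimum is the
    value quoted in the installer warning for CUDA 11.6.
    """
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so '510' and '510.00' compare equal.
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # 510.39.01 is the driver version bundled with the runfile above.
    print(driver_is_sufficient("510.39.01"))
```

Tuples of integers are used instead of plain string comparison so that, for instance, "95.1" does not sort above "510.00".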

$ sudo apt install gcc-10 g++-10
$ export CC=/usr/bin/gcc-10
$ export CXX=/usr/bin/g++-10
$ export CUDA_ROOT=/usr/local/cuda
$ ln -s /usr/bin/gcc-10 $CUDA_ROOT/bin/gcc
$ ln -s /usr/bin/g++-10 $CUDA_ROOT/bin/g++
(Build Instant-NGP as described)

rmojgani commented 1 year ago

@wadaniel To get it working, I also had to install CUDA and cuDNN; now the Docker image is 25 GB! Here is a link on Docker Hub:

Korali + CUDA + cuDNN (I will make it private soon)

My question: is there a chance these interfere with other packages, e.g. oneDNN?

wadaniel commented 1 year ago

cool interesting. woa so big?! thanks for the link.

No, I don't think there will be any problems. Or what do you mean by interfere? We use oneDNN inside Korali for the NN implementation, but there shouldn't be any interference.

rmojgani commented 1 year ago

Yeah, I am quite annoyed with the size.

I just wanted it to work; maybe a minimal installation is also possible.

"or what do you mean by interfere?"

Not sure, a memory leak? Something in software engineering that I am not aware of.

onednn is CPU, right?

wadaniel commented 1 year ago

No, there should be no interference, not that I am aware of. oneDNN supports both CPU and GPU; it is now part of oneAPI: https://github.com/oneapi-src/oneDNN. It is used as a backend by e.g. TensorFlow and PyTorch.

rmojgani commented 1 year ago

So is there a flag or something like that to force Korali to use "oneAPI" on CPU or GPU?

wadaniel commented 1 year ago

yes, during build you must set meson -DoneDNN True

wadaniel commented 1 year ago

Otherwise it uses "our" NN implementation based on the Eigen library for matrix-matrix multiplications, which is not that optimized.

rmojgani commented 1 year ago

I haven't built Korali for my work yet; I was using the Docker image.