sinzlab / sensorium

Code base for the SENSORIUM competition.
https://sensorium2022.net/
MIT License

Nvidia driver problem #77

Closed: vitruvi closed this issue 2 years ago

vitruvi commented 2 years ago

After installing docker and docker-compose and running "docker-compose run -d -p 10101:8888 jupyterlab", I get the error below when I run "1_inspect_data.ipynb".

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

My setup: Ubuntu 20.04, Docker 20.10.16, an RTX 2070 graphics card, and CUDA installed on the host system, outside of Docker (Driver Version: 470.129.06, CUDA Version: 11.4). How can I solve this problem?

KonstantinWilleke commented 2 years ago

Hi,

Thanks for trying out our package! I'm surprised, though, that the CUDA driver seems to be incompatible with the PyTorch version.

Let me propose two potential solutions. The first one would be to install the latest PyTorch version. For this, you can open a terminal within JupyterLab and run:

pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

Alternatively, you can run it within the notebook "1_inspect_data.ipynb":

!pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

Then restart the kernel of the notebook; hopefully everything will work.

If that still leads to the same error message, you could consider downgrading CUDA on your system to version 11.3 and then rebuilding the container.

Let me know if it works or if you have any questions!

Konstantin

CYHSM commented 2 years ago

I have the same error, and the pip install sadly does not help. I also can't downgrade from CUDA 11.7 to 11.3, as the driver does not support it. I use the GPUs without problems in a conda environment with cudatoolkit 11.3, so it seems to be a Docker problem. I don't see a real solution though, as PyTorch also does not (yet) support CUDA 11.7...

KonstantinWilleke commented 2 years ago

Hi - yes that could be a Docker problem. @cblessing24, would you have an idea of how to solve it?

christoph-blessing commented 2 years ago

You are probably missing NVIDIA Container Toolkit. This package is necessary to use GPUs in Docker containers. Please install it and report back.
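
To narrow down where the GPU gets lost, a small check can be run inside the container, for example in a notebook cell. The following is only a minimal sketch: the helper name gpu_report is hypothetical and not part of the sensorium package, and it assumes nothing beyond the standard library plus an optional PyTorch install. It reports whether nvidia-smi is on the PATH and runs cleanly, and whether PyTorch can see a CUDA device.

```python
import shutil
import subprocess


def gpu_report():
    """Collect basic facts about GPU visibility in the current environment.

    Hypothetical helper for illustration; not part of the sensorium code base.
    """
    report = {
        "nvidia_smi_found": False,
        "nvidia_smi_ok": False,
        "torch_installed": False,
        "torch_cuda_available": False,
    }

    # Step 1: is the nvidia-smi binary visible, and does it exit cleanly?
    smi = shutil.which("nvidia-smi")
    if smi is not None:
        report["nvidia_smi_found"] = True
        try:
            result = subprocess.run(
                [smi], capture_output=True, text=True, timeout=30
            )
            report["nvidia_smi_ok"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            pass  # leave nvidia_smi_ok as False

    # Step 2: can PyTorch (if installed) see a CUDA device?
    try:
        import torch
    except ImportError:
        return report
    report["torch_installed"] = True
    report["torch_cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If nvidia_smi_found is False inside the container but nvidia-smi works on the host, the container was most likely started without GPU access, which points at the Docker setup rather than at PyTorch.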

vitruvi commented 2 years ago

The problem persists in my case. After installing the NVIDIA Container Toolkit, I get the same error as in my first message.

christoph-blessing commented 2 years ago

Did you verify that you can use GPUs in a container after installing the toolkit? What is the output of the following command?

sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

vitruvi commented 2 years ago

Yes, nvidia-smi works on docker.

+-----------------------------------------------------------------------------+ | NVIDIA-SMI 470.129.06 Driver Version: 470.129.06 CUDA Version: 11.4 | ...

christoph-blessing commented 2 years ago

Can you post the whole output here please?

christoph-blessing commented 2 years ago

Okay, I think I was able to reproduce the issue. I changed my docker-compose.yml file to this to fix it:

version: '3.4'

services:
  jupyterlab:
    image: sensorium
    build:
      context: .
    volumes:
      - .:/project
      - ./notebooks:/notebooks
    environment:
      - JUPYTER_PASSWORD=
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: ["gpu"]

Sources:
https://docs.docker.com/compose/gpu-support/
https://github.com/compose-spec/compose-spec/blob/master/deploy.md#devices

vitruvi commented 2 years ago

That gives the error below.

ERROR: The Compose file './docker-compose.yml' is invalid because: services.jupyterlab.deploy.resources.reservations value Additional properties are not allowed ('devices' was unexpected)

christoph-blessing commented 2 years ago

Which version of docker-compose are you using? I am using 2.6.1.

vitruvi commented 2 years ago

Yes, that is the problem. On Ubuntu 20.04, "apt" installs docker-compose 1.25, so a newer docker-compose has to be installed manually. With that, it now works out of the box.

docker-compose version 2.6.1 could be listed as a requirement on the main page.
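
For anyone hitting the same "'devices' was unexpected" error, the installed docker-compose version can be checked up front. The sketch below is an illustration, not part of the repository: the function names are hypothetical, and the minimum version (1, 28, 0) is an assumption taken from the Docker Compose GPU support documentation. Only 1.25 (fails) and 2.6.1 (works) were actually confirmed in this thread.

```python
import re
import shutil
import subprocess

# Assumption: the Docker Compose GPU docs state that `devices` reservations
# need docker-compose 1.28.0 or newer. This thread only confirms that 1.25
# fails and 2.6.1 works.
MIN_GPU_COMPOSE = (1, 28, 0)


def parse_compose_version(text):
    """Extract (major, minor, patch) from `docker-compose --version` output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def supports_gpu_reservations(version):
    """Return True if this version should accept `devices` reservations."""
    return version >= MIN_GPU_COMPOSE


def installed_compose_version():
    """Return the installed docker-compose version, or None if unavailable."""
    exe = shutil.which("docker-compose")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, timeout=30
        )
        return parse_compose_version(result.stdout)
    except (OSError, ValueError, subprocess.TimeoutExpired):
        return None


if __name__ == "__main__":
    version = installed_compose_version()
    if version is None:
        print("docker-compose not found or version not readable")
    elif supports_gpu_reservations(version):
        print("docker-compose", version, "should accept the GPU reservation")
    else:
        print("docker-compose", version, "is too old; install a newer release")
```

The parser accepts both the 1.x output format ("docker-compose version 1.25.0, build ...") and the 2.x format ("Docker Compose version v2.6.1"), since it only looks for the first three-part version number.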

KonstantinWilleke commented 2 years ago

Great to hear that this problem is fixed! For other discussions, feel free to join our Slack community (the link can be found in the contact section at https://sensorium2022.net/home).