roflcoopter / viseron

Self-hosted, local only NVR and AI Computer Vision software. With features such as object detection, motion detection, face recognition and more, it gives you the power to keep an eye on your home, office or any other place you want to monitor.
MIT License

CUDA driver version is insufficient for CUDA runtime version #624

Closed · superfunk2000 closed this issue 1 year ago

superfunk2000 commented 1 year ago

For three days I've been trying to get Viseron to run with GPU support. Now I need your help with the configuration:

Installed versions:

$ cat docker-compose.yaml
version: "2.4"

services:
  viseron:
    image: roflcoopter/amd64-cuda-viseron
    container_name: viseron
    restart: unless-stopped
    volumes:
      - /home/viseron/recordings:/recordings
      - /home/viseron/config:/config
      - /etc/localtime:/etc/localtime:ro
    ports:
      - 8888:8888
    environment:
      - PUID=1000
      - PGID=1000
$ docker -v
Docker version 24.0.2, build cb74dfc
$ docker compose version
Docker Compose version v2.18.1
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0
$ nvidia-smi
Thu Jun 29 13:06:56 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1660 Ti     On  | 00000000:29:00.0 Off |                  N/A |
|  0%   45C    P8               4W / 120W |      1MiB /  6144MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
$ dpkg -l | grep cuda-toolkit
ii  cuda-toolkit-12-2                     12.2.0-1                                amd64        CUDA Toolkit 12.2 meta-package
ii  cuda-toolkit-12-2-config-common       12.2.53-1                               all          Common config package for CUDA Toolkit 12.2.
ii  cuda-toolkit-12-config-common         12.2.53-1                               all          Common config package for CUDA Toolkit 12.
ii  cuda-toolkit-config-common            12.2.53-1                               all          Common config package for CUDA Toolkit.
$ clinfo -l
Platform #0: NVIDIA CUDA
 `-- Device #0: NVIDIA GeForce GTX 1660 Ti

Why is Viseron not activating hardware acceleration?

[screenshot]

And the following error message appears:

viseron  | [2023-06-29 13:15:00] [ERROR   ] [root] - Uncaught exception
viseron  | Traceback (most recent call last):
viseron  |   File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
viseron  |     return _run_code(code, main_globals, None,
viseron  |   File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
viseron  |     exec(code, run_globals)
viseron  |   File "/src/viseron/__main__.py", line 33, in <module>
viseron  |     sys.exit(init())
viseron  |   File "/src/viseron/__main__.py", line 29, in init
viseron  |     return main()
viseron  |   File "/src/viseron/__main__.py", line 20, in main
viseron  |     viseron = setup_viseron()
viseron  |   File "/src/viseron/__init__.py", line 152, in setup_viseron
viseron  |     setup_domains(vis)
viseron  |   File "/src/viseron/components/__init__.py", line 638, in setup_domains
viseron  |     future.result()
viseron  |   File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
viseron  |     return self.__get_result()
viseron  |   File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
viseron  |     raise self._exception
viseron  |   File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
viseron  |     result = self.fn(*self.args, **self.kwargs)
viseron  |   File "/src/viseron/components/__init__.py", line 375, in setup_domain
viseron  |     domain_module = self.get_domain(domain_to_setup.domain)
viseron  |   File "/src/viseron/components/__init__.py", line 234, in get_domain
viseron  |     return importlib.import_module(f"{self._path}.{domain}")
viseron  |   File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
viseron  |     return _bootstrap._gcd_import(name[level:], package, level)
viseron  |   File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
viseron  |   File "<frozen importlib._bootstrap>", line 991, in _find_and_load
viseron  |   File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
viseron  |   File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
viseron  |   File "<frozen importlib._bootstrap_external>", line 848, in exec_module
viseron  |   File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
viseron  |   File "/src/viseron/components/dlib/face_recognition.py", line 16, in <module>
viseron  |     from .predict import predict
viseron  |   File "/src/viseron/components/dlib/predict.py", line 2, in <module>
viseron  |     import face_recognition
viseron  |   File "/usr/local/lib/python3.8/dist-packages/face_recognition/__init__.py", line 7, in <module>
viseron  |     from .api import load_image_file, face_locations, batch_face_locations, face_landmarks, face_encodings, compare_faces, face_distance
viseron  |   File "/usr/local/lib/python3.8/dist-packages/face_recognition/api.py", line 26, in <module>
viseron  |     cnn_face_detector = dlib.cnn_face_detection_model_v1(cnn_face_detection_model)
viseron  | RuntimeError: Error while calling cudaGetDevice(&the_device_id) in file /tmp/dlib/dlib/cuda/gpu_data.cpp:204. code: 35, reason: CUDA driver version is insufficient for CUDA runtime version

I don't know what to do and hope you can help me.

roflcoopter commented 1 year ago

Sorry you are having issues!

Can you run nvidia-smi inside the container and show me the output?

I would also like to see your docker run command, or docker-compose file if you are using that.
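For reference, the checks being asked for could be run like this (a sketch; it assumes the container is already up and named viseron, as in the compose file posted above):

```shell
# Run nvidia-smi inside the running container to see if the GPU is visible to it
docker exec -it viseron nvidia-smi

# Print the effective (merged) compose configuration that will be used
docker compose config
```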

bailboy91 commented 1 year ago

With the newest NVIDIA Container Toolkit, your compose file must contain something like this:

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

EDIT: You must also have nvidia-container-runtime / nvidia-container-toolkit installed. I had it working on my setup with nvidia-container-runtime 3.13.0-1 and nvidia-container-toolkit 1.13.2-1.

All detailed in NVIDIA's container guide.
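Merged into the compose file from the original post, that would look roughly like this (a sketch based on the file posted above, not a verified config):

```yaml
version: "2.4"

services:
  viseron:
    image: roflcoopter/amd64-cuda-viseron
    container_name: viseron
    restart: unless-stopped
    volumes:
      - /home/viseron/recordings:/recordings
      - /home/viseron/config:/config
      - /etc/localtime:/etc/localtime:ro
    ports:
      - 8888:8888
    environment:
      - PUID=1000
      - PGID=1000
    # GPU reservation; requires the NVIDIA Container Toolkit on the host
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```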

superfunk2000 commented 1 year ago

@roflcoopter:

root@eafbdbbdc8e2:/src# nvidia-smi
bash: nvidia-smi: command not found

@bailboy91:

$ docker compose up -d && docker compose logs -f
[+] Building 0.0s (0/0)
[+] Running 0/1
 ⠹ Container viseron  Starting                                                                                                                                              0.2s
Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]
superfunk2000 commented 1 year ago

I read through the NVIDIA page on installing the NVIDIA Container Toolkit. I'm already failing at the first step, installing "nvidia-container-toolkit-base":

$ sudo apt-get install -y nvidia-container-toolkit-base
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package nvidia-container-toolkit-base

I have to say that I'm not a Linux specialist. But I can google...

bailboy91 commented 1 year ago

@superfunk2000 I was in your boat when I found Viseron too. There are a lot of scattered bits of information and lackluster documentation from NVIDIA on setting things up.

On my working CUDA machine I used apt-cache madison nvidia-container-toolkit-base and that told me which repo it came from. The link below should get you going. I also have a CUDA repo that I believe I needed as well, but I have a Quadro card and can't remember if it's different from a consumer GPU.

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#setting-up-nvidia-container-toolkit
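From that guide, the apt route boiled down to roughly the following at the time (a sketch of the documented steps; the repository URL and package names are NVIDIA's, not mine, so check the linked page for the current version):

```shell
# Add NVIDIA's package repository and signing key (per the linked guide)
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit, register the NVIDIA runtime with Docker, and restart the daemon
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```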

superfunk2000 commented 1 year ago

Hello @bailboy91,

thank you for the hint. I've installed the nvidia-container-toolkit from your link. A first test was positive:

$ sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Fri Jun 30 14:40:37 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1660 Ti     On  | 00000000:29:00.0 Off |                  N/A |
|  0%   48C    P8               4W / 120W |      1MiB /  6144MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

After that, I added

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

in my docker-compose.yaml.

Now it looks like this: [screenshot]

But what about VA-API? I wanted to install "vdpau-va-driver", but that package is only available up to Ubuntu 18.04 LTS and was removed from the sources after that. apt-get suggested "mesa-va-drivers" instead, but installing it did not bring any improvement.

Any ideas?

vainfo gives this output:

$ vainfo
error: can't connect to X server!
error: failed to initialize display

Do I need to install the X Server?

bailboy91 commented 1 year ago

Do you have a VA-API compatible card? Viseron will use the NVIDIA GPU for everything: it will use the NVIDIA encoders / CUDA to do all the magic in Viseron. You can also add, say, a Google Coral TPU to the mix (which is what I'm doing now, actually).

I've only ever used the nvidia encoders and cuda in viseron.

If you do have, say, an Intel or AMD GPU in the machine as well, you would use the Mesa drivers, and Mesa has a VA-API package. But as far as I've tested in Viseron, you should choose either CUDA or VA-API, not both.
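For completeness: if you did want VA-API with an Intel/AMD GPU, the usual pattern is to pass the host's DRM render devices into the container; a hypothetical sketch (the exact device path can vary per machine, and this would apply to a VA-API image rather than the CUDA one):

```yaml
services:
  viseron:
    devices:
      # Expose the host's DRM render nodes for VA-API (path is an assumption)
      - /dev/dri:/dev/dri
```

As for the X server error above: recent libva-utils versions can query VA-API without X by targeting the DRM node directly, e.g. vainfo --display drm --device /dev/dri/renderD128 (the render node name may differ on your system).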

superfunk2000 commented 1 year ago

OK, so I'm happy with the configuration now. Thanks very much!