linuxserver / docker-code-server

GNU General Public License v3.0

[BUG] Can't use cuda driver inside container for tensorflow #161

Closed by Sebulba46 8 months ago

Sebulba46 commented 8 months ago

Current Behavior

abc@0bede98b26ec:~/workspace$ nvcc --version
nvcc command not found

abc@0bede98b26ec:~/workspace$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.113.01             Driver Version: 535.113.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:01:00.0 Off |                  N/A |
|  0%   27C    P8              28W / 350W |      5MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Expected Behavior

abc@0bede98b26ec:~/workspace$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Oct_11_21:27:02_PDT_2021
Cuda compilation tools, release 11.4, V11.4.152
Build cuda_11.4.r11.4/compiler.30521435_0

Steps To Reproduce

abc@0bede98b26ec:~/workspace$ nvcc --version

Environment

- OS: Ubuntu 23.04
- How docker service was installed: distro's packagemanager

CPU architecture

x86-64

Docker creation

---
version: '3'
volumes:
  jellyfin_config:
  jellyfin_cache:

services:
  code-server:
    image: lscr.io/linuxserver/code-server:latest
    network_mode: 'ngnixproxymanager_default'
    restart: always
    volumes:
      - /path/to/appdata/config:/config

    runtime: nvidia
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Etc/UTC
      - PASSWORD=Zek2Nq#63!rtSJ
      - SUDO_PASSWORD=Zek2Nq#63!rtSJ
      - HTTPS_PROXY=vs.sebulbaserver.duckdns.org

    ports:
      - 8443:8443
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [compute, utility]

Container logs

usermod: no changes
───────────────────────────────────────
      ██╗     ███████╗██╗ ██████╗ 
      ██║     ██╔════╝██║██╔═══██╗
      ██║     ███████╗██║██║   ██║
      ██║     ╚════██║██║██║   ██║
      ███████╗███████║██║╚██████╔╝
      ╚══════╝╚══════╝╚═╝ ╚═════╝ 
   Brought to you by linuxserver.io
───────────────────────────────────────
To support LSIO projects visit:
https://www.linuxserver.io/donate/
───────────────────────────────────────
GID/UID
───────────────────────────────────────
User UID:    1000
User GID:    1000
───────────────────────────────────────
setting up sudo access
setting sudo password using SUDO_PASSWORD env var
New password: Retype new password: passwd: password updated successfully
[custom-init] No custom files found, skipping...
[2023-10-20T13:34:57.077Z] info  code-server 4.17.1 2eba7af117ea58d45a6c6449ee4fe63c8d4d53aa
[2023-10-20T13:34:57.078Z] info  Using user-data-dir /config/data
[2023-10-20T13:34:57.084Z] info  Using config file /config/.config/code-server/config.yaml
[2023-10-20T13:34:57.084Z] info  HTTP server listening on http://0.0.0.0:8443/
[2023-10-20T13:34:57.084Z] info    - Authentication is enabled
[2023-10-20T13:34:57.084Z] info      - Using password from $PASSWORD
[2023-10-20T13:34:57.084Z] info    - Not serving HTTPS
[2023-10-20T13:34:57.084Z] info  Session server listening on /config/data/code-server-ipc.sock
[ls.io-init] done.
[13:36:26] 
[13:36:26] Extension host agent started.
File not found: /app/code-server/lib/vscode/out/vsda_bg.wasm
File not found: /app/code-server/lib/vscode/out/vsda.js
[13:36:26] [172.22.0.1][8647114b][ManagementConnection] New connection established.
error=Invalid URL
[2023-10-20T13:36:27.201Z] error Failed to get latest version 
[13:36:27] [172.22.0.1][6f6a1897][ExtensionHostConnection] New connection established.
[13:36:27] [172.22.0.1][6f6a1897][ExtensionHostConnection] <598> Launched Extension Host Process.
[migrations] started
[migrations] no migrations found
usermod: no changes
───────────────────────────────────────
      ██╗     ███████╗██╗ ██████╗ 
      ██║     ██╔════╝██║██╔═══██╗
      ██║     ███████╗██║██║   ██║
      ██║     ╚════██║██║██║   ██║
      ███████╗███████║██║╚██████╔╝
      ╚══════╝╚══════╝╚═╝ ╚═════╝ 
   Brought to you by linuxserver.io
───────────────────────────────────────
To support LSIO projects visit:
https://www.linuxserver.io/donate/
───────────────────────────────────────
GID/UID
───────────────────────────────────────
User UID:    1000
User GID:    1000
───────────────────────────────────────
setting up sudo access
setting sudo password using SUDO_PASSWORD env var
New password: Retype new password: passwd: password updated successfully
[custom-init] No custom files found, skipping...
[2023-10-20T18:19:32.923Z] info  code-server 4.17.1 2eba7af117ea58d45a6c6449ee4fe63c8d4d53aa
[2023-10-20T18:19:32.924Z] info  Using user-data-dir /config/data
[2023-10-20T18:19:32.929Z] info  Using config file /config/.config/code-server/config.yaml
[2023-10-20T18:19:32.929Z] info  HTTP server listening on http://0.0.0.0:8443/
[2023-10-20T18:19:32.929Z] info    - Authentication is enabled
[2023-10-20T18:19:32.929Z] info      - Using password from $PASSWORD
[2023-10-20T18:19:32.929Z] info    - Not serving HTTPS
[2023-10-20T18:19:32.929Z] info  Session server listening on /config/data/code-server-ipc.sock
[ls.io-init] done.
[18:20:46] 
[18:20:46] Extension host agent started.
File not found: /app/code-server/lib/vscode/out/vsda_bg.wasm
File not found: /app/code-server/lib/vscode/out/vsda.js
[18:20:47] [172.22.0.1][38e25c40][ManagementConnection] New connection established.
error=Invalid URL
[2023-10-20T18:20:47.363Z] error Failed to get latest version 
[18:20:47] [172.22.0.1][d561fb82][ExtensionHostConnection] New connection established.
[18:20:47] [172.22.0.1][d561fb82][ExtensionHostConnection] <647> Launched Extension Host Process.

github-actions[bot] commented 8 months ago

Thanks for opening your first issue here! Be sure to follow the relevant issue templates, or risk having this issue marked as invalid.

aptalca commented 8 months ago

CUDA support is provided entirely by the NVIDIA runtime. nvidia-smi, for instance, is injected into the container by the runtime. I'm not sure whether nvcc is supposed to be injected as well; you may have to install it from a repo. Either way, that is beyond the scope of this image and this repo.

Also try setting the env vars for capabilities and GPU selection and see if that makes a difference.
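
The comment above does not name the variables. Assuming it refers to the NVIDIA Container Toolkit's NVIDIA_VISIBLE_DEVICES (GPU selection) and NVIDIA_DRIVER_CAPABILITIES (capabilities), a minimal sketch for checking what the container actually received might look like the following. The helper name report_var is made up for illustration; run it in a terminal inside the container.

```shell
#!/bin/sh
# report_var NAME: print NAME=value if the variable is set and non-empty,
# otherwise say that it is not set. (Hypothetical helper, not part of the image.)
report_var() {
    # Indirect lookup of the variable whose name is in $1.
    eval "value=\${$1:-}"
    if [ -n "$value" ]; then
        printf '%s=%s\n' "$1" "$value"
    else
        printf '%s is not set\n' "$1"
    fi
}

# Assumed variable names: these are the ones read by the NVIDIA Container
# Toolkit, not names given in the thread.
report_var NVIDIA_VISIBLE_DEVICES
report_var NVIDIA_DRIVER_CAPABILITIES
```

If either one is reported as not set, entries such as NVIDIA_VISIBLE_DEVICES=all and NVIDIA_DRIVER_CAPABILITIES=compute,utility could be added under environment: in the compose file. These variables control which GPUs and driver libraries the runtime exposes; they are not expected to make nvcc appear, since nvcc ships with the CUDA toolkit rather than the driver.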

Sebulba46 commented 8 months ago

Yeah, you are right, sorry about that. I found a way to install CUDA and cuDNN inside the container. For those who want to do this too:

Just follow this guide: https://github.com/ashutoshIITK/install_cuda_cudnn_ubuntu_20#uninstall-previous-versions

Make sure the nvidia-smi command works inside the container, then install the libxml2 package:

sudo apt-get install libxml2

Then follow the steps in that guide.
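
To confirm the result afterwards, a small check along these lines may help. It is only a sketch: find_tool is a hypothetical helper, and /usr/local/cuda/bin is assumed to be the toolkit location because it is the usual default for NVIDIA's packages; the linked guide may use a different path.

```shell
#!/bin/sh
# find_tool NAME [EXTRA_DIR]: print where NAME lives, looking on PATH first
# and then in EXTRA_DIR. Returns non-zero if it is found in neither place.
find_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        command -v "$1"
    elif [ -n "${2:-}" ] && [ -x "$2/$1" ]; then
        printf '%s/%s (not on PATH)\n' "$2" "$1"
    else
        return 1
    fi
}

# nvidia-smi comes from the NVIDIA runtime; nvcc comes from the CUDA toolkit
# installed by hand. /usr/local/cuda/bin is an assumed default location.
for tool in nvidia-smi nvcc; do
    if location=$(find_tool "$tool" /usr/local/cuda/bin); then
        printf '%-12s %s\n' "$tool" "$location"
    else
        printf '%-12s missing\n' "$tool"
    fi
done
```

If nvcc shows up as "not on PATH", adding that directory to PATH in the shell profile should make nvcc --version work. With TensorFlow installed, python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" is one way to see whether it picks up the GPU.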