[BUG] Unable to select NVIDIA GPU during Wizard install

harb88 commented 6 months ago

Is there an existing issue for this?

[X] I have searched the existing issues

Current Behavior

The GPU drop down lists only one GPU: /dev/dri/card1 - AMD GPU

Expected Behavior

The Install Wizard GPU drop down list should list both /dev/dri/card0 and /dev/dri/card1

Steps To Reproduce

Set up Kasm using docker-compose on Unraid (see YAML below)
Go to Wizard page at https://192.168.x.x:3030
Accept EULA
Drop down only shows /dev/dri/card1 - AMD GPU
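
To compare what the wizard shows against what the host actually exposes, here is a minimal Python sketch that lists each DRM card with its PCI vendor as read from sysfs. The function name `list_drm_cards` and the `sysfs_drm` parameter are hypothetical, made up for illustration. The vendor IDs are the standard PCI ones (0x10de NVIDIA, 0x1002 AMD, 0x8086 Intel). It says nothing about how the Kasm wizard itself enumerates GPUs.

```python
from pathlib import Path

# Standard PCI vendor IDs as they appear in sysfs "vendor" files
PCI_VENDORS = {"0x10de": "NVIDIA", "0x1002": "AMD", "0x8086": "Intel"}


def list_drm_cards(sysfs_drm="/sys/class/drm"):
    """Return [(device_path, vendor_name)] for every cardN entry under sysfs_drm.

    Hypothetical helper for illustration; not part of Kasm or the wizard.
    """
    cards = []
    for entry in sorted(Path(sysfs_drm).glob("card[0-9]*")):
        # Skip connector entries such as card0-HDMI-A-1
        if not entry.name[4:].isdigit():
            continue
        vendor_file = entry / "device" / "vendor"
        if not vendor_file.is_file():
            continue
        vendor_id = vendor_file.read_text().strip().lower()
        # Fall back to the raw ID when the vendor is not in the table
        cards.append((f"/dev/dri/{entry.name}", PCI_VENDORS.get(vendor_id, vendor_id)))
    return cards


if __name__ == "__main__":
    for path, vendor in list_drm_cards():
        print(f"{path} - {vendor} GPU")
```

If both cards show up here but only one appears in the drop down, the devices are visible on that system and the gap is in the wizard's detection.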

Environment

- OS: Unraid 6.12.6
- How docker service was installed: Preinstalled by Unraid

CPU architecture

x86-64

Docker creation

I tried a few variations in my compose YAML, including both NVIDIA_VISIBLE_DEVICES=GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx and NVIDIA_VISIBLE_DEVICES=all.

services:
  kasm:
    container_name: kasm
    image: lscr.io/linuxserver/kasm:latest
    restart: unless-stopped
    privileged: true
    networks:
    - dev
    runtime: nvidia
    environment:
    - TZ=${TZ}
    - KASM_PORT=4043
    - NVIDIA_VISIBLE_DEVICES=GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    - NVIDIA_DRIVER_CAPABILITIES=all
    volumes:
    - /mnt/primarycache/config/kasm/opt:/opt
    - ${ROOT}/config/kasm/profiles:/profiles
    ports:
    - 3030:3000
    - 4043:4043

Container logs

[migrations] started
[migrations] no migrations found
usermod: no changes
───────────────────────────────────────

      ██╗     ███████╗██╗ ██████╗
      ██║     ██╔════╝██║██╔═══██╗
      ██║     ███████╗██║██║   ██║
      ██║     ╚════██║██║██║   ██║
      ███████╗███████║██║╚██████╔╝
      ╚══════╝╚══════╝╚═╝ ╚═════╝

   Brought to you by linuxserver.io
───────────────────────────────────────

To support LSIO projects visit:
https://www.linuxserver.io/donate/

───────────────────────────────────────
GID/UID
───────────────────────────────────────

User UID:    911
User GID:    911
───────────────────────────────────────

[custom-init] No custom files found, skipping...
[ls.io-init] done.

github-actions[bot] commented 6 months ago

Thanks for opening your first issue here! Be sure to follow the relevant issue templates, or risk having this issue marked as invalid.

harb88 commented 6 months ago

I was able to work around this by doing the following:

Open the install Wizard page and accept the EULA
Open the Developer console in Firefox (or Chrome) and use the element selection tool to select the drop down.
Edit the value of the "/dev/dri/card1|AMD GPU" entry and change it to "/dev/dri/card0|NVIDIA GPU"
Select the AMD GPU in the drop down and complete the install Wizard

After this, all the Workspaces had the correct Docker Run Config Override and Docker Exec Config. However, the card was still not being passed through until I logged into the container and ran the following, followed by a restart: nvidia-ctk runtime configure --runtime=docker

After this my workspaces would use the NVIDIA GPU without issues.

Zbuddy13 commented 6 months ago

I am having a similar issue. I just ran the runtime command and it seems to make my GPU show up under the agent, but now whenever I try to launch Chrome I get a black screen. My setup is a 1660 Super on Unraid.

zabkaatlarge commented 6 months ago

Having the same issue as @Zbuddy13. I followed the instructions for editing the value in the drop down and then ran the config command to add the runtime, but I am getting a black screen on the containers. I am using Unraid with an RTX 3060 and a dummy plug; the GPU works fine for other containers.

thelamer commented 5 months ago

As it stands right now, the NVIDIA GPU support in Kasm is only useful in some very narrow situations. It leverages a VirtualGL wrapper, which Chromium/Chrome no longer support, and the images turn off all compositing by default. I'm not saying this should not work, but I am limited in how I can test: I have really only tested the NVIDIA passthrough on Debian/Ubuntu hosts (bookworm/focal/jammy) with a stack similar to what is inside this jammy-based DinD shim. I suspect there is a compatibility problem here with running the NVIDIA container runtime in multiple layers.

I also understand that most server users have NVIDIA GPUs, as they have been the standard for some time, but the DRI3 acceleration support with AMD/Intel is way better from a compatibility standpoint (see https://kasmweb.com/kasmvnc/docs/master/gpu_acceleration.html). If you happen to have an Intel or AMD iGPU, you will probably see more of a difference using that than with the current NVIDIA implementation.

medivh-jay commented 5 months ago

I also have this problem. I have a P4 graphics card on Unraid, but I can't select it in Kasm. It works normally in other containers such as Jellyfin.

medivh-jay commented 5 months ago

I tried editing daemon.json inside the docker-kasm container to use NVIDIA's runtime, and that worked normally, but the kasm-guac container kept restarting.

LinuxServer-CI commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. This might be due to missing feedback from OP. It will be closed if no further activity occurs. Thank you for your contributions.

draco1544 commented 4 months ago

After this, all the Workspaces had the correct Docker Run Config Override and Docker Exec Config. However, the card was still not being passed through until I logged into the container and ran the following, followed by a restart: nvidia-ctk runtime configure --runtime=docker

After this my workspaces would use the NVIDIA GPU without issues.

This is the solution

kkoshelev commented 4 months ago

The solution is mentioned above. I run the following commands (replace "kasm" with your container name):

docker exec -ti kasm nvidia-ctk runtime configure --runtime=docker
docker restart kasm
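
To confirm the command above took effect before restarting, a small check like the following may help. It is a sketch under two assumptions: that `nvidia-ctk runtime configure --runtime=docker` registers an `nvidia` entry under `runtimes` in the Docker daemon config (its usual behaviour), and that the config lives at the default path /etc/docker/daemon.json inside the container, which may differ in this image. The function name `has_nvidia_runtime` is hypothetical.

```python
import json
from pathlib import Path


def has_nvidia_runtime(daemon_json="/etc/docker/daemon.json"):
    """Return True if the Docker daemon config registers a runtime named 'nvidia'.

    Hypothetical helper; the default path is an assumption and may not match
    where this container keeps its daemon config.
    """
    path = Path(daemon_json)
    if not path.is_file():
        return False
    try:
        config = json.loads(path.read_text() or "{}")
    except json.JSONDecodeError:
        # A malformed file cannot be trusted to define the runtime
        return False
    if not isinstance(config, dict):
        return False
    runtimes = config.get("runtimes", {})
    return isinstance(runtimes, dict) and "nvidia" in runtimes


if __name__ == "__main__":
    print("nvidia runtime configured:", has_nvidia_runtime())
```

This only tells you the runtime is registered; it does not prove a workspace will actually render on the GPU.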

mfoti commented 3 months ago

I'm getting this error:

error gathering device information while adding custom device "/dev/dri/renderD129": no such file or directory

I'm on Unraid with the default app installation, plus the manual wizard edit to make it recognize the card.

full log:

host: fe5d658a8112
ingest_date: 202403202250
application: kasm_api
levelname: ERROR
kasm_user_name: admin@kasm.local
process: client_api_server
client_ip: 10.1.10.83
user_agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36
message
Error during Create request for Server(a89aa3ec-ede1-4152-8a43-1dc99cb1950b) : (Exception creating Kasm: Traceback (most recent call last):
  File "docker/api/client.py", line 268, in _raise_for_status
  File "requests/models.py", line 1021, in raise_for_status
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.44/containers/a0e4e572594e2180e265bba712357572473d55a0b88c9febd8b2b4a58088f650/start

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "__init__.py", line 573, in post
  File "provision.py", line 1871, in provision
  File "provision.py", line 1863, in provision
  File "docker/models/containers.py", line 818, in run
  File "docker/models/containers.py", line 404, in start
  File "docker/utils/decorators.py", line 19, in wrapped
  File "docker/api/container.py", line 1111, in start
  File "docker/api/client.py", line 270, in _raise_for_status
  File "docker/errors.py", line 31, in create_api_error_from_http_exception
docker.errors.APIError: 500 Server Error for http+docker://localhost/v1.44/containers/a0e4e572594e2180e265bba712357572473d55a0b88c9febd8b2b4a58088f650/start: Internal Server Error ("error gathering device information while adding custom device "/dev/dri/renderD129": no such file or directory")
)
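
The "no such file or directory" error above suggests that a device path listed in the workspace override does not exist where Docker looks for it. As a rough sketch, the host-side path of each `devices` entry can be checked before launching a workspace. The function name `missing_devices` and its `exists` parameter are hypothetical and only here for illustration; the `host:container:permissions` format is the one used in the overrides in this thread.

```python
import os


def missing_devices(devices, exists=os.path.exists):
    """Return the host-side paths from Docker-style 'devices' entries that do not exist.

    Each entry looks like "host_path:container_path:permissions".
    Hypothetical helper; 'exists' is injectable so the check can be tested.
    """
    missing = []
    for entry in devices:
        # Everything before the first colon is the host-side path
        host_path = entry.split(":", 1)[0]
        if not exists(host_path):
            missing.append(host_path)
    return missing


if __name__ == "__main__":
    override_devices = [
        "/dev/dri/card1:/dev/dri/card1:rwm",
        "/dev/dri/renderD129:/dev/dri/renderD129:rwm",
    ]
    for path in missing_devices(override_devices):
        print("missing:", path)
```

Run it in the same environment as the Docker daemon that starts the workspaces, since that is where the paths have to exist.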

mfoti commented 3 months ago

I fixed that error by changing the Chrome workspace's "Docker Run Config Override (JSON)" setting to this:

{
  "device_requests": [
    {
      "capabilities": [
        [
          "gpu"
        ]
      ],
      "count": -1,
      "device_ids": null,
      "driver": "",
      "options": {}
    }
  ],
  "devices": [
    "/dev/dri/card1:/dev/dri/card1:rwm",
    "/dev/dri/renderD128:/dev/dri/renderD128:rwm"
  ],
  "environment": {
    "KASM_EGL_CARD": "/dev/dri/card1",
    "KASM_RENDERD": "/dev/dri/renderD128"
  },
  "hostname": "kasm"
}

But I get a black screen.
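
Since the card and render node appear in both `devices` and `environment`, it is easy for the two to drift apart when editing by hand. Here is a minimal sketch that generates an override with the same shape as the JSON above from a single pair of paths. The function name `build_run_override` is hypothetical, and the output only mirrors the config posted in this thread; it is not confirmed to fix the black screen.

```python
import json


def build_run_override(card, render_node, hostname="kasm"):
    """Build a Docker Run Config Override dict shaped like the one in this thread.

    Hypothetical helper; keeps 'devices' and 'environment' in sync by deriving
    both from the same card and render node paths.
    """
    return {
        "device_requests": [
            {
                "capabilities": [["gpu"]],
                "count": -1,
                "device_ids": None,
                "driver": "",
                "options": {},
            }
        ],
        "devices": [
            f"{card}:{card}:rwm",
            f"{render_node}:{render_node}:rwm",
        ],
        "environment": {
            "KASM_EGL_CARD": card,
            "KASM_RENDERD": render_node,
        },
        "hostname": hostname,
    }


if __name__ == "__main__":
    override = build_run_override("/dev/dri/card1", "/dev/dri/renderD128")
    print(json.dumps(override, indent=2))
```

The printed JSON can be pasted into the workspace's override field.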

fekhoo commented 2 months ago

If anyone is still looking for the answer, here are the steps I used to get it to work:

1 - I used [harb88]'s post to get the card to show during initial setup (I am using Unraid, so make sure to select the right card): open the install Wizard page and accept the EULA, then open the Developer console in Firefox (or Chrome) and use the element selection tool to select the drop down. Edit the value of the "/dev/dri/card0" entry and change it to "/dev/dri/card1", select the right card, and complete the install Wizard.

2 - Log in to the kasm docker and run "nvidia-ctk runtime configure --runtime=docker"

3 - Open the Unraid terminal and run:

docker exec -ti kasm nvidia-ctk runtime configure --runtime=docker
docker restart kasm

At that point you will be able to see the GPU available under the Docker Agents in Kasm, but if you add it to any workspace you will just end up with a black screen.

4 - Add this under Docker Run Config Override, and add the GPU to the workspace:

{
  "environment": {
    "NVIDIA_DRIVER_CAPABILITIES": "all",
    "XDG_RUNTIME_DIR": "/profiles/xgd"
  },
  "hostname": "kasm"
}

jmizell commented 2 months ago

Confirming the same issue on a host with two RTX 8000s, running Debian 12.5 with the NVIDIA container runtime. I had to override the installer with /dev/dri/card0 and run nvidia-ctk runtime configure --runtime=docker to enable the GPU on the agent. I used the same Docker Run Config Override with NVIDIA_DRIVER_CAPABILITIES to get a workspace to run with both GPUs.

twestelynck commented 2 weeks ago

I am encountering the same issue with the NVIDIA P400. The driver and container runtime are installed on the host. The nvidia-smi command works, and hardware acceleration for another Plex container on the same host is functioning.

During the initial setup of Kasm deployment, no NVIDIA GPU appeared. I used the trick of selecting it via the Developer console. When Kasm was deploying, I used the command docker exec -ti kasm nvidia-ctk runtime configure --runtime=docker and then restarted the container.

Once done, the GPU was displayed on Docker agents with all usage metrics (GPU/memory utilization and temperature).

I have a Brave browser workspace where I:

add the GPU
append the following environment variables to the Docker Run Config Override section: { "environment": { "NVIDIA_DRIVER_CAPABILITIES": "all", "XDG_RUNTIME_DIR": "/profiles/xgd" } }
replace /dev/dri/card0 with /dev/dri/card1 and /dev/dri/renderD129 with /dev/dri/renderD128, which correspond to the NVIDIA GPU

Despite this, when I start the workspace, I still get a black screen and nvidia-smi does not display process usage (as it does, for example, when I start Plex transcoding).

Is there another step missing?
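
One thing worth ruling out is a mismatched card and render node, since cardN and renderD12x numbers do not have to line up in the obvious order. Below is a minimal sketch that looks up which render node shares a PCI device with a given card, based on the sysfs layout where `/sys/class/drm/cardN/device/drm/` holds both the card and its render node. The function name `render_node_for` and the `sysfs_drm` parameter are hypothetical.

```python
from pathlib import Path


def render_node_for(card, sysfs_drm="/sys/class/drm"):
    """Return the /dev/dri/renderD* node on the same PCI device as 'card', or None.

    'card' may be a name ("card1") or a device path ("/dev/dri/card1").
    Hypothetical helper for illustration only.
    """
    drm_dir = Path(sysfs_drm) / Path(card).name / "device" / "drm"
    # The device's drm directory lists every DRM node it provides
    nodes = sorted(p.name for p in drm_dir.glob("renderD[0-9]*"))
    return f"/dev/dri/{nodes[0]}" if nodes else None


if __name__ == "__main__":
    for card in ("/dev/dri/card0", "/dev/dri/card1"):
        print(card, "->", render_node_for(card))
```

If the pair printed here differs from the one in the workspace override, the workspace may be pointed at the other GPU's render node.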
