selkies-project / docker-nvidia-glx-desktop

KDE Plasma Desktop container designed for Kubernetes supporting OpenGL GLX and Vulkan for NVIDIA GPUs with WebRTC and HTML5, providing an open-source remote cloud graphics or game streaming platform. Spawns its own fully isolated X Server instead of using the host X server, not requiring /tmp/.X11-unix host sockets or host configuration.
https://github.com/selkies-project/docker-nvidia-glx-desktop/pkgs/container/nvidia-glx-desktop
Mozilla Public License 2.0
263 stars, 59 forks

XDG_RUNTIME_DIR not set, + vulkan issues #27

Closed by ayunami2000 1 year ago

ayunami2000 commented 1 year ago

  1. XDG_RUNTIME_DIR is not set. It can be set manually by passing -e XDG_RUNTIME_DIR=/tmp
  2. vkcube and vulkaninfo both fail: image image
  3. SteamVR fails, probably due to the Vulkan issue above, but I'm not sure.

GPU is a 3090 on RunPod. Similar issues on Vast.ai.

ayunami2000 commented 1 year ago

Notice: The following was unsuccessful, so do not think it's that one fix you've been looking for, because it does not work:

Vulkan issues were fixed after installing aptitude then running sudo aptitude install nvidia-driver-515=515.48.07-0ubuntu1 libnvidia-gl-515=515.48.07-0ubuntu1 nvidia-dkms-515=515.48.07-0ubuntu1 libnvidia-compute-515=515.48.07-0ubuntu1 libnvidia-extra-515=515.48.07-0ubuntu1 nvidia-compute-utils-515=515.48.07-0ubuntu1 libnvidia-decode-515=515.48.07-0ubuntu1 libnvidia-encode-515=515.48.07-0ubuntu1 nvidia-utils-515=515.48.07-0ubuntu1 xserver-xorg-video-nvidia-515=515.48.07-0ubuntu1 libnvidia-cfg1-515=515.48.07-0ubuntu1 libnvidia-fbc1-515=515.48.07-0ubuntu1 libnvidia-common-515=515.48.07-0ubuntu1 nvidia-kernel-common-515=515.48.07-0ubuntu1

Edit: never mind, now apt hates me

Edit 2: I was able to fix apt being mad at me by editing /var/lib/dpkg/status and removing the broken packages from things that depend on them

Edit 3: turns out it's using llvmpipe, and vulkan is being held up by mesa-vulkan-drivers ://////
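The llvmpipe observation in the last edit can be checked from the device names that vulkaninfo reports. The following is a minimal sketch, not something posted in the thread: the function names are hypothetical, and it assumes the text output of vulkaninfo contains lines of the form `deviceName = ...`.

```shell
# Print every Vulkan device name found in vulkaninfo text output read from stdin.
# Assumes lines that look like "deviceName = <name>".
list_vulkan_devices() {
  sed -n 's/^[[:space:]]*deviceName[[:space:]]*=[[:space:]]*//p'
}

# Return 0 if the only devices listed are llvmpipe (software rendering),
# 1 if at least one other device is listed, 2 if no device was found at all.
only_software_vulkan() {
  devices=$(list_vulkan_devices)
  [ -n "$devices" ] || return 2
  if printf '%s\n' "$devices" | grep -qiv 'llvmpipe'; then
    return 1
  fi
  return 0
}
```

Inside the container this could be run as `vulkaninfo 2>/dev/null | only_software_vulkan && echo "Vulkan is on llvmpipe only"`.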

ehfd commented 1 year ago

I use Vulkan frequently, and it works provided everything is configured properly on the host.

  1. So far, I had no issues without XDG_RUNTIME_DIR set. This is not the core issue.
  2. You need to use nvidia-docker2 or NVIDIA/k8s-device-plugin for this to properly work. Please confirm you have this applied in the host, outside the container.
  3. sudo ubuntu-drivers autoinstall is one way to install NVIDIA drivers properly.

ayunami2000 commented 1 year ago

Apparently, applications are supposed to handle when XDG_RUNTIME_DIR is not set, but ALVR crashes (lol) (1). OpenGL works (glxgears) and nvidia-smi works as well. I'm not sure about how they are running it with the GPU (2), but I'll give sudo ubuntu-drivers autoinstall a try (3).
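For applications that crash when XDG_RUNTIME_DIR is unset, one workaround is to set a fallback before launching them. This is a sketch under assumptions: the function name and the `runtime-<uid>` directory name are illustrative choices, not something the container image provides. The XDG Base Directory Specification asks for a directory owned by the user with mode 0700, which is why the sketch does not simply reuse /tmp itself.

```shell
# Set XDG_RUNTIME_DIR only when it is missing or empty.
# The fallback path (TMPDIR or /tmp, plus "runtime-<uid>") is an assumption.
ensure_xdg_runtime_dir() {
  if [ -n "${XDG_RUNTIME_DIR:-}" ]; then
    return 0  # already set, leave it alone
  fi
  dir="${TMPDIR:-/tmp}/runtime-$(id -u)"
  # Create a private directory for the current user.
  mkdir -p "$dir" && chmod 700 "$dir" || return 1
  XDG_RUNTIME_DIR="$dir"
  export XDG_RUNTIME_DIR
}
```

Calling `ensure_xdg_runtime_dir` in the shell that starts the application has a similar effect to passing -e XDG_RUNTIME_DIR=... at container start, without overriding a value that is already present.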

ayunami2000 commented 1 year ago

Ok, I've run sudo ubuntu-drivers autoinstall, and I got this. image image image

ayunami2000 commented 1 year ago

Update: vulkaninfo works when run like DISPLAY= vulkaninfo (unsetting DISPLAY).

Update (this had better not be it): It's possible that I used the other repo (the egl one) instead. Checking now to be sure.

Edit: Nope, the egl one and the glx one act similarly with vulkan. (neither works)

Update: the 18.04 glx one's smoketest just gives me: image BUT vulkaninfo on 18.04 WORKS it WORKS OMG

And, final edit for now: on 18.04 vulkan still doesn't work. I DID dump vulkaninfo output if you want it. lmk

ehfd commented 1 year ago

> Ok, I've run sudo ubuntu-drivers autoinstall, and I got this.

I meant outside the container, on the host. The host has to be properly configured. Consult this: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

ayunami2000 commented 1 year ago

image

The host meets the requirements. It's very likely something wrong with the docker image, not the host. (but I could be wrong)

ehfd commented 1 year ago

DO NOT install your own drivers. The startup script is supposed to install them for you. In a clean, freshly started nvidia-glx-desktop container with no modifications, get the following outputs after startup:

  1. glxinfo | head
  2. glxgears -info (get the first 50 lines)
  3. nvidia-smi

Let's start from here. Oh, and please come to https://discord.com/invite/wDNGDeSW5F as this might require long troubleshooting.
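The three outputs requested above can be gathered in one pass. This is a rough sketch rather than a script shipped with the image: the function names are hypothetical, and the 10-second `timeout` around glxgears is an assumption added because glxgears keeps running until its window is closed.

```shell
# Run a command only if it is installed; otherwise print a skip message.
run_if_present() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1
  else
    echo "$1: not installed, skipped"
  fi
}

# Collect the three outputs requested in the thread.
collect_gpu_diagnostics() {
  echo "== glxinfo | head =="
  run_if_present glxinfo | head

  echo "== glxgears -info (first 50 lines) =="
  if command -v glxgears >/dev/null 2>&1; then
    # Stop glxgears after 10 seconds (assumed value) so the script can continue.
    timeout 10 glxgears -info 2>&1 | head -n 50
  else
    echo "glxgears: not installed, skipped"
  fi

  echo "== nvidia-smi =="
  run_if_present nvidia-smi
}
```

Running `collect_gpu_diagnostics > diagnostics.txt` in a terminal inside the desktop session would produce a single file to attach to the issue.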

ehfd commented 1 year ago

This is the issue: https://github.com/NVIDIA/nvidia-container-toolkit/issues/140. NVIDIA_DRIVER_CAPABILITIES must contain all of compute,utility,video,graphics,display, or simply be set to all. In this case, display is missing.
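A quick way to confirm this diagnosis is to compare the value of NVIDIA_DRIVER_CAPABILITIES against the five capabilities named above. The helper below is a minimal sketch; the function name `missing_caps` is hypothetical and not part of any NVIDIA tooling.

```shell
# Report which required capabilities are absent from a
# NVIDIA_DRIVER_CAPABILITIES value passed as the first argument.
# Returns 0 (and prints nothing) when the value is "all" or complete.
missing_caps() {
  value="$1"
  required="compute utility video graphics display"
  [ "$value" = "all" ] && return 0
  missing=""
  for cap in $required; do
    # Wrap both sides in commas so only whole names match.
    case ",$value," in
      *",$cap,"*) ;;
      *) missing="$missing $cap" ;;
    esac
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  return 0
}
```

Inside the container, `missing_caps "${NVIDIA_DRIVER_CAPABILITIES:-}"` would print the missing names. If any are reported, the container needs to be restarted with the variable corrected, for example by passing -e NVIDIA_DRIVER_CAPABILITIES=all to docker run.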

ehfd commented 1 year ago

To include in documentation.

ehfd commented 1 year ago

Added to Documentation.