Closed Zanathoz closed 1 year ago
Hi,
I'm far from an expert on this, but I don't think it's strictly necessary to have vulkaninfo
available inside the container. One of the purposes of the NVIDIA container runtime/toolkit is to map the drivers in for you automatically. For what it's worth, in this Steam example you'll see we mapped in the Vulkan configs, relaxed the security settings on the container, and asked NVIDIA to expose more capabilities to the container. We were able to get Tomb Raider running under Vulkan:
https://www.reddit.com/r/kasmweb/comments/zvee3q/nvidia_gpu_with_steam_workspace/
Here is another example that may be helpful. This was a test to get the beta version of GODOT running, which required Vulkan. In that case the Vulkan SDK was needed, so we built a custom image based on the official Vulkan images: https://github.com/kasmtech/workspaces-issues/issues/264#issuecomment-1259476362
Consider linking this on Reddit to see if others have more to contribute to the conversation.
I can post to Reddit, but this issue is easily reproducible with your own RetroArch image if a compatible card is available. If you change the RetroArch driver to Vulkan in the RetroArch configuration, it goes into an infinite loading loop and never actually loads until you change the configuration back to "gl". As I confirmed in my example above, the Nvidia driver is already presented to the container, I have the required Vulkan dependencies installed on the host, and the GT 710 is confirmed Vulkan compatible: https://www.khronos.org/conformance/adopters/conformant-products#vulkan
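For reference, the switch that triggers the loop is a one-line change in the RetroArch config. This is only a sketch; I'm assuming the usual ~/.config/retroarch/retroarch.cfg location, which may differ inside the Kasm image:
# ~/.config/retroarch/retroarch.cfg (path assumed; adjust for the image)
# Switching this from "gl" to "vulkan" reproduces the infinite loading loop:
video_driver = "vulkan"
# Reverting it lets RetroArch start again:
video_driver = "gl"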
I did find an issue with Vulkan passthrough to my containers, which is now resolved, but the problem with the RetroArch container still remains.
Vulkan Error:
kasm@dj-kasmws:~$ sudo docker run --gpus all \
-e NVIDIA_DISABLE_REQUIRE=1 \
-e NVIDIA_DRIVER_CAPABILITIES=all --device /dev/dri \
-v /etc/vulkan/icd.d/nvidia_icd.json:/etc/vulkan/icd.d/nvidia_icd.json \
-v /etc/vulkan/implicit_layer.d/nvidia_layers.json:/etc/vulkan/implicit_layer.d/nvidia_layers.json \
-v /usr/share/glvnd/egl_vendor.d/10_nvidia.json:/usr/share/glvnd/egl_vendor.d/10_nvidia.json \
-it nvidia/vulkan:1.3-470 \
bash
root@ff8e3fd9b902:/# vulkaninfo
Cannot create Vulkan instance.
This problem is often caused by a faulty installation of the Vulkan driver or attempting to use a GPU that does not support Vulkan.
ERROR at /vulkan-sdk/1.3.204.1/source/Vulkan-Tools/vulkaninfo/vulkaninfo.h:649:vkCreateInstance failed with ERROR_INCOMPATIBLE_DRIVER
The fix was found here: https://forums.developer.nvidia.com/t/vulkan-not-working-solved/220255. The driver install had mistakenly created a directory where the ICD JSON file should be, so I removed that directory, created a regular file with the following contents in its place, and changed its permissions to +x afterwards (a shell sketch of the steps follows the JSON):
{
    "file_format_version" : "1.0.0",
    "ICD": {
        "library_path": "libGLX_nvidia.so.0",
        "api_version" : "1.3.204"
    }
}
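In shell form the fix looked roughly like this. It's only a sketch: I'm assuming the mistaken directory was at /etc/vulkan/icd.d/nvidia_icd.json, based on the mounts used in the commands above, so adjust the path to wherever yours lives.
# Remove the directory the driver install mistakenly created (path assumed)
sudo rm -r /etc/vulkan/icd.d/nvidia_icd.json
# Recreate it as a regular file with the ICD contents shown above
sudo tee /etc/vulkan/icd.d/nvidia_icd.json > /dev/null <<'EOF'
{
    "file_format_version" : "1.0.0",
    "ICD": {
        "library_path": "libGLX_nvidia.so.0",
        "api_version" : "1.3.204"
    }
}
EOF
# Match the +x permission mentioned above
sudo chmod +x /etc/vulkan/icd.d/nvidia_icd.json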
After fixing this, I can run the Nvidia Vulkan container and get the correct vulkaninfo output:
sudo docker run --gpus all \
-e NVIDIA_DISABLE_REQUIRE=1 \
-v $HOME/.Xauthority:/root/.Xauthority \
-e DISPLAY -e NVIDIA_DRIVER_CAPABILITIES=all --device /dev/dri --net host \
-v /etc/vulkan/icd.d/nvidia_icd.json:/etc/vulkan/icd.d/nvidia_icd.json \
-v /etc/vulkan/implicit_layer.d/nvidia_layers.json:/etc/vulkan/implicit_layer.d/nvidia_layers.json \
-v /usr/share/glvnd/egl_vendor.d/10_nvidia.json:/usr/share/glvnd/egl_vendor.d/10_nvidia.json \
-it nvidia/vulkan:1.3-470 \
bash
root@50fa2a13315c:/# vulkaninfo
'DISPLAY' environment variable not set... skipping surface info
error: XDG_RUNTIME_DIR not set in the environment.
==========
VULKANINFO
==========
Vulkan Instance Version: 1.3.204
Instance Extensions: count = 18
===============================
(extra lines omitted)
If I install vulkan-tools and run vulkaninfo inside the RetroArch container, it gives the following error, and I'm not sure why; my google-fu has reached its limits this morning. I suspect a driver issue, since the container appears to be using a Mesa driver for video rather than the Nvidia one, but nvidia-smi still works in the container:
default:~$ vulkaninfo
WARNING: [Loader Message] Code 0 : terminator_CreateInstance: Failed to CreateInstance in ICD 1. Skipping ICD.
WARNING: [Loader Message] Code 0 : terminator_CreateInstance: Failed to CreateInstance in ICD 2. Skipping ICD.
error: XDG_RUNTIME_DIR not set in the environment.
error: XDG_RUNTIME_DIR not set in the environment.
error: XDG_RUNTIME_DIR not set in the environment.
error: XDG_RUNTIME_DIR not set in the environment.
error: XDG_RUNTIME_DIR not set in the environment.
/build/vulkan-tools-KEbD_A/vulkan-tools-1.2.131.1+dfsg1/vulkaninfo/vulkaninfo.h:926: failed with ERROR_UNKNOWN
default:~$ nvidia-smi
Wed Apr 5 14:32:06 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:0B:00.0 N/A | N/A |
| 50% 46C P0 N/A / N/A | 113MiB / 2002MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
default:~$ glxinfo -B
name of display: :1
display: :1 screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
Vendor: Mesa/X.org (0xffffffff)
Device: llvmpipe (LLVM 15.0.7, 256 bits) (0xffffffff)
Version: 22.3.7
Accelerated: no
Video memory: 11967MB
Unified memory: yes
Preferred profile: core (0x1)
Max core profile version: 4.5
Max compat profile version: 4.5
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.2
OpenGL vendor string: Mesa/X.org
OpenGL renderer string: llvmpipe (LLVM 15.0.7, 256 bits)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 22.3.7 - kisak-mesa PPA
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL version string: 4.5 (Compatibility Profile) Mesa 22.3.7 - kisak-mesa PPA
OpenGL shading language version string: 4.50
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 22.3.7 - kisak-mesa PPA
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
default:~$
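For anyone hitting the same wall, a few checks along these lines inside the container should show whether the Vulkan loader can even see the NVIDIA ICD. This is only a sketch: the paths are assumed from the mounts in the commands above, and VK_LOADER_DEBUG is only honoured by reasonably recent Vulkan loaders.
# Do the NVIDIA ICD and layer JSON files actually exist inside the container?
ls -l /etc/vulkan/icd.d/ /usr/share/vulkan/icd.d/ 2>/dev/null
ls -l /etc/vulkan/implicit_layer.d/ 2>/dev/null
# Is the library the ICD points at (libGLX_nvidia.so.0) visible to the dynamic linker?
ldconfig -p | grep -i libGLX_nvidia
# Point the loader directly at the NVIDIA ICD and turn on verbose loader output
VK_ICD_FILENAMES=/etc/vulkan/icd.d/nvidia_icd.json VK_LOADER_DEBUG=all vulkaninfo 2>&1 | head -n 40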
I was initially thinking a display needed to be passed through to the container, but vulkaninfo on the official Vulkan container from Nvidia shows the same XDG_RUNTIME_DIR error at the top of its output.
I think perhaps there is another variable that needs to be passed to the containers on creation, but I'm not sure what else to try here.
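As a side note, the XDG_RUNTIME_DIR warning itself can be silenced by pointing the variable at a private, user-owned directory. This is just a sketch, and probably not the root cause, since the working NVIDIA container prints the same warning.
# Point XDG_RUNTIME_DIR at a private directory with 0700 permissions (sketch only)
export XDG_RUNTIME_DIR=/tmp/runtime-$(id -u)
mkdir -p "$XDG_RUNTIME_DIR" && chmod 700 "$XDG_RUNTIME_DIR"
vulkaninfo 2>&1 | head -n 20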
Here is another post I found with a similar issue for Vulkan, claiming a display is not present for the container, although again I don't think this is an issue: https://github.com/NVIDIA/nvidia-container-toolkit/issues/140
I am also tracking this issue on Reddit here - https://www.reddit.com/r/kasmweb/comments/12cifwe/comment/jf28x34/?context=3
I was able to get my container to recognize my video card and render with it properly by re-deploying on a desktop distribution of Ubuntu 22, but selecting the Vulkan driver in RetroArch still leads to an endless boot loop with the same errors I've posted above from within the container.
Thanks to Justin over on the subreddit, this is resolved. Adding these items to the Workspace got the Vulkan driver working.
Volume Mapping:
{
    "/usr/share/vulkan/icd.d/nvidia_icd.json": {
        "bind": "/etc/vulkan/icd.d/nvidia_icd.json",
        "mode": "ro",
        "uid": 1000,
        "gid": 1000,
        "required": true,
        "skip_check": true
    },
    "/usr/share/vulkan/implicit_layer.d/nvidia_layers.json": {
        "bind": "/etc/vulkan/implicit_layer.d/nvidia_layers.json",
        "mode": "ro",
        "uid": 1000,
        "gid": 1000,
        "required": true,
        "skip_check": true
    },
    "/usr/share/glvnd/egl_vendor.d/10_nvidia.json": {
        "bind": "/usr/share/glvnd/egl_vendor.d/10_nvidia.json",
        "mode": "ro",
        "uid": 1000,
        "gid": 1000,
        "required": true,
        "skip_check": true
    }
}
Docker Run Config Override
{
    "shm_size": "1gb",
    "security_opt": [
        "seccomp=unconfined"
    ],
    "privileged": true,
    "environment": {
        "NVIDIA_DISABLE_REQUIRE": "1",
        "NVIDIA_DRIVER_CAPABILITIES": "all"
    }
}
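For anyone not using Kasm, the same settings translate roughly to a plain docker run like the one below. This is only a sketch: the image name is a placeholder and the host paths are taken from the mappings above, so adjust both for your setup.
sudo docker run --gpus all \
-e NVIDIA_DISABLE_REQUIRE=1 \
-e NVIDIA_DRIVER_CAPABILITIES=all \
--shm-size 1g --privileged --security-opt seccomp=unconfined \
-v /usr/share/vulkan/icd.d/nvidia_icd.json:/etc/vulkan/icd.d/nvidia_icd.json:ro \
-v /usr/share/vulkan/implicit_layer.d/nvidia_layers.json:/etc/vulkan/implicit_layer.d/nvidia_layers.json:ro \
-v /usr/share/glvnd/egl_vendor.d/10_nvidia.json:/usr/share/glvnd/egl_vendor.d/10_nvidia.json:ro \
-it <your-retroarch-workspace-image> bash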
I'm setting up a RetroArch workspace and Vulkan support is not being passed through to my workspace container. If I select Vulkan as the driver in RetroArch and restart it from the menu, it enters an infinite loop until I change the driver back to "gl" in the local config file.
I do have the Nvidia drivers, the Nvidia Container Runtime, and the Vulkan libraries on my Ubuntu 22 host, and they are passed through to the container; I know games are utilizing the GPU. I can run the nvidia-smi command from within a RetroArch container once spun up as a workspace, but the vulkaninfo command is not available inside the container:
The Vulkan library is passed through to the container properly from the host:
Installing vulkan-tools is not enough to get Vulkan working:
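For context, vulkaninfo comes from the vulkan-tools package on Ubuntu-based images, so getting the command itself is just the standard install below; it only provides the vulkaninfo/vkcube utilities and does not pull in an ICD or driver.
# vulkan-tools provides vulkaninfo/vkcube, but not the ICD or driver pieces Vulkan needs
sudo apt-get update && sudo apt-get install -y vulkan-tools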
From what I've read, the Nvidia drivers also need to be installed inside the container, but I'm having an issue installing them because the Nvidia driver is already in use via the host passthrough:
nvidia-smi command working:
I will try to build my own RetroArch image off your core image, but that would be a new venture for me and I will likely fumble around with it for a while. If someone can create a new dev image for me to test, I'd appreciate it!
If I do get an image built and tested I will post results.