Closed kmittman closed 11 months ago
Hi @kmittman,
thanks for the info. I'll push a fix immediately. I forgot I had nvidia-modprobe
installed and did not notice it. Luckily at work we were still on 535.
Now that it comes to my mind, the udev rules were to avoid installing the nvidia-modprobe
utility (for example at work we need to keep track of audit events generated by suid binaries).
Current situation:
grep nvidia-frontend
to grep nvidia$
.nvidia-modprobe
is installed and the udev rules have the device creation steps removed, all devices are created including the nvidia-caps/nvidia-cap*
device files, which is not happening currently.nvidia-modprobe
is used in place of the udev rules, the binary is called only when the DDX is loaded, which means the console framebuffer (fbdev=1
) is not created until the DDX is loaded through X. GDM on Fedora starts already with Wayland, so the frame buffer console is not available until an X session is started, which might never happen, especially in headless systems without X.Basically there is no one case that fits all.
My suggestion would still be to keep the udev rules in place of nvidia-modprobe
and expanding them for the nvidia-caps
device files, so all cases are covered (no setuid binaries, frame buffer available at login even when no DDX is loaded, etc.).
For now I've pushed the fix to the udev rule, but I'm making a test if we can have the nvidia-modprobe
binary called as part of the udev rule once instead of those shell lines with all the escape characters, something similar to: https://github.com/negativo17/displaylink/blob/a27e894152dd19ddb072c21a5eaede59e250c82f/99-displaylink.rules
Calling nvidia-modprobe
directly from the udev rules would fix everything except having the setuid binary on the system, which probably is not an issue for many.
Actually it's not the DDX but the userspace "components" as listed in the documentation (last time I checked was a long time ago):
$ for i in *.so*; do echo "== $i"; strings $i | grep ^/usr/bin/nvidia-modprobe; done
== libcudadebugger.so.545.29.02
== libcuda.so.545.29.02
/usr/bin/nvidia-modprobe
== libEGL_nvidia.so.545.29.02
== libGLESv1_CM_nvidia.so.545.29.02
== libGLESv2_nvidia.so.545.29.02
== libGLX_nvidia.so.545.29.02
== libglxserver_nvidia.so.545.29.02
== libnvcuvid.so.545.29.02
/usr/bin/nvidia-modprobe
== libnvidia-allocator.so.545.29.02
/usr/bin/nvidia-modprobe
== libnvidia-api.so.1
/usr/bin/nvidia-modprobe
== libnvidia-cfg.so.545.29.02
== libnvidia-eglcore.so.545.29.02
== libnvidia-encode.so.545.29.02
== libnvidia-fbc.so.545.29.02
== libnvidia-glcore.so.545.29.02
/usr/bin/nvidia-modprobe
== libnvidia-glsi.so.545.29.02
== libnvidia-glvkspirv.so.545.29.02
== libnvidia-gpucomp.so.545.29.02
== libnvidia-ml.so.545.29.02
== libnvidia-ngx.so.545.29.02
== libnvidia-nvvm.so.545.29.02
== libnvidia-opencl.so.545.29.02
/usr/bin/nvidia-modprobe
== libnvidia-opticalflow.so.545.29.02
== libnvidia-pkcs11-openssl3.so.545.29.02
== libnvidia-pkcs11.so.545.29.02
== libnvidia-ptxjitcompiler.so.545.29.02
== libnvidia-rtcore.so.545.29.02
== libnvidia-tls.so.545.29.02
== libnvoptix.so.545.29.02
== libvdpau_nvidia.so.545.29.02
/usr/bin/nvidia-modprobe
== nvidia_drv.so
Anyway, I came up with this very simple fix which covers all cases:
https://github.com/negativo17/nvidia-kmod-common/commit/65a78725942f0115a9e4a9a4295ddd0aa1b5c2b5
As soon as the nvidia
kernel module is loaded (the main dependency for the other modules), nvidia-modprobe
is run and takes care of the basic device files, without any extra udev rule and regardless of case (headless/wayland or X):
$ ls -al /dev/nvidia*
crw-rw-rw-. 1 root root 195, 0 Nov 4 14:07 /dev/nvidia0
crw-rw-rw-. 1 root root 195, 255 Nov 4 14:07 /dev/nvidiactl
crw-rw-rw-. 1 root root 195, 254 Nov 4 14:07 /dev/nvidia-modeset
crw-rw-rw-. 1 root root 235, 0 Nov 4 14:07 /dev/nvidia-uvm
crw-rw-rw-. 1 root root 235, 1 Nov 4 14:07 /dev/nvidia-uvm-tools
Once a user session starts (both wayland and X), another one of the library above calls nvidia-modprobe
again and the Nvidia capabilities files are created correctly:
$ ls -al /dev/nvidia*
crw-rw-rw-. 1 root root 195, 0 Nov 4 14:07 /dev/nvidia0
crw-rw-rw-. 1 root root 195, 255 Nov 4 14:07 /dev/nvidiactl
crw-rw-rw-. 1 root root 195, 254 Nov 4 14:07 /dev/nvidia-modeset
crw-rw-rw-. 1 root root 235, 0 Nov 4 14:07 /dev/nvidia-uvm
crw-rw-rw-. 1 root root 235, 1 Nov 4 14:07 /dev/nvidia-uvm-tools
/dev/nvidia-caps:
total 0
drwxr-xr-x. 2 root root 80 Nov 4 14:14 .
drwxr-xr-x. 24 root root 4880 Nov 4 14:14 ..
cr--------. 1 root root 238, 1 Nov 4 14:14 nvidia-cap1
cr--r--r--. 1 root root 238, 2 Nov 4 14:14 nvidia-cap2
This is also future proof in case Nvidia changes names, kernel modules, adds additional device files, etc.
The drawback is the installed binary with setuid privileges, but I would say it's a good compromise.
Should -3 release for this change be available at this time?
The drawback is the installed binary with setuid privileges
@scaronni could you detail what potential drawback you have in mind having this binary setuid ?
Beside the fact that you want to avoid any setuid binary on a system to reduce the attack surface, the NVIDIA documentation for the driver says that distributions should rely on their own udev rules / packaging mechanism to achieve what nvidia-modprobe
does, but it's impossible to cover all use cases (IMEX, changes between drivers, Wayland/X differences, multi card setup, etc.).
Thank you for the answer. For the security aspect of it, what you are saying is that in addition to have a setuid binary on the system, the new way is relying on it at startup... BTW even if nvidia-modprobe
was not used by udev at startup, it would still be available as a setuid command for any user on the system...
Before this change, nvidia-modprobe
was not installed by default with my packaged drivers as there was no need for it, so it was not available for anyone.
Greetings @scaronni,
FYI, in NVIDIA driver branch 545 an entry in
/proc/devices
has been renamed fromnvidia-frontend
tonvidia
Referenced in 3 lines of60-nvidia.rules
https://github.com/negativo17/nvidia-kmod-common/blob/master/60-nvidia.rulesSee: https://github.com/NVIDIA/egl-wayland/issues/72#issuecomment-1792908365