negativo17 / nvidia-kmod-common

NVIDIA's proprietary driver kernel module common files
4 stars 5 forks source link

udev device renamed from nvidia-frontend to nvidia #11

Closed kmittman closed 11 months ago

kmittman commented 11 months ago

Greetings @scaronni,

FYI, in NVIDIA driver branch 545 an entry in /proc/devices has been renamed from nvidia-frontend to nvidia Referenced in 3 lines of 60-nvidia.rules https://github.com/negativo17/nvidia-kmod-common/blob/master/60-nvidia.rules

See: https://github.com/NVIDIA/egl-wayland/issues/72#issuecomment-1792908365

scaronni commented 11 months ago

Hi @kmittman,

thanks for the info. I'll push a fix immediately. I forgot I had nvidia-modprobe installed and did not notice it. Luckily at work we were still on 535.

Now that it comes to my mind, the udev rules were to avoid installing the nvidia-modprobe utility (for example at work we need to keep track of audit events generated by suid binaries).

Current situation:

Basically there is no one case that fits all.

My suggestion would still be to keep the udev rules in place of nvidia-modprobe and expanding them for the nvidia-caps device files, so all cases are covered (no setuid binaries, frame buffer available at login even when no DDX is loaded, etc.).

For now I've pushed the fix to the udev rule, but I'm making a test if we can have the nvidia-modprobe binary called as part of the udev rule once instead of those shell lines with all the escape characters, something similar to: https://github.com/negativo17/displaylink/blob/a27e894152dd19ddb072c21a5eaede59e250c82f/99-displaylink.rules

scaronni commented 11 months ago

Calling nvidia-modprobe directly from the udev rules would fix everything except having the setuid binary on the system, which probably is not an issue for many.

scaronni commented 11 months ago

Actually it's not the DDX but the userspace "components" as listed in the documentation (last time I checked was a long time ago):

$ for i in *.so*; do echo "== $i"; strings $i | grep ^/usr/bin/nvidia-modprobe; done
== libcudadebugger.so.545.29.02
== libcuda.so.545.29.02
/usr/bin/nvidia-modprobe
== libEGL_nvidia.so.545.29.02
== libGLESv1_CM_nvidia.so.545.29.02
== libGLESv2_nvidia.so.545.29.02
== libGLX_nvidia.so.545.29.02
== libglxserver_nvidia.so.545.29.02
== libnvcuvid.so.545.29.02
/usr/bin/nvidia-modprobe
== libnvidia-allocator.so.545.29.02
/usr/bin/nvidia-modprobe
== libnvidia-api.so.1
/usr/bin/nvidia-modprobe
== libnvidia-cfg.so.545.29.02
== libnvidia-eglcore.so.545.29.02
== libnvidia-encode.so.545.29.02
== libnvidia-fbc.so.545.29.02
== libnvidia-glcore.so.545.29.02
/usr/bin/nvidia-modprobe
== libnvidia-glsi.so.545.29.02
== libnvidia-glvkspirv.so.545.29.02
== libnvidia-gpucomp.so.545.29.02
== libnvidia-ml.so.545.29.02
== libnvidia-ngx.so.545.29.02
== libnvidia-nvvm.so.545.29.02
== libnvidia-opencl.so.545.29.02
/usr/bin/nvidia-modprobe
== libnvidia-opticalflow.so.545.29.02
== libnvidia-pkcs11-openssl3.so.545.29.02
== libnvidia-pkcs11.so.545.29.02
== libnvidia-ptxjitcompiler.so.545.29.02
== libnvidia-rtcore.so.545.29.02
== libnvidia-tls.so.545.29.02
== libnvoptix.so.545.29.02
== libvdpau_nvidia.so.545.29.02
/usr/bin/nvidia-modprobe
== nvidia_drv.so

Anyway, I came up with this very simple fix which covers all cases:

https://github.com/negativo17/nvidia-kmod-common/commit/65a78725942f0115a9e4a9a4295ddd0aa1b5c2b5

As soon as the nvidia kernel module is loaded (the main dependency for the other modules), nvidia-modprobe is run and takes care of the basic device files, without any extra udev rule and regardless of case (headless/wayland or X):

$ ls -al /dev/nvidia*
crw-rw-rw-. 1 root root 195,   0 Nov  4 14:07 /dev/nvidia0
crw-rw-rw-. 1 root root 195, 255 Nov  4 14:07 /dev/nvidiactl
crw-rw-rw-. 1 root root 195, 254 Nov  4 14:07 /dev/nvidia-modeset
crw-rw-rw-. 1 root root 235,   0 Nov  4 14:07 /dev/nvidia-uvm
crw-rw-rw-. 1 root root 235,   1 Nov  4 14:07 /dev/nvidia-uvm-tools

Once a user session starts (both wayland and X), another one of the library above calls nvidia-modprobe again and the Nvidia capabilities files are created correctly:

$ ls -al /dev/nvidia*
crw-rw-rw-. 1 root root 195,   0 Nov  4 14:07 /dev/nvidia0
crw-rw-rw-. 1 root root 195, 255 Nov  4 14:07 /dev/nvidiactl
crw-rw-rw-. 1 root root 195, 254 Nov  4 14:07 /dev/nvidia-modeset
crw-rw-rw-. 1 root root 235,   0 Nov  4 14:07 /dev/nvidia-uvm
crw-rw-rw-. 1 root root 235,   1 Nov  4 14:07 /dev/nvidia-uvm-tools

/dev/nvidia-caps:
total 0
drwxr-xr-x.  2 root root     80 Nov  4 14:14 .
drwxr-xr-x. 24 root root   4880 Nov  4 14:14 ..
cr--------.  1 root root 238, 1 Nov  4 14:14 nvidia-cap1
cr--r--r--.  1 root root 238, 2 Nov  4 14:14 nvidia-cap2
scaronni commented 11 months ago

This is also future proof in case Nvidia changes names, kernel modules, adds additional device files, etc.

The drawback is the installed binary with setuid privileges, but I would say it's a good compromise.

Xerkus commented 11 months ago

Should -3 release for this change be available at this time?

caius-martinus commented 4 months ago

The drawback is the installed binary with setuid privileges

@scaronni could you detail what potential drawback you have in mind having this binary setuid ?

scaronni commented 4 months ago

Beside the fact that you want to avoid any setuid binary on a system to reduce the attack surface, the NVIDIA documentation for the driver says that distributions should rely on their own udev rules / packaging mechanism to achieve what nvidia-modprobe does, but it's impossible to cover all use cases (IMEX, changes between drivers, Wayland/X differences, multi card setup, etc.).

caius-martinus commented 4 months ago

Thank you for the answer. For the security aspect of it, what you are saying is that in addition to have a setuid binary on the system, the new way is relying on it at startup... BTW even if nvidia-modprobe was not used by udev at startup, it would still be available as a setuid command for any user on the system...

scaronni commented 4 months ago

Before this change, nvidia-modprobe was not installed by default with my packaged drivers as there was no need for it, so it was not available for anyone.