ffund opened this issue 6 months ago (status: Open)
node.sudo('systemctl isolate multi-user')
node.run('lsof -V /dev/nvidia*')
node.sudo('modprobe -rf nvidia_uvm nvidia_drm nvidia_modeset nvidia-vgpu-vfio nvidia')
node.sudo('modprobe nvidia NVreg_RestrictProfilingToAdminUsers=0')
node.sudo('systemctl isolate graphical')
After rebooting, the permission error persists.
Also tried: permanent access for a user
node.run('echo "options nvidia NVreg_RestrictProfilingToAdminUsers=0" | sudo tee -a /etc/modprobe.d/nvidia-graphics-drivers-kms.conf >/dev/null')
node.sudo('update-initramfs -u -k all')
Again, the issue remains.
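One side effect of the `tee -a` approach is that every rerun of the cell appends another copy of the same line to the config file. A minimal sketch of an idempotent alternative, assuming the file's text has already been read into a string (the helper name `ensure_line` is hypothetical, not part of any library):

```python
# Hypothetical helper: add the modprobe option only if it is not already there.
OPTION_LINE = "options nvidia NVreg_RestrictProfilingToAdminUsers=0"


def ensure_line(conf_text, line=OPTION_LINE):
    """Return conf_text unchanged if `line` is present, else with it appended."""
    existing = [entry.strip() for entry in conf_text.splitlines()]
    if line in existing:
        return conf_text
    # Make sure the new option starts on its own line.
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + line + "\n"
```

The returned text would still need to be written back to the file with root privileges, and `update-initramfs` run afterwards as above.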
Running nvidia-smi through sudo, however, resolves the issue. Note that it is a command that requires sudo, not a Python package.
node.sudo('nvidia-smi --power-limit 260')
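To confirm that the new limit actually took effect, the current value can be read back with `nvidia-smi --query-gpu=power.limit --format=csv,noheader,nounits`, which prints one value per GPU. The sketch below parses that output; the function names are illustrative, and `read_power_limits` assumes it runs directly on the GPU node rather than through the `node` wrapper.

```python
import subprocess

# Query the enforced power limit (watts) for every GPU, one value per line.
QUERY = ["nvidia-smi", "--query-gpu=power.limit", "--format=csv,noheader,nounits"]


def parse_power_limits(output):
    """Turn nvidia-smi query output into a list of floats (None if not numeric)."""
    limits = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            limits.append(float(line))
        except ValueError:
            # Some GPUs report a non-numeric placeholder instead of a value.
            limits.append(None)
    return limits


def read_power_limits():
    """Run the query locally and parse the result (assumes nvidia-smi is on PATH)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_power_limits(result.stdout)
```

When working remotely, `parse_power_limits` can be fed whatever text the query command printed on the node.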
Tests on a GPU instance:
# stop window manager
sudo systemctl isolate multi-user
# to avoid error w/ unloading kernel modules, had to disable persistence mode
sudo nvidia-smi -pm 0
# unload some kernel modules
sudo modprobe -r nvidia_drm
sudo modprobe -r nvidia_uvm
sudo modprobe -r nvidia_modeset
sudo modprobe -r nvidia
sudo modprobe -r i2c_nvidia_gpu # why not
# use "lsmod | grep nvidia" to confirm
# now we can change permission setting -
sudo modprobe nvidia NVreg_RestrictProfilingToAdminUsers=0
# restart window manager
sudo systemctl isolate graphical
# re-enable persistence mode, whatever that is
sudo nvidia-smi -pm 1
The RmProfilingAdminOnly parameter should now be set to 0:
cat /proc/driver/nvidia/params | grep "RmProfilingAdminOnly"
but
nvidia-smi --power-limit 260
still raises a permission error.
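The RmProfilingAdminOnly check above can also be scripted, which is handy at the top of a notebook. This is a rough sketch that assumes `/proc/driver/nvidia/params` contains one `Name: value` pair per line; the helper names are made up for illustration.

```python
from pathlib import Path

# This file exists only while the nvidia kernel module is loaded.
PARAMS_PATH = Path("/proc/driver/nvidia/params")


def parse_driver_params(text):
    """Parse 'Name: value' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            params[key.strip()] = value.strip()
    return params


def profiling_admin_only(text):
    """Return True or False from RmProfilingAdminOnly, or None if it is absent."""
    value = parse_driver_params(text).get("RmProfilingAdminOnly")
    if value is None:
        return None
    return value != "0"
```

On the GPU node, `profiling_admin_only(PARAMS_PATH.read_text())` returning `False` would indicate that profiling is open to non-admin users. It says nothing about whether `nvidia-smi --power-limit` will succeed without sudo.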
Refer to this section of the README.
We will use the "Linux Desktop > Enable temporarily" instructions in the NVIDIA docs to set up permissions at the top of the "reproduce" notebook.