teaching-on-testbeds / ml-energy

MIT License

pynvml command to set the power limit requires permissions #7

Status: open (ffund opened this issue 6 months ago)

ffund commented 6 months ago

Refer to this section of the README.

We will use the "Linux Desktop > Enable temporarily" instructions in the NVIDIA docs to set up permissions at the top of the "reproduce" notebook.
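A minimal sketch of how the "Enable temporarily" steps might look at the top of the reproduce notebook. It assumes `node` is a remote-host handle with a `sudo()` method, as in the `node.sudo(...)` calls later in this thread. The command list mirrors the manual sequence tested later in this thread on a GPU instance (minus the optional `i2c_nvidia_gpu` unload); `ENABLE_PROFILING_STEPS` and `enable_profiling_temporarily` are hypothetical names. Note that later in the thread the power limit still could not be set without root after these steps, so this only covers the profiling setting.

```python
# Hypothetical list of the "Linux Desktop > Enable temporarily" steps, in the
# order tested manually in this thread. Each command needs root privileges.
ENABLE_PROFILING_STEPS = [
    "systemctl isolate multi-user",  # stop the window manager
    "nvidia-smi -pm 0",              # disable persistence mode so the modules can unload
    "modprobe -r nvidia_drm",        # unload dependents before the core nvidia module
    "modprobe -r nvidia_uvm",
    "modprobe -r nvidia_modeset",
    "modprobe -r nvidia",
    # reload the driver, allowing non-admin access to GPU performance counters
    "modprobe nvidia NVreg_RestrictProfilingToAdminUsers=0",
    "systemctl isolate graphical",   # restart the window manager
    "nvidia-smi -pm 1",              # re-enable persistence mode
]


def enable_profiling_temporarily(node, steps=ENABLE_PROFILING_STEPS):
    """Run each step on `node` through its (assumed) sudo() method.

    Returns the commands that were run, in order.
    """
    ran = []
    for cmd in steps:
        node.sudo(cmd)
        ran.append(cmd)
    return ran
```

In the notebook this would be a single call, `enable_profiling_temporarily(node)`. Because the module parameter is set only at load time, the steps would have to be rerun after every reboot.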

Deepak-Work commented 5 months ago
  1. None of the commands in the referenced doc worked. Tried: Temporary access to a user, run from the reserve notebook to grant the permission:
    node.sudo('systemctl isolate multi-user')
    node.run('lsof -V /dev/nvidia*')
    node.sudo('modprobe -rf nvidia_uvm nvidia_drm nvidia_modeset nvidia-vgpu-vfio nvidia')
    node.sudo('modprobe nvidia NVreg_RestrictProfilingToAdminUsers=0') 
    node.sudo('systemctl isolate graphical')

After rebooting, the permission error persists.

Also tried: Permanent access to a user:

node.run('echo "options nvidia NVreg_RestrictProfilingToAdminUsers=0" | sudo tee -a /etc/modprobe.d/nvidia-graphics-drivers-kms.conf >/dev/null')
node.sudo('update-initramfs -u -k all')

Again, the issue remains.

Using nvidia-smi does resolve the issue. However, it is a command-line tool that has to be run with sudo, not a Python package.

node.sudo('nvidia-smi --power-limit 260')
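For comparison, a minimal sketch of the pynvml call that corresponds to `nvidia-smi --power-limit 260`, assuming the `pynvml` bindings (nvidia-ml-py) are installed. NVML takes power limits in milliwatts, and the NVML API reference lists `nvmlDeviceSetPowerManagementLimit` as requiring root/admin permissions. That is consistent with the permission error in this thread: the Python script would need root just like the `nvidia-smi` command. `to_milliwatts` and `set_power_limit` are hypothetical helper names.

```python
def to_milliwatts(watts, constraints_mw):
    """Convert a limit in watts to NVML's milliwatts and check it against
    the (min, max) constraints reported by NVML, also in milliwatts."""
    limit_mw = int(round(watts * 1000))
    min_mw, max_mw = constraints_mw
    if not min_mw <= limit_mw <= max_mw:
        raise ValueError(
            f"{watts} W is outside the allowed range "
            f"{min_mw / 1000:g} to {max_mw / 1000:g} W"
        )
    return limit_mw


def set_power_limit(watts, gpu_index=0):
    """Set a GPU power limit through NVML.

    Must run as root; otherwise NVML raises a permission error
    (pynvml.NVMLError_NoPermission).
    """
    import pynvml  # imported here so to_milliwatts() works without a GPU

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        # returns [min_mw, max_mw] for this GPU
        constraints = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
        limit_mw = to_milliwatts(watts, constraints)
        pynvml.nvmlDeviceSetPowerManagementLimit(handle, limit_mw)
        return limit_mw
    finally:
        pynvml.nvmlShutdown()
```

From the notebook, a script calling `set_power_limit(260)` would still have to be launched through `node.sudo(...)`, the same as the `nvidia-smi` command.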
ffund commented 5 months ago

Tests on a GPU instance:

# stop window manager
sudo systemctl isolate multi-user
# to avoid error w/ unloading kernel modules, had to disable persistence mode
sudo nvidia-smi -pm 0
# unload some kernel modules
sudo modprobe -r nvidia_drm
sudo modprobe -r nvidia_uvm
sudo modprobe -r nvidia_modeset
sudo modprobe -r nvidia
sudo modprobe -r i2c_nvidia_gpu # not strictly needed, but unloaded too
# confirm with "lsmod | grep nvidia" that no nvidia modules are still loaded
# now reload the driver with the new permission setting
sudo modprobe nvidia NVreg_RestrictProfilingToAdminUsers=0
# restart window manager
sudo systemctl isolate graphical
# re-enable persistence mode (keeps the driver initialized even when no process is using the GPU)
sudo nvidia-smi -pm 1
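The `lsmod | grep nvidia` check in the sequence above can also be scripted, for example to stop before reloading the driver if a module is still loaded. A minimal sketch that parses `lsmod` output (a header line, then one module per line with the name in the first column); `loaded_nvidia_modules` and `run_lsmod` are hypothetical names, and `run_lsmod` only covers the local machine.

```python
import subprocess


def loaded_nvidia_modules(lsmod_output):
    """Return names of loaded kernel modules containing 'nvidia', given
    the text printed by `lsmod`."""
    modules = []
    for line in lsmod_output.splitlines()[1:]:  # skip the "Module Size Used by" header
        fields = line.split()
        # match on the module name only, not on the "Used by" column
        if fields and "nvidia" in fields[0]:
            modules.append(fields[0])
    return modules


def run_lsmod():
    """Capture `lsmod` output on the local machine."""
    return subprocess.run(
        ["lsmod"], capture_output=True, text=True, check=True
    ).stdout
```

An empty result from `loaded_nvidia_modules(run_lsmod())` means it is safe to run the `modprobe nvidia ...` step. When working through a remote `node` handle, the captured output of `lsmod` on that host would be passed in instead (how to capture it depends on the wrapper).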

RmProfilingAdminOnly should now be set to 0:

cat /proc/driver/nvidia/params | grep "RmProfilingAdminOnly"
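A minimal sketch of the same check in Python, assuming each line of `/proc/driver/nvidia/params` has the form `Name: value`. `parse_nvidia_params` and `read_nvidia_param` are hypothetical helper names.

```python
def parse_nvidia_params(text):
    """Parse `Name: value` lines from /proc/driver/nvidia/params into a dict."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            params[name.strip()] = value.strip()
    return params


def read_nvidia_param(name, path="/proc/driver/nvidia/params"):
    """Return the current value of one driver parameter as a string,
    or None if the parameter is not listed."""
    with open(path) as f:
        return parse_nvidia_params(f.read()).get(name)
```

After the driver is reloaded with `NVreg_RestrictProfilingToAdminUsers=0`, `read_nvidia_param("RmProfilingAdminOnly")` would be expected to return `"0"`.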

but

nvidia-smi --power-limit 260

still raises a permission error.