wvthoog / proxmox-vgpu-installer


Can't install unlock services #5

Open mkuznetsov opened 4 months ago

mkuznetsov commented 4 months ago

Proxmox 8.1, Nvidia Tesla P4. I installed the latest version on the 16.x branch, 535.161.08 (because 17.x doesn't support Pascal-based cards). `nvidia-smi` runs and returns normal output; `mdevctl types` returns an empty string.

The script created:

- /etc/systemd/system/nvidia-vgpud.service.d/vgpu_unlock.conf
- /etc/systemd/system/nvidia-vgpu-mgr.service.d/vgpu_unlock.conf

both with the same content:

```
[Service]
Environment=LD_PRELOAD=/opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so
```

Enabling the services failed:

```
root@pve:/opt/vgpu_unlock-rs# systemctl enable nvidia-vgpud.service
Failed to enable unit: Unit file nvidia-vgpud.service does not exist.
root@pve:/opt/vgpu_unlock-rs# systemctl enable nvidia-vgpud-mgr.service
Failed to enable unit: Unit file nvidia-vgpud-mgr.service does not exist.
```

Enabling requires the unit files themselves; without them the services won't start:

- /etc/systemd/system/nvidia-vgpud.service
- /etc/systemd/system/nvidia-vgpu-mgr.service
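For reference, the base unit files normally come from the NVIDIA GRID host driver installer. A minimal sketch of what such a unit looks like (the `ExecStart` path and the forking type are assumptions based on common driver layouts, not copied from a real install):

```ini
# /etc/systemd/system/nvidia-vgpud.service -- sketch only; the real file is
# installed by the NVIDIA host driver package, so paths here are assumptions.
[Unit]
Description=NVIDIA vGPU Daemon
After=syslog.target

[Service]
Type=forking
ExecStart=/usr/bin/nvidia-vgpud

[Install]
WantedBy=multi-user.target
```

The nvidia-vgpu-mgr.service file would be analogous, pointing at the nvidia-vgpu-mgr binary.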

Simply copying the drop-in config doesn't work, because it has no [Install] section. I added one, so it became something like this:

```
[Service]
Environment=LD_PRELOAD=/opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so

[Install]
WantedBy=multi-user.target
```

After that the services were formally installed, but still not working.

```
systemctl enable nvidia-vgpud.service
Created symlink /etc/systemd/system/multi-user.target.wants/nvidia-vgpud.service → /etc/systemd/system/nvidia-vgpud.service.
systemctl enable nvidia-vgpu-mgr.service
Created symlink /etc/systemd/system/multi-user.target.wants/nvidia-vgpu-mgr.service → /etc/systemd/system/nvidia-vgpu-mgr.service.
```

Getting the status of the services returns an error:

```
systemctl status nvidia-vgpu-mgr
Warning: The unit file, source configuration file or drop-ins of nvidia-vgpu-mgr.service changed on disk. Run 'systemctl daemon-reload' to reload units.
○ nvidia-vgpu-mgr.service
     Loaded: bad-setting (Reason: Unit nvidia-vgpu-mgr.service has a bad unit file setting.)
    Drop-In: /etc/systemd/system/nvidia-vgpu-mgr.service.d
             └─vgpu_unlock.conf
     Active: inactive (dead)

Apr 19 18:25:52 pve systemd[1]: nvidia-vgpu-mgr.service: Service has no ExecStart=, ExecStop=, or SuccessAction=. Refusing.
```
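The "Service has no ExecStart=" message is consistent with that: a drop-in only overlays settings onto a base unit, and here the base unit file is missing entirely. A small shell sketch of the merge systemd conceptually performs (temp-dir paths, purely illustrative):

```shell
#!/bin/sh
# Simulate how systemd merges a base unit file with its drop-in directory.
# A temp dir stands in for /etc/systemd/system; this is illustration, not systemd.
set -eu
dir=$(mktemp -d)
mkdir -p "$dir/nvidia-vgpu-mgr.service.d"

# Only the drop-in exists -- mirroring the broken state described above.
cat > "$dir/nvidia-vgpu-mgr.service.d/vgpu_unlock.conf" <<'EOF'
[Service]
Environment=LD_PRELOAD=/opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so
EOF

# Concatenate base unit (missing here) with all drop-ins.
merged=$(cat "$dir/nvidia-vgpu-mgr.service" "$dir"/nvidia-vgpu-mgr.service.d/*.conf 2>/dev/null || true)
case "$merged" in
  *ExecStart=*) msg="unit has an ExecStart=, startable" ;;
  *)            msg="no ExecStart= anywhere: systemd refuses to start it" ;;
esac
echo "$msg"
rm -rf "$dir"
```

Because the base file never existed, the merged result has no ExecStart=, which is exactly what the journal line complains about; after adding or fixing unit files, `systemctl daemon-reload` is needed before the status reflects them.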

wvthoog commented 4 months ago

You don't need to install the unlock services since the Tesla P4 is supported natively. Download the new script and it will work (if I've done my research correctly)

mkuznetsov commented 4 months ago

In my case it doesn't. All I get is an empty list of vGPUs, and the driver says that vGPU profiles are not found. My solution was to downgrade the driver: I started at 535.161.08 and ended up on 535.104.06. The latter works without patches or the unlocker library.

But the main issue is the memory profiles. The P4 has 7680 MB of memory, so out of the box it provides only 8 × 512 MB, 4 × 1 GB, 2 × 2 GB, and 1 × 4 GB vGPU profiles. I use the unlocker to override the memory split and get 3.5 GB and 4 GB profiles: the smaller one for gaming and the bigger one for ML.
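For anyone following along, that kind of override lives in vgpu_unlock-rs's profile_override.toml. A sketch of the sort of entry meant here (the profile ID and the byte value are placeholders; list the real IDs with `mdevctl types` and treat every number as illustrative):

```toml
# /etc/vgpu_unlock/profile_override.toml -- sketch, not tested on a P4.
# "nvidia-63" is a placeholder profile ID, not necessarily one the P4 exposes.
[profile.nvidia-63]
framebuffer = 3758096384   # ~3.5 GiB, value in bytes (illustrative)
cuda_enabled = 1           # expose CUDA to the guest
frl_enabled = 0            # disable the frame-rate limiter (optional)
```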

wvthoog commented 4 months ago

That's right, the P4 should be supported by one of the 16.x drivers. I couldn't tell from the Nvidia website which one, so you'd have to try that out yourself.

Incorporating the profile overrides was a consideration, but I opted not to integrate it into version 1.1 of the script, due to time constraints and the huge block of code it would add. But that wouldn't be of use to you anyway, because you're running the native driver without the vgpu_unlock patches. There is one option though: patch the driver anyway, and then disable the unlock in your TOML config:

```
echo "unlock = false" > /etc/vgpu_unlock/config.toml
```

Then you can use the P4 natively and still use custom profiles, I believe.

Maybe I'll add it in a later version of the script

mkuznetsov commented 4 months ago

I'm getting very strange results in LLM runs. I chose a B-class vGPU, did the override, and added "cuda_enabled=1" to the profile.

In GTA 5 on one vGPU I get utilization of about 16-18%, while on the second vGPU I can't get utilization higher than 3-4% with LLM models in Ollama. Can vgpu_unlock somehow affect computational abilities?

Second question: the host driver 535.104 supports CUDA version 16.1, but the vGPU provides only 12.2 to the VM. Could this be patch-related?