openSUSE / SUSEPrime

Provide nvidia-prime like package for openSUSE

Use manual remove for PCI device instead of Bbswitch? #74

Open bomiyr opened 2 years ago

bomiyr commented 2 years ago

Hi, it's me again :smile:
In this issue we found out that bbswitch does not work on my laptop. The TL;DR of my investigation is in this comment.

Finally, I was able to disable the discrete GPU with just echo 1 | sudo tee /sys/bus/pci/devices/0000\:01\:00.0/remove.

And I started thinking that maybe this is the right way to disable the device? In that case we would not need a dependency on bbswitch (which seems to be almost dead, if you look at its repo). But I don't have the expertise on the topic to see the whole picture.

So what do you think: is it possible to integrate such a solution into suse-prime itself, or are there hidden caveats in this approach?
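The remove command above hardcodes the address 0000:01:00.0, which may differ between machines. As a rough sketch only (not something suse-prime does), the address could be looked up from sysfs by matching NVIDIA's PCI vendor ID (0x10de) and the display-controller class (0x03xxxx). The function name find_nvidia_gpus and the SYSFS_ROOT override are hypothetical, introduced here purely for illustration and testing.

```shell
#!/bin/sh
# Sketch: print the PCI addresses of NVIDIA display controllers.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
find_nvidia_gpus() {
    root="${SYSFS_ROOT:-/sys}"
    for dev in "$root"/bus/pci/devices/*; do
        # Skip entries without readable vendor/class attributes.
        [ -r "$dev/vendor" ] && [ -r "$dev/class" ] || continue
        vendor=$(cat "$dev/vendor")
        class=$(cat "$dev/class")
        # 0x10de is NVIDIA's vendor ID; class 0x03xxxx is a display controller,
        # so the GPU's HDMI audio function (class 0x0403xx) is not matched.
        case "$vendor:$class" in
            0x10de:0x03*) basename "$dev" ;;
        esac
    done
}
```

On a laptop like the one discussed here, calling find_nvidia_gpus would be expected to print the address used in the remove command, but that has not been verified on real hardware.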

sndirsch commented 2 years ago

Thanks for the report. Indeed I could disable the NVIDIA GPU that way. :-)

linux:/home/tux # modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': No such device

linux:/home/tux # dmesg
[ 5172.008516] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[ 5172.008832] NVRM: No NVIDIA GPU found.
[ 5172.009523] nvidia-nvlink: Unregistered the Nvlink Core, major device number 236

Unfortunately I was not able to reenable it. :-( I've tried

linux:/home/tux # echo 1 | tee /sys/bus/pci/devices/0000\:01\:00.0/rescan
tee: '/sys/bus/pci/devices/0000:01:00.0/rescan': No such file or directory
1

linux:/home/tux # echo 1 /sys/bus/pci/devices/0000\:01\:00.0/rescan
1 /sys/bus/pci/devices/0000:01:00.0/rescan

linux:/home/tux # modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': No such device
linux:/home/tux # dmesg
[ 5424.197577] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[ 5424.197879] NVRM: No NVIDIA GPU found.
[ 5424.198315] nvidia-nvlink: Unregistered the Nvlink Core, major device number 236

EDIT: I'm using an Intel/NVIDIA combo. The NVIDIA GPU is pre-Turing.

bomiyr commented 2 years ago

I was able to reenable the card with 'echo 1 | sudo tee /sys/bus/pci/rescan'

sndirsch commented 2 years ago

OMG. ;-) Indeed now it works for me. :-) Even easier.

disable nVidia GPU
modprobe -r nvidia_drm nvidia_modeset nvidia_uvm nvidia
echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove

reenable nVidia GPU
echo 1 > /sys/bus/pci/rescan
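For illustration, the two steps above could be wrapped as a pair of shell functions. This is only a sketch under assumptions: the names gpu_disable and gpu_enable, the SYSFS_ROOT override, and the MODPROBE override are hypothetical and exist so the logic can be exercised against a fake directory tree. On a real system both functions would need to run as root.

```shell
#!/bin/sh
# Sketch of the remove/rescan sequence from this thread.
# SYSFS_ROOT and MODPROBE are hypothetical overrides for testing.

gpu_disable() {
    addr="$1"   # PCI address, e.g. 0000:01:00.0
    node="${SYSFS_ROOT:-/sys}/bus/pci/devices/$addr/remove"
    if [ ! -e "$node" ]; then
        echo "no such PCI device: $addr" >&2
        return 1
    fi
    # Unload the NVIDIA driver stack before removing the device;
    # stop here if modprobe reports a failure.
    ${MODPROBE:-modprobe} -r nvidia_drm nvidia_modeset nvidia_uvm nvidia || return 1
    echo 1 > "$node"
}

gpu_enable() {
    # Rescans the whole PCI bus, so every removed device comes back,
    # not only the GPU.
    echo 1 > "${SYSFS_ROOT:-/sys}/bus/pci/rescan"
}
```

A call would look like gpu_disable 0000:01:00.0 followed later by gpu_enable. Note that gpu_enable takes no address, because the rescan attribute is bus-wide.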

sndirsch commented 2 years ago

Adding our bbswitch expert. @simopil What do you think? Should we try to get rid of bbswitch? Looks like it would be possible.

bomiyr commented 2 years ago

The only problem I see is that the rescan command is not exclusive to the GPU and will re-add any previously removed PCI device. And vice versa: a rescan triggered by the user or by some hardware-info app will silently re-enable the GPU...
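One way to at least notice such a silent re-enable would be to check whether the device directory has reappeared in sysfs. The following is a minimal sketch, not a fix for the underlying problem; gpu_present, gpu_report, and SYSFS_ROOT are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: report whether a PCI device is currently present in sysfs.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.

gpu_present() {
    # Succeeds when the device directory exists, i.e. the device
    # has not been removed or has been re-added by a rescan.
    [ -d "${SYSFS_ROOT:-/sys}/bus/pci/devices/$1" ]
}

gpu_report() {
    if gpu_present "$1"; then
        echo "$1 present"
    else
        echo "$1 removed"
    fi
}
```

This only tells you whether the kernel knows about the device. It says nothing about whether the card is actually drawing power.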

sndirsch commented 2 years ago

Yeah. Valid points. And there might be more reasons why the bbswitch kernel module was written ...

simopil commented 2 years ago

I tried removing the device. It disappeared from the system, but the LEDs on my laptop show that the NVIDIA card is still powered on. This is because powering off the GPU is done via ACPI calls, not by unbinding the device. You can power it off via acpi_call with the correct call for your platform. bbswitch can find a suitable call for your system automatically and perform it without the acpi_call module. BTW, the bbswitch module is maintained even though the main repo is very old: one year ago it broke due to kernel library changes, and it was fixed in openSUSE.

Edit: I read your #73 and I think you have trouble with ACPI call handling. I found this, and it seems to be exactly your issue; it was resolved with the acpi-handle-hack module (you have to edit the code as described in the comment with your settings and compile it yourself). The module can be found in the bbswitch repo.
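For reference, when the bbswitch module is loaded it exposes the card's power state in /proc/acpi/bbswitch as a single line such as "0000:01:00.0 ON". A small sketch for reading that state could look like the following; the function name bbswitch_state and the BBSWITCH_FILE override are hypothetical and only there for illustration and testing.

```shell
#!/bin/sh
# Sketch: print the power state (ON or OFF) reported by bbswitch.
# BBSWITCH_FILE is a hypothetical override for testing.
bbswitch_state() {
    file="${BBSWITCH_FILE:-/proc/acpi/bbswitch}"
    if [ ! -r "$file" ]; then
        echo "bbswitch interface not available: $file" >&2
        return 1
    fi
    # The file holds one line, "<pci-address> <state>"; print the state.
    awk '{ print $2 }' "$file"
}
```

Writing OFF or ON to the same file as root asks bbswitch to switch the card's power through ACPI, which is the step that removing the PCI device does not perform.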

sndirsch commented 2 years ago

Thanks for your input @simopil !