openSUSE / SUSEPrime

Provide nvidia-prime like package for openSUSE

Power off support: cannot understand the instructions #87

Closed bfg01 closed 1 year ago

bfg01 commented 1 year ago

Hello. I did a fresh Leap 15.4 installation from scratch on a laptop with Optimus and a GeForce 940M. Since the NVIDIA website says the latest drivers (version > 515) still support this chip, I went for the G06 packages; they seem to work without strange problems, so I think that was the right choice. prime-select also seems to be working.

Now I wanted to try doing the "power off support" stuff, however I was not able to understand how to do it.

In the "Install bbswitch" section: "Blacklist bbswitch module", but suse-prime package seems to already do this by installing corresponding modprobe .conf files... Or does it mean just to double-check?

In the "Blacklist the NVIDIA modules so it can be loaded only when necessary" section: "Blacklist them in initrd"? Do I have to somehow modify the initrd with that "if" block? How can I do this?

In the "Install the systemd services to set correct card during boot" section: Exact same question as above

Could you help with this please? Thanks very much.

sndirsch commented 1 year ago

Yes, your GeForce 940M (unfortunately still Maxwell, not Turing yet) is still supported by the latest NVIDIA drivers 515.x (G06). I think you are referring to the GitHub documentation (README.md). When you use the suse-prime package, you don't need to install any additional files manually; those instructions are mainly meant for users installing SUSEPrime from git. But if you want to use the "power off support", you still need to install the "bbswitch" package. It's no longer installed together with suse-prime by default due to issue #70. I think you're aware that with "bbswitch" installed, "intel" mode disables the NVIDIA GPU completely, i.e. you lose any hardware acceleration via the NVIDIA GPU. Maybe "offload" mode is a compromise you may want to use instead: it uses the NVIDIA GPU only for specific applications (not the whole desktop) and that way saves energy. Does this answer your questions?

bfg01 commented 1 year ago

Thanks very much sir. Yes, I was referring to the README.md

The reason I was confused was because I was following openSUSE's documentation: https://en.opensuse.org/SDB:NVIDIA_SUSE_Prime#Usage "Please note: if you have an Optimus laptop, there are additional steps required to make sure the nvidia card is powered off. A systemd service needs to be installed." So, on top of all of this, they seem to really suggest completely disabling the NVIDIA GPU...

Your explanation did help a lot, but just to be sure, I think I'd reduce it to 3 final doubts:

---If using the package to install, do I no longer need to do anything else, but just install the bbswitch package if wanting "power-off support"? With this "prime-select intel" also powers off the NVIDIA GPU?

---If the bbswitch package gives bad issues on some laptops, is there a way to manually enable/disable the NVIDIA GPU?

---For using the "offload" mode, is it really just a matter of preceding the app name in the shell with "__NV...", as the NVIDIA docs say? This would feel slightly like Bumblebee usage...

Thanks very much for your attention.

sndirsch commented 1 year ago

The reason I was confused was because I was following openSUSE's documentation: https://en.opensuse.org/SDB:NVIDIA_SUSE_Prime#Usage "Please note: if you have an Optimus laptop, there are additional steps required to make sure the nvidia card is powered off. A systemd service needs to be installed." So, on top of all of this, they seem to really suggest completely disabling the NVIDIA GPU...

That's just for the intel mode, where you don't make use of the NVIDIA GPU anyway. The systemd service is already integrated into the suse-prime package.

--If using the package to install, do I no longer need to do anything else, but just install the bbswitch package if wanting "power-off support"? With this "prime-select intel" also powers off the NVIDIA GPU?

Yes, exactly.
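As a rough way to check whether the power-off actually took effect, the sketch below reads the state that bbswitch exposes. It assumes the bbswitch module is loaded and provides /proc/acpi/bbswitch with a line such as "0000:01:00.0 OFF"; the function name nvidia_power_state is made up for this example and is not part of suse-prime.

```shell
#!/bin/sh
# Hypothetical helper: print the dGPU power state reported by bbswitch.
# Assumption: bbswitch exposes one line like "0000:01:00.0 OFF" in
# /proc/acpi/bbswitch. The path can be overridden (first argument) for testing.
nvidia_power_state() {
    state_file="${1:-/proc/acpi/bbswitch}"
    if [ ! -r "$state_file" ]; then
        # bbswitch not loaded (or not installed): state cannot be determined.
        echo "unknown"
        return 1
    fi
    # The last field of the first line is ON or OFF.
    awk '{ print $NF; exit }' "$state_file"
}
```

After installing the bbswitch package, running sudo prime-select intel, and logging out and back in, calling nvidia_power_state would be expected to print OFF if power-off support works on the machine; "unknown" suggests the module is not loaded.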

---If the bbswitch package gives bad issues on some laptops, is there a way to manually enable/disable the NVIDIA GPU?

On some systems (depending on hardware/BIOS) you can also disable the NVIDIA GPU in your BIOS setup.

---For using the "offload" mode, is it really just a matter of preceding the app name in the shell with "__NV...", as the NVIDIA docs say? This would feel slightly like Bumblebee usage...

Yes, the usage is similar. Technically it's different. But I'm not a Bumblebee expert!
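To make the "prefix the command with the __NV* variables" usage concrete, here is a minimal wrapper sketch. The two variable names come from the NVIDIA PRIME render offload documentation referenced in this thread; the function name nvidia_offload is invented for this example. As it turns out later in the thread, the prime-select mode also has to be set to "offload" for this to work.

```shell
#!/bin/sh
# Hypothetical wrapper: run one command with the PRIME render offload
# variables set, without changing the environment of the calling shell.
nvidia_offload() {
    env __NV_PRIME_RENDER_OFFLOAD=1 \
        __GLX_VENDOR_LIBRARY_NAME=nvidia \
        "$@"
}
```

For example, nvidia_offload glxinfo -B would run glxinfo with both variables set, while every other application keeps rendering on the integrated GPU.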

bfg01 commented 1 year ago

Thanks again sir. Then I think I followed correctly the instructions, but the "offload" mode is not working at all...

This is an old laptop, a Lenovo ThinkPad E570. I'm following this: https://download.nvidia.com/XFree86/Linux-x86_64/435.21/README/primerenderoffload.html The integrated GPU must use modesetting; I have installed the packages xf86-video-{fbdev,vesa}, not the -intel one. Also, the boot parameters say nothing about "nomodeset", so I think I should be ok here...

"...and NVIDIA GPU screens are enabled in /etc/X11/xorg.conf.d/nvidia.conf:..." I didn't have that file created at all after installing all the packages; I had to manually create it

"If GPU screen creation was successful, the log file /var/log/Xorg.0.log should contain lines with "NVIDIA(G0)", and querying the RandR providers with xrandr --listproviders should display a provider named "NVIDIA-G0" (for "NVIDIA GPU screen 0")." Still not successful:

[...]:~> grep NVIDIA /var/log/Xorg.0.log; echo $?
1
[...]:~> xrandr --listproviders
Providers: number : 2
Provider 0: id: 0x47; cap: 0xf (Source Output, Sink Output, Source Offload, Sink Offload); crtcs: 3; outputs: 5; associated providers: 0; name: modesetting
    output eDP-1
    output DP-1
    output HDMI-1
    output DP-2
    output HDMI-2
Provider 1: id: 0x1fb; cap: 0x0 (); crtcs: 0; outputs: 0; associated providers: 0; name: modesetting
[...]:~> cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.14.21-150400.24.21-default root=/dev/mapper/system-root splash=silent resume=/dev/system/swap preempt=full mitigations=auto quiet security=apparmor
[...]:~> glxinfo | grep "OpenGL renderer"
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 620 (KBL GT2)
[...]:~>

Oh, and there's a strange message at the very beginning of Leap's boot process: [...] x86/cpu: SGX disabled by BIOS. I also didn't find anything related to Nvidia or the GPU in the BIOS setup...

Am I missing something else? Thanks yet again.

sndirsch commented 1 year ago

Thanks again sir. Then I think I followed correctly the instructions, but the "offload" mode is not working at all...

This is an old laptop, a Lenovo ThinkPad E570. I'm following this: https://download.nvidia.com/XFree86/Linux-x86_64/435.21/README/primerenderoffload.html The integrated GPU must use modesetting; I have installed the packages xf86-video-{fbdev,vesa}, not the -intel one. Also, the boot parameters say nothing about "nomodeset", so I think I should be ok here...

Yes, you are.

"...and NVIDIA GPU screens are enabled in /etc/X11/xorg.conf.d/nvidia.conf:..." I didn't have that file created at all after installing all the packages; I had to manually create it

You don't need to create this file. Better remove it. The needed content is already in a different .conf snippet in the same directory.

"If GPU screen creation was successful, the log file /var/log/Xorg.0.log should contain lines with "NVIDIA(G0)", and querying the RandR providers with xrandr --listproviders should display a provider named "NVIDIA-G0" (for "NVIDIA GPU screen 0")." Still not successful:


[...]:~> grep NVIDIA /var/log/Xorg.0.log; echo $?
1

It could be that you're using a display manager (gdm/sddm) that writes the log file to a different location.
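Since the log may live somewhere else, one option is to check several candidate files for the "NVIDIA(G0)" marker mentioned in the NVIDIA docs. This is only a sketch: the function name find_gpu_screen_log is made up, and the second path in the usage note is an assumption about where an X server started without root rights may write its log.

```shell
#!/bin/sh
# Hypothetical helper: print the first readable log file (from the given
# candidates) that contains the literal string "NVIDIA(G0)".
find_gpu_screen_log() {
    for log in "$@"; do
        # -F treats the pattern as a fixed string, so the parentheses are literal.
        if [ -r "$log" ] && grep -qF 'NVIDIA(G0)' "$log"; then
            echo "$log"
            return 0
        fi
    done
    return 1
}
```

A possible call would be find_gpu_screen_log /var/log/Xorg.0.log "$HOME/.local/share/xorg/Xorg.0.log"; if it prints nothing and returns a non-zero status, none of the given logs mentions an NVIDIA GPU screen.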

[...]:~> xrandr --listproviders
Providers: number : 2
Provider 0: id: 0x47; cap: 0xf (Source Output, Sink Output, Source Offload, Sink Offload); crtcs: 3; outputs: 5; associated providers: 0; name: modesetting
    output eDP-1
    output DP-1
    output HDMI-1
    output DP-2
    output HDMI-2
Provider 1: id: 0x1fb; cap: 0x0 (); crtcs: 0; outputs: 0; associated providers: 0; name: modesetting

Hmm. I think one of the providers should have the name 'nvidia'.
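The provider check can be scripted along these lines. It is a sketch that only looks for a provider whose name starts with "nvidia" (case-insensitive, so it would also match the "NVIDIA-G0" name from the NVIDIA docs); the function name has_nvidia_provider is invented here.

```shell
#!/bin/sh
# Hypothetical helper: read `xrandr --listproviders` output on stdin and
# succeed (exit status 0) if any provider name starts with "nvidia".
has_nvidia_provider() {
    grep -qi 'name: *nvidia'
}
```

Usage would be xrandr --listproviders | has_nvidia_provider && echo "NVIDIA provider present". With the output posted above, where both providers are named modesetting, the check fails.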

[...]:~> cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.14.21-150400.24.21-default root=/dev/mapper/system-root splash=silent resume=/dev/system/swap preempt=full mitigations=auto quiet security=apparmor
[...]:~> glxinfo | grep "OpenGL renderer"
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 620 (KBL GT2)

Not sure if you have set the __NV* variables here.

__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia /usr/bin/glxinfo -B ...



Oh, and there's a strange message at the very beginning of Leap's boot process: `[...] x86/cpu: SGX disabled by BIOS`. I also didn't find anything related to Nvidia or the GPU in the BIOS setup...

This sounds unrelated.

Maybe the "offload" mode is too complicated to use. I suggest working with the "nvidia" mode for now.

bfg01 commented 1 year ago

You don't need to create this file. Better remove it. The needed content is already in a different .conf snippet in the same directory.

Done

It could be that you're using a display manager (gdm/sddm) that writes the log file to a different location.

I'm using KDE/Plasma

Not sure if you have set the __NV* variables here.

__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia /usr/bin/glxinfo -B ...

[...]:~> __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep "OpenGL renderer"; echo $?
OpenGL renderer string: NVIDIA GeForce 940MX/PCIe/SSE2
X Error of failed request:  GLXBadContextTag
Major opcode of failed request:  152 (GLX)
Minor opcode of failed request:  5 (X_GLXMakeCurrent)
Serial number of failed request:  54
Current serial number in output stream:  54
0
[...]:~>

Ouch... so it was actually working... Even glxgears behaves a bit differently when the __NV* variables are set. But why that "X Error", which doesn't even exit with an error code? However, xrandr still outputs exactly the same with or without the vars... Oh, and I also decided not to install the bbswitch package for now.

sndirsch commented 1 year ago

"X Error" doesn't look good. I'm no longer sure about the xrandr output. Could be ok for "offload" mode. But better stay with "nvidia" mode.

bfg01 commented 1 year ago

Do you happen to have an idea regarding what could be wrong here? Why do you say "offload" mode is too complicated to be used? Or is it even just hardware incompatibility problem?

And yes, I just got the "X Error" with a standalone game with the vars set, and instead of running it just exited with code 1... Strange that this didn't happen with glxgears.

I did a "test": switched to nvidia mode with sudo prime-select nvidia and logging out. With this I tried running glxgears and the game with the __NV* vars set. Result: immediate forced logout from X. Is this expected?

Oh, and I totally forgot: when using sudo prime-select intel|nvidia, why do changes apply only by logging out and back in? I mean, shutting down or rebooting without logging out/in does not apply the changes!

sndirsch commented 1 year ago

Do you happen to have an idea regarding what could be wrong here?

I have not the slightest idea what's the reason for the X error. :-(

Why do you say "offload" mode is too complicated to be used? Or is it even just hardware incompatibility problem?

It's hard to debug the issue without direct access to the system. And in most cases I can't do anything. In the end it's NVIDIA's proprietary driver ...

And yes, I just got the "X Error" with a standalone game with the vars set, and instead of running it just exited 1... Strange this didn't happen with glxgears

Hmm. I thought it did, according to the output you gave above.

I did a "test": switched to nvidia mode with sudo prime-select nvidia and logging out. With this I tried running glxgears and the game with the __NV* vars set. Result: immediate forced log out from X. Is this expected?

Haven't seen this yet, but haven't tried it yet either.

Oh, and I totally forgot: when using sudo prime-select intel|nvidia, why do changes apply only by logging out and back in? I mean, shutting down or rebooting without logging out/in does not apply the changes!

Things are complicated. Switching from nvidia to intel mode requires unloading the NVIDIA kernel modules, which you can't do as long as the X server is running with the nvidia driver (it needs these modules). So there is a process that waits until it gets a signal that the switch can be done, which is usually the logout from the X session. There is also a systemd service involved. Things are therefore a bit fragile, so please do what the script tells you. Thanks.
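To see whether the NVIDIA modules are still loaded after a switch, the kernel's module list can be inspected. A minimal sketch, assuming the modules of interest all have names beginning with "nvidia" (for example nvidia, nvidia_modeset, nvidia_drm); the function name loaded_nvidia_modules is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list loaded kernel modules whose name starts with
# "nvidia". Reads /proc/modules by default; another file can be passed
# as the first argument for testing.
loaded_nvidia_modules() {
    modules_file="${1:-/proc/modules}"
    # The first column of /proc/modules is the module name.
    awk '$1 ~ /^nvidia/ { print $1 }' "$modules_file"
}
```

If loaded_nvidia_modules prints nothing after switching to intel mode and logging out and back in, the modules were unloaded; if names are still listed, the X server (or something else) is probably still holding them.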

bfg01 commented 1 year ago

Sir, I think I finally sorted the issue with the __NV* vars and the errors.

Two things:

--First, a more recent version of the Nvidia docs: https://download.nvidia.com/XFree86/Linux-x86_64/515.76/README/primerenderoffload.html They now specify the minimum Xorg server version needed. Leap 15.4 only ships 1.20.3 by default, so the only way to update is to use the Xorg repo (which is actually "factory unstable"?): https://download.opensuse.org/repositories/X11:XOrg/openSUSE_Leap_15.4/

--Second, for this offload thing to actually work, the prime-select mode must be set to "offload" (obviously), and I had it set to "intel". It's certainly mentioned in the main README, but at least for me in a kind of "barely" way... Just as a small suggestion, perhaps you could put more emphasis on this fact...

With this stuff Nvidia offload finally worked.
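The working setup can be verified roughly as follows. This is a sketch under the assumptions stated in the comments; the function name renderer_is_nvidia is hypothetical, and the check simply looks for "NVIDIA" in the renderer line that glxinfo prints.

```shell
#!/bin/sh
# Steps taken from this thread (not executed by this snippet):
#   1. sudo prime-select offload
#   2. log out and back in
#   3. run the application with the __NV* variables set
#
# Hypothetical helper: read glxinfo output on stdin and succeed if the
# "OpenGL renderer string" line names an NVIDIA GPU.
renderer_is_nvidia() {
    grep -q 'OpenGL renderer string:.*NVIDIA'
}
```

For instance, __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo -B | renderer_is_nvidia should succeed in offload mode, whereas plain glxinfo -B | renderer_is_nvidia is expected to fail because the desktop still renders on the Intel GPU.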

Thanks again sir. With this perhaps you could consider the issue closed?

sndirsch commented 1 year ago

The assumption about needing xserver 1.20.7 is not correct. We added the required patches a long time ago (they have been included since Leap 15.2).

Mon Sep 2 13:50:15 UTC 2019 - Stefan Dirsch sndirsch@suse.com

Indeed you need to run prime-select offload to activate offload mode. I assumed you were aware of this. I apologize.

sndirsch commented 1 year ago

Yes, I think we can close this now. Thanks for your feedback and your patience!