openSUSE / SUSEPrime

Provide nvidia-prime like package for openSUSE

High power consumption by default when integrated GPU selected #73

Closed bomiyr closed 2 years ago

bomiyr commented 2 years ago

Hi all,

Could you help me figure out whether I'm doing something wrong, or explain the behavior I'm seeing?

Well, I'll start with my laptop spec: Lenovo Legion 5, AMD 4800H CPU with AMD Renoir graphics + GeForce GTX 1660 Ti Mobile. The OS is openSUSE Tumbleweed (20211106 for now). In the summer I installed Tumbleweed with suse-prime and was able to switch between amd and nvidia with "prime-select nvidia"/"prime-select unset". I also copied the prime-run script from Arch and was able to use nvidia GPU offload. So my use case is to use AMD graphics most of the time plus GPU offload for games. Sometimes I switch to nvidia completely (not often, but I want to have that option).

But some time ago something changed. GPU offload stopped working by default. The cause was 09-nvidia-modprobe-bbswitch-G04.conf. After I commented out the file's content and ran mkinitrd, GPU offload started working again. I also noticed that HDMI audio was not working, found this repo and was able to fix it.

As I understand, the point of 09-nvidia-modprobe-bbswitch-G04.conf and disabling HDMI audio in 90-nvidia-udev-pm-G05.rules is saving battery. It seems reasonable, if you don't load nvidia driver the GPU should not work and battery consumption will be lower. BUT not in my case!

By default, when 09-nvidia-modprobe-bbswitch-G04.conf and 90-nvidia-udev-pm-G05.rules are untouched, idle power consumption is around 30W and the laptop gets hotter on the side where the nvidia GPU is located. After I comment out the content of both files, idle power consumption is around 13W without any hot zones.

I'm writing because I don't entirely understand what is going on here and want to improve the out-of-the-box experience in openSUSE.

  1. Why is power consumption higher when the nvidia driver is not loaded? Maybe it should not be disabled at all? Right now the default behavior is not good on my laptop (30W and a hot corner, and no GPU offload). Btw, I found that it is possible to use "prime-select offload", but it is not working (I could not get into the GUI at all, if I remember right).
  2. Do I need to modify something to make it work even better? On Windows, HW-monitor shows consumption around 7W, but I don't know if that value is right.

I want to help, so if you need more info or want me to run some experiments on my laptop, it's not a problem.
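The post does not say how the idle wattage was measured. One common way to take comparable readings on Linux while running on battery is to read the battery's sysfs attributes. The sketch below is only an illustration: the `BAT0` directory name and the function name `battery_watts` are assumptions, and not every battery driver exposes all of these attributes.

```bash
#!/bin/bash
# Sketch only: BAT0 and the attribute names are assumptions about the machine;
# the readings are only meaningful while the laptop is discharging.

# Print the battery's current power draw in watts.
# $1: power-supply directory (assumed default: /sys/class/power_supply/BAT0)
battery_watts() {
    local dir="${1:-/sys/class/power_supply/BAT0}"
    if [ -r "$dir/power_now" ]; then
        # power_now is reported in microwatts
        awk '{ printf "%.1f\n", $1 / 1000000 }' "$dir/power_now"
    elif [ -r "$dir/current_now" ] && [ -r "$dir/voltage_now" ]; then
        # current_now is in microamperes, voltage_now in microvolts
        awk 'NR == FNR { ua = $1; next } { printf "%.1f\n", ua * $1 / 1e12 }' \
            "$dir/current_now" "$dir/voltage_now"
    else
        echo "no power reading available in $dir" >&2
        return 1
    fi
}
```

Calling `battery_watts` a few times while the desktop is idle, once per prime-select mode, gives numbers that can be compared like the 30W and 13W figures above.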

sndirsch commented 2 years ago

Ok. I don't know prime-run from arch. I think for "offload" you can just run "prime-select offload". But you might be the first person who ever tried this, since support for amd has been added to suse-prime just recently. Also you need to set these environment variables mentioned on https://download.nvidia.com/XFree86/Linux-x86_64/435.21/README/primerenderoffload.html

09-nvidia-modprobe-bbswitch-G04.conf is only needed and used in "amd"-only mode, when you want to save power by using the bbswitch kernel module to switch off the nvidia GPU completely. I suggest uninstalling the bbswitch package if you don't want to use it.

Either use a) amd/bbswitch or b) offload mode.

Unfortunately we need to disable HDMI audio in order to enable the power save mode in "offload" mode. But you can change this manually in 90-nvidia-udev-pm-G05.rules: https://github.com/openSUSE/SUSEPrime#hdmi-audio-support-does-not-work

Unfortunately I cannot test an AMD/NVIDIA setup, since I don't have access to such a laptop. Only Intel/NVIDIA.

sndirsch commented 2 years ago

Let's include @hatch01. He has implemented the AMD support to suse-prime. And @kevinsmia1939 . He's using it as well. Guys, maybe you can give a helping hand to @bomiyr . Would be great!

bomiyr commented 2 years ago

  1. prime-run is just a handy way to launch GPU offload. It has the following content:

    #!/bin/bash
    __NV_PRIME_RENDER_OFFLOAD=1 __VK_LAYER_NV_optimus=NVIDIA_only __GLX_VENDOR_LIBRARY_NAME=nvidia "$@"

    So I can launch any app on nvidia gpu with just prime-run steam

  2. after doing prime-select offload my GUI (KDE) can't start at all, just white cursor on tty7.

  3. I also tried to run `prime-select get-current` (when it is in the amd or unset state) and it says

    ...
    bbswitch not loaded. NVIDIA modules are NOT loaded
    if you want energy saving bbswitch should be loaded in intel mode

    Maybe I have some problem with bbswitch configuration?

  4. I saw information about HDMI-audio and was able to enable it. It is not a problem. The problem is why my laptop consumes more energy when it is disabled :-)

  5. As I said, I want to help and don't mind being one of the first adopters of amd+nvidia hardware or offload mode. If you need some info or want me to run tests on my machine, feel free to ask. I have development experience (for Android), so maybe I could try to figure this out by myself, but right now I don't have any knowledge about Linux and how things work here.

sndirsch commented 2 years ago

  1. Ok. So prime-run is just a wrapper for running applications with appropriate environment variables in "offload" (PRIME Render Offload) mode.

  2. I think we should try to figure out why "prime-select offload" doesn't work for you.

  3. You would need to install package "bbswitch" (zypper in bbswitch) for powersaving in "amd" mode. But with that mode you disable the nvidia GPU completely.

  4. NVIDIA power off support only works in "offload" mode, which up to now doesn't work for you. It neither works in "amd" nor "offload" mode.

Also, in nvidia mode there is no power save mode.

bomiyr commented 2 years ago

bbswitch is installed. I understand that, depending on the mode, the nvidia GPU is either disabled entirely or I get no power saving. Right now prime-select does not work as intended.

  1. I want to know why it is not working
  2. And whether I will get better battery life if I get it to work. I can live without offload or HDMI audio as long as it reduces power consumption.

sndirsch commented 2 years ago

Ok. To save power, the easiest way would be to disable the NVIDIA GPU in Firmware/BIOS. But I know this option is sometimes not available. Option two would be to use "amd" mode with bbswitch installed. In that case the nvidia kernel modules are unloaded and the bbswitch kernel module is loaded to disable the NVIDIA GPU completely. Option three is to use "offload" mode, where the nvidia modules are loaded (NVIDIA's DynamicPowerManagement).

I suggest to begin with amd/bbswitch mode.

Please run (with bbswitch package installed)

prime-select amd
prime-select log-view

and add the output here or attach the output as a file.

bomiyr commented 2 years ago

Some of the output was in Russian (translated to English here); I hope that is not a problem.

home@DESKTOP-2PNTPB3:~> sudo zypper search -s bbswitch
[sudo] password for root:
Loading repository data...
Reading installed packages...

S | Name                 | Type    | Version              | Arch        | Repository
--+----------------------+---------+----------------------+-------------+------------------------
i | bbswitch             | package | 0.8-11.42            | x86_64      | openSUSE-Tumbleweed-Oss
v | bbswitch             | package | 0.8-11.42            | i586        | openSUSE-Tumbleweed-Oss
i | bbswitch-kmp-default | package | 0.8_k5.14.14_1-11.38 | x86_64      | (System Packages)
i | bbswitch-kmp-default | package | 0.8_k5.14.14_3-11.42 | x86_64      | openSUSE-Tumbleweed-Oss
v | bbswitch-kmp-default | package | 0.8_k5.14.14_3-11.42 | i586        | openSUSE-Tumbleweed-Oss
  | bbswitch-kmp-pae     | package | 0.8_k5.14.14_3-11.42 | i586        | openSUSE-Tumbleweed-Oss
home@DESKTOP-2PNTPB3:~> sudo prime-select amd
amd catched
Preparing first configuration
bbswitch not loaded. NVIDIA modules are NOT loaded
if you want energy saving bbswitch should be loaded in intel mode
Logout to switch graphics

And prime-select log-view output after relogin:

##SUSEPrime logfile##
[ 15:06:16 ] user_logout_waiter: started
[ 15:08:39 ] user_logout_waiter: X restart detected, preparing switch to amd
[ 15:08:41 ] NVIDIA card will be switched off, NVIDIA offloading will not be available
[ 15:08:41 ] trying switch OFF nvidia: bbswitch not loaded. NVIDIA modules are NOT loaded
if you want energy saving bbswitch should be loaded in intel mode
[ 15:08:41 ] Amd card correctly set
[ 15:08:41 ] HotSwitch: starting Display Manager
[ 15:08:41 ] HotSwitch: completed!

sndirsch commented 2 years ago

Thanks. Seems bbswitch kernel module could not be loaded. Please run

dmesg -c > /dev/null
modprobe bbswitch
dmesg

And add again the output of these commands.

bomiyr commented 2 years ago

home@DESKTOP-2PNTPB3:~> sudo dmesg -c > /dev/null
[sudo] password for root:
home@DESKTOP-2PNTPB3:~> sudo modprobe bbswitch
modprobe: ERROR: could not insert 'bbswitch': No such device
home@DESKTOP-2PNTPB3:~> dmesg 
[  142.495714] bbswitch: loading out-of-tree module taints kernel.
[  142.496725] bbswitch: version 0.8
[  142.496745] bbswitch: Found discrete VGA device 0000:01:00.0: \_SB_.PCI0.GPP0.PEGP
[  142.496750] bbswitch: Found discrete VGA device 0000:05:00.0: \_SB_.PCI0.GP17.VGA_
[  142.496777] bbswitch: failed to evaluate \_SB_.PCI0.GP17.VGA_._DSM {0xF8,0xD8,0x86,0xA4,0xDA,0x0B,0x1B,0x47,0xA7,0x2B,0x60,0x42,0xA6,0xB5,0xBE,0xE0} 0x100 0x0 {0x00,0x00,0x00,0x00}: AE_NOT_FOUND
[  142.496784] bbswitch: failed to evaluate \_SB_.PCI0.GP17.VGA_._DSM {0xA0,0xA0,0x95,0x9D,0x60,0x00,0x48,0x4D,0xB3,0x4D,0x7E,0x5F,0xEA,0x12,0x9F,0xD4} 0x102 0x0 {0x00,0x00,0x00,0x00}: AE_NOT_FOUND
[  142.496786] bbswitch: No suitable _DSM call found.

sndirsch commented 2 years ago

Thanks, so unfortunately the bbswitch module doesn't work on your system. :-( suse-prime is just using it. Don't ask me for details here. Only "offload" mode remains. For that, run

prime-select offload
prime-select log-view

and add again the output of these commands.

bomiyr commented 2 years ago

home@DESKTOP-2PNTPB3:~> sudo prime-select offload
[sudo] password for root:
offload catched
bbswitch not loaded. NVIDIA modules are NOT loaded
if you want energy saving bbswitch should be loaded in intel mode
Logout to switch graphics

As I said before, in this mode I don't have a GUI, just a white cursor on tty7. Here is the prime-select log-view output:

[ 20:18:53 ] user_logout_waiter: started
[ 20:19:35 ] user_logout_waiter: X restart detected, preparing switch to offload
[ 20:19:37 ] Using default intel modesetting driver for offloading.
[ 20:19:39 ] Adding support for NVIDIA Prime Render Offload
[ 20:19:39 ] Intel card correctly set
[ 20:19:39 ] HotSwitch: starting Display Manager
[ 20:19:40 ] HotSwitch: completed!

bomiyr commented 2 years ago

Looking at the log it seems I found the reason why offloading is not working.

function offload_pref_check {
    #checks if there's a preference for nvidia-offloading
    if ! [ -f /etc/prime/offload_type ]; then
        echo "intel" > /etc/prime/offload_type
        logging "Using default intel modesetting driver for offloading."
    fi
}

I tried changing echo "intel" to echo "amd" and it is working now. So the question is: how do we know whether to write intel or amd here? If you help me with this, I can create a pull request.
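One possible way to answer that question is to look at which non-NVIDIA display controller `lspci` reports. The following is a minimal sketch of that idea, not the content of the patch attached later in this thread. The function name `detect_igpu_vendor` is hypothetical, and the sketch assumes the machine has exactly one integrated GPU, made by either AMD or Intel.

```bash
#!/bin/bash
# Hypothetical helper, not taken from suse-prime or the attached patch.
# Reads `lspci` output on stdin and prints "amd" or "intel" for the first
# non-NVIDIA display controller found. Falls back to "intel", which matches
# the default in the offload_pref_check function quoted above.
detect_igpu_vendor() {
    local line
    while IFS= read -r line; do
        # Only consider display-class devices
        case "$line" in
            *"VGA compatible controller"*|*"Display controller"*|*"3D controller"*) ;;
            *) continue ;;
        esac
        case "$line" in
            *NVIDIA*) continue ;;
            *"Advanced Micro Devices"*|*AMD/ATI*) echo "amd"; return 0 ;;
            *Intel*) echo "intel"; return 0 ;;
        esac
    done
    echo "intel"
}
```

Under those assumptions, the hard-coded echo "intel" in offload_pref_check could become something like `lspci | detect_igpu_vendor > /etc/prime/offload_type`. The fix that actually went into suse-prime may work differently.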

sndirsch commented 2 years ago

Excellent catch. I'm attaching a patch and the hopefully fixed script.

sndirsch commented 2 years ago

patch.txt prime-select.sh.txt

sndirsch commented 2 years ago

Can you give the fix a try? I needed to rename the script, since github only accepts text mode files with .txt extension.

bomiyr commented 2 years ago

Well... Without applying your patch I have a different problem now, and I don't think I messed something up... Right now prime-select nvidia does not work (black screen on tty7). I reinstalled suse-prime and the nvidia drivers from the repos, but it is still not working. It probably started after the dist upgrade I did this morning or something...

sndirsch commented 2 years ago

Oh. I think I know what happened. The latest Xserver broke suse-prime's "nvidia" mode, but the offload mode should work (now with our fix).

https://bugzilla.opensuse.org/show_bug.cgi?id=1192751

bomiyr commented 2 years ago

I can confirm that offload works with your attached script.

#SUSEPrime logfile##
[ 22:51:17 ] user_logout_waiter: started
[ 22:51:33 ] user_logout_waiter: X restart detected, preparing switch to offload
[ 22:51:34 ] Using default amd modesetting driver for offloading.
[ 22:51:35 ] Adding support for NVIDIA Prime Render Offload
[ 22:51:35 ] Amd card correctly set
[ 22:51:35 ] HotSwitch: starting Display Manager
[ 22:51:36 ] HotSwitch: completed!
home@DESKTOP-2PNTPB3:~> glxinfo | grep OpenGL\ renderer
OpenGL renderer string: AMD RENOIR (DRM 3.42.0, 5.14.14-3-default, LLVM 13.0.0)
home@DESKTOP-2PNTPB3:~> prime-run glxinfo | grep OpenGL\ renderer
OpenGL renderer string: NVIDIA GeForce GTX 1660 Ti/PCIe/SSE2

bomiyr commented 2 years ago

Thank you for the help. Now I will try to understand what is wrong with bbswitch

sndirsch commented 2 years ago

Hooray. I'll fix it in git and make a new release.

sndirsch commented 2 years ago

> Thank you for the help. Now I will try to understand what is wrong with bbswitch

You can report this issue in SUSE's bugzilla on https://bugzilla.opensuse.org/

sndirsch commented 2 years ago

I just made a new release 0.8.5 with the fix applied. I also submitted a new package with this version for Tumbleweed. Let's close this as fixed. Thanks for the report and your help fixing this issue! It was a very productive conversation! :-)

bomiyr commented 2 years ago

The issue with prime-select is fixed, but the issue with bbswitch is not. I'll just leave this comment here; maybe it will save someone with similar hardware some time.

I went to the bbswitch repo (btw, it looks almost dead) and found a similar issue and a comment with a fix. I downloaded the code, applied the fix and compiled it. After that I was able to load my version of bbswitch. But I still could not make it work, and worse, its behavior is really scary: right after I disabled the nvidia GPU, all fans started spinning at 100%, and after several dozen seconds the laptop just powered off. Maybe it has some hardware protection or something similar that treats a disabled GPU as a fault, I don't know. At this point I will end my investigation. At least prime-select offload will work properly :smile:

EDITED: Actually I found out how to disable the Nvidia GPU. It is possible with the following command, but you need to change the device address:

    echo 1 | sudo tee /sys/bus/pci/devices/0000\:01\:00.0/remove

To get the correct address, execute sudo lspci | grep VGA. The output should look like this:

01:00.0 VGA compatible controller: NVIDIA Corporation TU116M [GeForce GTX 1660 Ti Mobile] (rev a1)
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Renoir (rev c6)

Now I have power consumption of ~11.5-13W in offload mode, and ~8.5-9W in amd mode with the Nvidia GPU disabled.
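For readers who want to avoid copying the address by hand, the lookup described above can be sketched as a small shell function. This is an illustration only. The function name `nvidia_remove_path` is hypothetical, the sketch assumes a single NVIDIA GPU, and it assumes PCI domain 0000 when `lspci` is run without `-D`.

```bash
#!/bin/bash
# Sketch only: assumes a single NVIDIA GPU; PCI domain 0000 is assumed when
# the lspci output does not include a domain (i.e. lspci was run without -D).

# Read `lspci` output on stdin and print the sysfs "remove" path of the
# first NVIDIA VGA or 3D controller.
nvidia_remove_path() {
    local addr
    addr=$(awk '/(VGA compatible|3D) controller/ && /NVIDIA/ { print $1; exit }')
    [ -n "$addr" ] || { echo "no NVIDIA GPU found" >&2; return 1; }
    # Plain lspci prints "01:00.0"; lspci -D prints "0000:01:00.0".
    case "$addr" in
        *:*:*) ;;
        *) addr="0000:$addr" ;;
    esac
    echo "/sys/bus/pci/devices/$addr/remove"
}
```

With this, the command from the comment could be written as `echo 1 | sudo tee "$(lspci | nvidia_remove_path)"`. A removed device normally reappears after a reboot or after a PCI bus rescan (`echo 1 | sudo tee /sys/bus/pci/rescan`).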

sndirsch commented 2 years ago

Thanks a lot for the information. The patch makes sense to me. Apparently without the patch the kernel module tries to initialize the AMD GPU instead of the NVIDIA GPU. ;-) Not sure why the behaviour then with the patch is that bad. :-(

I hope that power consumption now works ok for you with "offload" mode.

You may want to experiment with NVreg_DynamicPowerManagement driver option in /usr/lib/modprobe.d/09-nvidia-modprobe-pm-G05.conf. I needed to set the default to 0x01 due to issue #52.
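When experimenting with that option, it helps to check whether the GPU actually powers down while idle. The kernel exposes a per-device `power/runtime_status` attribute in sysfs for this. The sketch below reads it. The function name `gpu_runtime_status` is hypothetical, and the default PCI address is the one from this thread, so it is an assumption for any other machine.

```bash
#!/bin/bash
# Sketch only: the default PCI address comes from this thread and will differ
# on other machines.

# Print the runtime power-management state of a PCI device
# (typically "active" or "suspended").
# $1: PCI address, $2: sysfs root (overridable so the function can be tested)
gpu_runtime_status() {
    local addr="${1:-0000:01:00.0}" root="${2:-/sys}"
    local f="$root/bus/pci/devices/$addr/power/runtime_status"
    [ -r "$f" ] || { echo "unknown"; return 1; }
    cat "$f"
}
```

If `gpu_runtime_status` keeps printing "active" in offload mode with no application running on the NVIDIA GPU, the GPU is probably not being suspended, and a different NVreg_DynamicPowerManagement value may be worth trying. NVIDIA's README describes 0x00 as disabled, 0x01 as coarse-grained and 0x02 as fine-grained power control.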