openSUSE / SUSEPrime

Provide nvidia-prime like package for openSUSE

NVIDIA card is powered ON when AC adapter is unplugged #75

Closed ryan-ronnander closed 2 years ago

ryan-ronnander commented 2 years ago

Greetings,

I've recently configured a new Dell XPS 15 with SUSEPrime on a relatively fresh Tumbleweed installation. As this is primarily a work laptop, I decided to just leave the NVIDIA GPU powered off and followed the steps in "NVIDIA power off support since 390.xxx driver (G04/G05 driver packages)". After adding the nvidia.prime=intel kernel parameter, everything appeared to be functioning great. The laptop was pulling ~10W at the wall compared to ~20W before, perfect for battery life savings.
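
A quick way to confirm that the parameter actually reached the running kernel is to read it back from the kernel command line. The helper below is only a sketch: the function name prime_boot_mode is made up for illustration, and it assumes the parameter appears as a single space-separated nvidia.prime=<mode> token, as in the boot logs in this thread.

```shell
# Hypothetical helper (not part of SUSEPrime): print the value of the
# nvidia.prime=<mode> token found on a kernel command line.
# $1: file holding the command line (defaults to /proc/cmdline).
prime_boot_mode() {
    cmdline_file="${1:-/proc/cmdline}"
    # One token per line, keep only the nvidia.prime= value; if the token
    # is repeated, the last occurrence is printed.
    tr ' ' '\n' < "$cmdline_file" | sed -n 's/^nvidia\.prime=//p' | tail -n 1
}
```

Run without arguments, it reads /proc/cmdline; an empty result means no nvidia.prime= token was found.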

Then I noticed a few reboots later that the NVIDIA GPU was powered on. The output of cat /proc/acpi/bbswitch reports ON and there is no way to turn off the GPU. I had made no changes and there were no kernel or driver updates so I did some digging.

I finally figured out that when the laptop is running on battery the NVIDIA GPU is forced on somehow. It's forced on at boot after the suse-prime service explicitly disables the GPU. It's also forced on while the system is in use if the power adapter is unplugged.
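
To correlate the GPU state with the power source, both can be sampled together. This sketch assumes the AC adapter shows up under /sys/class/power_supply with type Mains (the device name varies between machines, and some USB-C chargers may report a different type) and that each line of /proc/acpi/bbswitch ends with ON or OFF. The function names are made up for illustration.

```shell
# Print 1 if a mains adapter is online, 0 if one exists but is offline,
# and nothing if no mains adapter is found.
# $1: power_supply directory (defaults to /sys/class/power_supply).
ac_online() {
    base="${1:-/sys/class/power_supply}"
    result=""
    for dev in "$base"/*; do
        [ -r "$dev/type" ] || continue
        [ "$(cat "$dev/type")" = "Mains" ] || continue
        result=0
        if [ "$(cat "$dev/online" 2>/dev/null)" = "1" ]; then
            result=1
            break
        fi
    done
    if [ -n "$result" ]; then
        echo "$result"
    fi
}

# Print the last field of the bbswitch status line (assumed ON or OFF).
# $1: status file (defaults to /proc/acpi/bbswitch).
bbswitch_state() {
    awk '{ print $NF }' "${1:-/proc/acpi/bbswitch}"
}

# Print both values on one line, e.g. for logging before and after
# unplugging the adapter.
gpu_snapshot() {
    echo "ac=$(ac_online "$1") bbswitch=$(bbswitch_state "$2")"
}
```

Calling gpu_snapshot before and after unplugging the adapter records whether the card state flips together with the power source.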

Fresh boot on battery (trying switch OFF nvidia: [bbswitch] NVIDIA card is OFF):

Jan 26 13:53:36 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.16.1-1-default root=/dev/mapper/system-root splash=silent resume=/dev/system/swap quiet security=apparmor mitigations=auto nvidia.prime=intel
Jan 26 13:53:36 kernel: Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.16.1-1-default root=/dev/mapper/system-root splash=silent resume=/dev/system/swap quiet security=apparmor mitigations=auto nvidia.prime=intel
Jan 26 13:53:36 systemd-modules-load[283]: Failed to find module 'bbswitch'
Jan 26 13:53:36 dracut-cmdline[309]: Using kernel command line parameters:  rd.driver.pre=btrfs rd.luks.uuid=luks-067749d5-86a2-4d2d-8224-e06413455b0d rd.lvm.lv=system/root   rd.lvm.lv=system/swap root=/dev/mapper/system-root rootfstype=btrfs rootflags=rw,relatime,ssd,space_cache=v2,subvolid=265,subvol=/@/.snapshots/1/snapshot,subvol=@/.snapshots/1/snapshot   BOOT_IMAGE=/boot/vmlinuz-5.16.1-1-default root=/dev/mapper/system-root splash=silent resume=/dev/system/swap quiet security=apparmor mitigations=auto nvidia.prime=intel
Jan 26 13:53:37 kernel: ACPI: video: Video Device [PEGP] (multi-head: no  rom: yes  post: no)
Jan 26 13:53:43 systemd-modules-load[885]: Module 'bbswitch' is deny-listed
Jan 26 13:53:43 kernel: audit: type=1400 audit(1643234023.219:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=924 comm="apparmor_parser"
Jan 26 13:53:43 kernel: audit: type=1400 audit(1643234023.219:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=924 comm="apparmor_parser"
Jan 26 13:53:43 tlp[1214]: Error: tlp.service is not enabled, power saving will not apply on boot.
Jan 26 13:53:43 tlp[1214]: >>> Invoke 'systemctl enable tlp.service' to correct this!
Jan 26 13:53:43 tlp[1214]: Warning: systemd-rfkill.service is not masked, radio device switching may not work as configured.
Jan 26 13:53:43 tlp[1214]: >>> Invoke 'systemctl mask systemd-rfkill.service' to correct this.
Jan 26 13:53:43 tlp[1214]: Warning: systemd-rfkill.socket is not masked, radio device switching may not work as configured.
Jan 26 13:53:43 tlp[1214]: >>> Invoke 'systemctl mask systemd-rfkill.socket' to correct this.
Jan 26 13:53:44 systemd[1]: Starting SUSEPrime systemd service...
Jan 26 13:53:44 suse-prime[1525]: Boot: nvidia.prime=intel kernel parameter detected!
Jan 26 13:53:44 suse-prime[1534]: Boot: setting-up intel card
Jan 26 13:53:44 prime-select[1567]: libkmod: kmod_module_get_holders: could not open '/sys/module/pcc_cpufreq/holders': No such file or directory
Jan 26 13:53:44 kernel: bbswitch: loading out-of-tree module taints kernel.
Jan 26 13:53:44 kernel: bbswitch: version 0.8
Jan 26 13:53:44 kernel: bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PC00.GFX0
Jan 26 13:53:44 kernel: bbswitch: Found discrete VGA device 0000:01:00.0: \_SB_.PC00.PEG1.PEGP
Jan 26 13:53:44 kernel: ACPI Warning: \_SB.PC00.PEG1.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20210930/nsarguments-61)
Jan 26 13:53:44 kernel: bbswitch: detected an Optimus _DSM function
Jan 26 13:53:44 kernel: bbswitch: disabling discrete graphics
Jan 26 13:53:44 kernel: bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is off
Jan 26 13:53:44 suse-prime[1704]: NVIDIA card will be switched off, NVIDIA offloading will not be available
Jan 26 13:53:44 suse-prime[1719]: trying switch OFF nvidia: [bbswitch] NVIDIA card is OFF
Jan 26 13:53:44 suse-prime[1737]: Intel card correctly set
Jan 26 13:53:48 suse-prime[1986]: updated /home/rronnander/.config/kdeglobals
Jan 26 13:53:48 suse-prime[1991]: HotSwitch: completed!
Jan 26 13:53:48 systemd[1]: prime-select.service: Deactivated successfully.
Jan 26 13:53:48 systemd[1]: Finished SUSEPrime systemd service.
Jan 26 13:53:54 dbus-daemon[2179]: [session uid=1000 pid=2179] Activating via systemd: service name='org.gtk.vfs.Daemon' unit='gvfs-daemon.service' requested by ':1.2' (uid=1000 pid=2191 comm="nvidia-settings --load-config-only ")
Jan 26 13:53:54 dbus-daemon[2179]: [session uid=1000 pid=2179] Activating via systemd: service name='org.a11y.Bus' unit='at-spi-dbus-bus.service' requested by ':1.6' (uid=1000 pid=2191 comm="nvidia-settings --load-config-only ")

Fresh boot on AC power (trying switch OFF nvidia: [bbswitch] NVIDIA card is OFF):

Jan 26 13:49:14 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.16.1-1-default root=/dev/mapper/system-root splash=silent resume=/dev/system/swap quiet security=apparmor mitigations=auto nvidia.prime=intel
Jan 26 13:49:14 kernel: Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.16.1-1-default root=/dev/mapper/system-root splash=silent resume=/dev/system/swap quiet security=apparmor mitigations=auto nvidia.prime=intel
Jan 26 13:49:14 systemd-modules-load[284]: Failed to find module 'bbswitch'
Jan 26 13:49:14 dracut-cmdline[309]: Using kernel command line parameters:  rd.driver.pre=btrfs rd.luks.uuid=luks-067749d5-86a2-4d2d-8224-e06413455b0d rd.lvm.lv=system/root   rd.lvm.lv=system/swap root=/dev/mapper/system-root rootfstype=btrfs rootflags=rw,relatime,ssd,space_cache=v2,subvolid=265,subvol=/@/.snapshots/1/snapshot,subvol=@/.snapshots/1/snapshot   BOOT_IMAGE=/boot/vmlinuz-5.16.1-1-default root=/dev/mapper/system-root splash=silent resume=/dev/system/swap quiet security=apparmor mitigations=auto nvidia.prime=intel
Jan 26 13:49:16 kernel: ACPI: video: Video Device [PEGP] (multi-head: no  rom: yes  post: no)
Jan 26 13:49:21 systemd-modules-load[882]: Module 'bbswitch' is deny-listed
Jan 26 13:49:21 kernel: audit: type=1400 audit(1643233761.483:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=921 comm="apparmor_parser"
Jan 26 13:49:21 kernel: audit: type=1400 audit(1643233761.483:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=921 comm="apparmor_parser"
Jan 26 13:49:22 tlp[1214]: Error: tlp.service is not enabled, power saving will not apply on boot.
Jan 26 13:49:22 tlp[1214]: >>> Invoke 'systemctl enable tlp.service' to correct this!
Jan 26 13:49:22 tlp[1214]: Warning: systemd-rfkill.service is not masked, radio device switching may not work as configured.
Jan 26 13:49:22 tlp[1214]: >>> Invoke 'systemctl mask systemd-rfkill.service' to correct this.
Jan 26 13:49:22 tlp[1214]: Warning: systemd-rfkill.socket is not masked, radio device switching may not work as configured.
Jan 26 13:49:22 tlp[1214]: >>> Invoke 'systemctl mask systemd-rfkill.socket' to correct this.
Jan 26 13:49:22 systemd[1]: Starting SUSEPrime systemd service...
Jan 26 13:49:22 suse-prime[1573]: Boot: nvidia.prime=intel kernel parameter detected!
Jan 26 13:49:22 suse-prime[1577]: Boot: setting-up intel card
Jan 26 13:49:25 kernel: bbswitch: loading out-of-tree module taints kernel.
Jan 26 13:49:25 kernel: bbswitch: version 0.8
Jan 26 13:49:25 kernel: bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PC00.GFX0
Jan 26 13:49:25 kernel: bbswitch: Found discrete VGA device 0000:01:00.0: \_SB_.PC00.PEG1.PEGP
Jan 26 13:49:25 kernel: ACPI Warning: \_SB.PC00.PEG1.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20210930/nsarguments-61)
Jan 26 13:49:25 kernel: bbswitch: detected an Optimus _DSM function
Jan 26 13:49:25 kernel: bbswitch: disabling discrete graphics
Jan 26 13:49:25 kernel: bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is off
Jan 26 13:49:25 suse-prime[1893]: NVIDIA card will be switched off, NVIDIA offloading will not be available
Jan 26 13:49:25 suse-prime[1904]: trying switch OFF nvidia: [bbswitch] NVIDIA card is OFF
Jan 26 13:49:25 suse-prime[1911]: Intel card correctly set
Jan 26 13:49:27 suse-prime[1990]: updated /home/rronnander/.config/kdeglobals
Jan 26 13:49:27 suse-prime[1993]: HotSwitch: completed!
Jan 26 13:49:27 systemd[1]: prime-select.service: Deactivated successfully.
Jan 26 13:49:27 systemd[1]: Finished SUSEPrime systemd service.
Jan 26 13:49:29 kernel: nvidia: module license 'NVIDIA' taints kernel.
Jan 26 13:49:29 kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Jan 26 13:49:29 kernel: nvidia: unknown parameter 'prime' ignored
Jan 26 13:49:29 kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 234
Jan 26 13:49:29 kernel: nvidia 0000:01:00.0: enabling device (0006 -> 0007)
Jan 26 13:49:29 kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  510.39.01  Fri Dec 31 11:03:22 UTC 2021
Jan 26 13:49:29 kernel: nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
Jan 26 13:49:29 kernel: nvidia-uvm: Loaded the UVM driver, major device number 510.
Jan 26 13:49:29 kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  510.39.01  Fri Dec 31 10:52:52 UTC 2021
Jan 26 13:49:29 kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Jan 26 13:49:31 kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
Jan 26 13:50:02 dbus-daemon[2254]: [session uid=1000 pid=2254] Activating via systemd: service name='org.gtk.vfs.Daemon' unit='gvfs-daemon.service' requested by ':1.2' (uid=1000 pid=2266 comm="nvidia-settings --load-config-only ")
Jan 26 13:50:02 dbus-daemon[2254]: [session uid=1000 pid=2254] Activating via systemd: service name='org.a11y.Bus' unit='at-spi-dbus-bus.service' requested by ':1.6' (uid=1000 pid=2266 comm="nvidia-settings --load-config-only ")

NVIDIA packages:

S  | Name                      | Type    | Version                 | Arch   | Repository
---+---------------------------+---------+-------------------------+--------+------------------------
i  | kernel-firmware-nvidia    | package | 20220119-1.1            | noarch | openSUSE-Tumbleweed-Oss
i+ | libnvidia-egl-wayland1    | package | 1.1.9-1.1               | x86_64 | openSUSE-Tumbleweed-Oss
i+ | nvidia-computeG06         | package | 510.39.01-5.1           | x86_64 | NVIDIA
i  | nvidia-gfxG06-kmp-default | package | 510.39.01_k5.16.0_1-5.2 | x86_64 | NVIDIA
i+ | nvidia-glG06              | package | 510.39.01-5.1           | x86_64 | NVIDIA
i+ | x11-video-nvidiaG06       | package | 510.39.01-5.1           | x86_64 | NVIDIA

I will make some time to try downgrading from the 510 series, but it feels like something else on this laptop is force-loading the nvidia kernel module when the AC adapter is unplugged.

sndirsch commented 2 years ago

I haven't heard about such an issue yet.

ryan-ronnander commented 2 years ago

I now have the 470 drivers installed, although the behavior seems similar between 470 and 510. This time, the NVIDIA modules never load at boot time (which is mostly great), but unplugging the AC adapter still causes issues. Long reply ahead.

All testing in this reply (with 470 series) was also performed with NVreg_DynamicPowerManagement=0x02.

Relevant installed packages:

$ sudo zypper --no-refresh search -s --installed-only nvidia bbswitch suse-prime

S  | Name                      | Type    | Version               | Arch   | Repository
---+---------------------------+---------+-----------------------+--------+------------------------
i  | bbswitch-kmp-default      | package | 0.8_k5.16.0_1-11.51   | x86_64 | (System Packages)
i  | bbswitch-kmp-default      | package | 0.8_k5.16.1_1-11.52   | x86_64 | openSUSE-Tumbleweed-Oss
i  | kernel-firmware-nvidia    | package | 20220119-1.1          | noarch | openSUSE-Tumbleweed-Oss
i+ | libnvidia-egl-wayland1    | package | 1.1.9-1.1             | x86_64 | openSUSE-Tumbleweed-Oss
i  | nvidia-computeG05         | package | 470.94-48.1           | x86_64 | NVIDIA
i  | nvidia-gfxG05-kmp-default | package | 470.94_k5.16.0_1-48.2 | x86_64 | NVIDIA
i  | nvidia-glG05              | package | 470.94-48.1           | x86_64 | NVIDIA
i  | plasma5-applet-suse-prime | package | 1.1-3.1               | noarch | openSUSE-Tumbleweed-Oss
i  | suse-prime                | package | 0.8.5-1.2             | noarch | openSUSE-Tumbleweed-Oss
i+ | x11-video-nvidiaG05       | package | 470.94-48.1           | x86_64 | NVIDIA

After testing out offload mode, I switched back to intel to check whether the GPU had powered down.

System log looks good:

Jan 26 18:41:36 suse-prime[3337]: user_logout_waiter: started
Jan 26 18:41:49 suse-prime[3648]: user_logout_waiter: X restart detected, preparing switch to intel
Jan 26 18:41:51 kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Unloading driver
Jan 26 18:41:51 kernel: nvidia-modeset: Unloading
Jan 26 18:41:51 kernel: nvidia-uvm: Unloaded the UVM driver.
Jan 26 18:41:51 kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 234
Jan 26 18:41:51 kernel: bbswitch: version 0.8
Jan 26 18:41:51 kernel: bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PC00.GFX0
Jan 26 18:41:51 kernel: bbswitch: Found discrete VGA device 0000:01:00.0: \_SB_.PC00.PEG1.PEGP
Jan 26 18:41:51 kernel: bbswitch: detected an Optimus _DSM function
Jan 26 18:41:51 kernel: bbswitch: disabling discrete graphics
Jan 26 18:41:51 kernel: bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is off
Jan 26 18:41:51 suse-prime[3697]: NVIDIA card will be switched off, NVIDIA offloading will not be available
Jan 26 18:41:51 suse-prime[3708]: trying switch OFF nvidia: [bbswitch] NVIDIA card is OFF
Jan 26 18:41:51 suse-prime[3715]: Intel card correctly set
Jan 26 18:41:51 suse-prime[3718]: HotSwitch: starting Display Manager
Jan 26 18:41:52 suse-prime[3770]: HotSwitch: completed!

The prime-select output looks good:

$ sudo prime-select get-current
Driver configured: intel
[bbswitch] NVIDIA card is OFF

D3cold reported as the power_state (this is good):

$ cat /sys/bus/pci/devices/0000\:01\:00.0/power_state 
D3cold

At the wall the system is measuring ~10-12 Watts of power draw, indicative of the NVIDIA GPU actually being powered off.

Testing battery only operation (AC adapter unplugged after initial boot)

The prime-select output now reports the card is ON:

$ sudo prime-select get-current
Driver configured: intel
[bbswitch] NVIDIA card is ON

However, the reported power_state is still marked as D3cold:

$ cat /sys/bus/pci/devices/0000\:01\:00.0/power_state 
D3cold
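
Since bbswitch and the PCI power_state can disagree, as they do here, it can help to compare the two values explicitly. This is a sketch only: gpu_state_check is a hypothetical name, and treating OFF with D3cold and ON with D0 as the only consistent pairs is an assumption drawn from the observations in this thread, not from bbswitch or kernel documentation.

```shell
# Compare the state reported by bbswitch with the PCI power_state.
# $1: bbswitch state (ON or OFF)
# $2: PCI power_state (D0, D3hot, D3cold, ...)
gpu_state_check() {
    case "$1:$2" in
        OFF:D3cold|ON:D0)
            echo "consistent: bbswitch=$1 power_state=$2" ;;
        ON:*|OFF:*)
            echo "mismatch: bbswitch=$1 power_state=$2" ;;
        *)
            echo "unknown: bbswitch=$1 power_state=$2" ;;
    esac
}
```

For the situation above, gpu_state_check ON D3cold reports a mismatch. On a live system the two arguments would come from the last field of /proc/acpi/bbswitch and from the power_state file shown above.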

The system log isn't very useful; it just complains about the tlp service not being enabled:

Jan 26 18:51:42 tlp[5432]: Error: tlp.service is not enabled, power saving will not apply on boot.
Jan 26 18:51:42 tlp[5432]: >>> Invoke 'systemctl enable tlp.service' to correct this!
Jan 26 18:51:42 tlp[5437]: Error: tlp.service is not enabled, power saving will not apply on boot.
Jan 26 18:51:42 tlp[5437]: >>> Invoke 'systemctl enable tlp.service' to correct this!
Jan 26 18:51:42 tlp[5432]: Warning: systemd-rfkill.service is not masked, radio device switching may not work as configured.
Jan 26 18:51:42 tlp[5432]: >>> Invoke 'systemctl mask systemd-rfkill.service' to correct this.
Jan 26 18:51:42 tlp[5432]: Warning: systemd-rfkill.socket is not masked, radio device switching may not work as configured.
Jan 26 18:51:42 tlp[5432]: >>> Invoke 'systemctl mask systemd-rfkill.socket' to correct this.
Jan 26 18:51:42 tlp[5437]: Warning: systemd-rfkill.service is not masked, radio device switching may not work as configured.
Jan 26 18:51:42 tlp[5437]: >>> Invoke 'systemctl mask systemd-rfkill.service' to correct this.
Jan 26 18:51:42 tlp[5437]: Warning: systemd-rfkill.socket is not masked, radio device switching may not work as configured.
Jan 26 18:51:42 tlp[5437]: >>> Invoke 'systemctl mask systemd-rfkill.socket' to correct this.

Re-plugging in the AC adapter

After plugging in the AC adapter again (AC power -> battery -> AC power), prime-select still reports the card is ON:

$ sudo prime-select get-current
Driver configured: intel
[bbswitch] NVIDIA card is ON

This time the reported power_state is D0:

$ cat /sys/bus/pci/devices/0000\:01\:00.0/power_state 
D0

After giving the battery a few minutes to fully charge, the system is drawing ~30 Watts at idle from the wall, indicating that the NVIDIA GPU is running.

I cannot simply set it to intel again:

$ sudo prime-select intel
intel catched
intel driver already in use!

As a workaround to reset back to intel with the NVIDIA GPU powered off, I set prime-select to offload, then logged out and back in, only to discover that the card had dropped off the bus:

Jan 26 19:05:52 suse-prime[8214]: Loading nvidia_modules
Jan 26 19:05:53 kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 234
Jan 26 19:05:53 kernel: NVRM: The NVIDIA GPU 0000:01:00.0
                        NVRM: (PCI ID: 10de:25a0) installed in this system has
                        NVRM: fallen off the bus and is not responding to commands.
Jan 26 19:05:53 kernel: nvidia: probe of 0000:01:00.0 failed with error -1
Jan 26 19:05:53 kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
Jan 26 19:05:53 kernel: NVRM: None of the NVIDIA devices were initialized.
Jan 26 19:05:53 kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 234
Jan 26 19:05:53 kernel: nvidia_modeset: Unknown symbol nvidia_register_module (err -2)
Jan 26 19:05:53 kernel: nvidia_modeset: Unknown symbol nvidia_get_rm_ops (err -2)
Jan 26 19:05:53 kernel: nvidia_modeset: Unknown symbol nvidia_unregister_module (err -2)
Jan 26 19:05:53 kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 234
Jan 26 19:05:53 kernel: NVRM: The NVIDIA GPU 0000:01:00.0
                        NVRM: (PCI ID: 10de:25a0) installed in this system has
                        NVRM: fallen off the bus and is not responding to commands.
Jan 26 19:05:53 kernel: nvidia: probe of 0000:01:00.0 failed with error -1
Jan 26 19:05:53 kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
Jan 26 19:05:53 kernel: NVRM: None of the NVIDIA devices were initialized.
Jan 26 19:05:53 kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 234
Jan 26 19:05:53 suse-prime[8230]: PCI BusID of NVIDIA card could not be detected!
Jan 26 19:05:53 suse-prime[8233]: NVIDIA Prime Render Offload not supported!
Jan 26 19:05:53 suse-prime[8240]: Intel card correctly set
Jan 26 19:05:53 suse-prime[8243]: HotSwitch: starting Display Manager
Jan 26 19:05:53 suse-prime[8296]: HotSwitch: completed!

At this point I set prime-select back to intel, rebooted (still on AC power), and everything looked good at boot:

$ sudo journalctl --no-hostname -b -0 | grep -i -E 'nvidia|bbswitch|PEGP|prime|tlp|NVRM' 
Jan 26 19:12:05 kernel: ACPI: video: Video Device [PEGP] (multi-head: no  rom: yes  post: no)
Jan 26 19:12:11 kernel: audit: type=1400 audit(1643253131.199:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=928 comm="apparmor_parser"
Jan 26 19:12:11 kernel: audit: type=1400 audit(1643253131.199:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=928 comm="apparmor_parser"
Jan 26 19:12:11 tlp[1224]: Error: tlp.service is not enabled, power saving will not apply on boot.
Jan 26 19:12:11 tlp[1224]: >>> Invoke 'systemctl enable tlp.service' to correct this!
Jan 26 19:12:11 tlp[1224]: Warning: systemd-rfkill.service is not masked, radio device switching may not work as configured.
Jan 26 19:12:11 tlp[1224]: >>> Invoke 'systemctl mask systemd-rfkill.service' to correct this.
Jan 26 19:12:11 tlp[1224]: Warning: systemd-rfkill.socket is not masked, radio device switching may not work as configured.
Jan 26 19:12:11 tlp[1224]: >>> Invoke 'systemctl mask systemd-rfkill.socket' to correct this.
Jan 26 19:12:12 systemd[1]: Starting SUSEPrime systemd service...
Jan 26 19:12:12 suse-prime[1543]: Boot: setting-up intel card
Jan 26 19:12:12 kernel: bbswitch: loading out-of-tree module taints kernel.
Jan 26 19:12:12 kernel: bbswitch: version 0.8
Jan 26 19:12:12 kernel: bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PC00.GFX0
Jan 26 19:12:12 kernel: bbswitch: Found discrete VGA device 0000:01:00.0: \_SB_.PC00.PEG1.PEGP
Jan 26 19:12:12 kernel: ACPI Warning: \_SB.PC00.PEG1.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20210930/nsarguments-61)
Jan 26 19:12:12 kernel: bbswitch: detected an Optimus _DSM function
Jan 26 19:12:12 kernel: bbswitch: disabling discrete graphics
Jan 26 19:12:12 kernel: bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is off
Jan 26 19:12:12 suse-prime[1638]: NVIDIA card will be switched off, NVIDIA offloading will not be available
Jan 26 19:12:12 suse-prime[1658]: trying switch OFF nvidia: [bbswitch] NVIDIA card is OFF
Jan 26 19:12:12 suse-prime[1668]: Intel card correctly set
Jan 26 19:12:16 suse-prime[1977]: updated /home/rronnander/.config/kdeglobals
Jan 26 19:12:16 suse-prime[1980]: HotSwitch: completed!
Jan 26 19:12:16 systemd[1]: prime-select.service: Deactivated successfully.
Jan 26 19:12:16 systemd[1]: Finished SUSEPrime systemd service.

System loaded as expected, pulling ~10 Watts at idle, no nvidia modules loaded.

Starting the system on battery power from poweroff event

Having established that everything loads as expected with AC power plugged in, I decided to test starting the system on battery power.

Starting the system from battery shows ON and D0 power state (not good):

$ sudo prime-select get-current
Driver configured: intel
[bbswitch] NVIDIA card is ON

$ cat /sys/bus/pci/devices/0000\:01\:00.0/power_state 
D0

Trying to get the card powered back down after booting on battery power puts the card in a D3hot state:

$ sudo tee /proc/acpi/bbswitch <<<OFF
OFF

$ sudo prime-select get-current
Driver configured: intel
[bbswitch] NVIDIA card is ON

$ cat /sys/bus/pci/devices/0000\:01\:00.0/power_state 
D3hot

Reloading the bbswitch module manages to put the card back to a D0 state:

$ sudo rmmod bbswitch && sudo modprobe bbswitch

$ cat /sys/bus/pci/devices/0000\:01\:00.0/power_state 
D0

Once the card is in D0 I can re-issue an OFF command to bbswitch and get the NVIDIA GPU to power off on battery power:

$ sudo tee /proc/acpi/bbswitch <<<OFF
OFF

$ cat /sys/bus/pci/devices/0000\:01\:00.0/power_state 
D3cold

$ sudo prime-select get-current
Driver configured: intel
[bbswitch] NVIDIA card is OFF

So, some success I suppose, as I can use this laptop on battery power with the NVIDIA GPU disabled, even if the initial boot is on battery power.
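
The manual steps above (OFF, reload bbswitch, OFF again) could be collected into one helper. This is a sketch under assumptions: the names run and gpu_force_off, the DRY_RUN switch and the BBSWITCH_CTL override are all made up for illustration, and the sequence is only known to work on the laptop in this thread. It needs root when actually executed.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the four manual steps used above.
gpu_force_off() {
    ctl="${BBSWITCH_CTL:-/proc/acpi/bbswitch}"
    # First OFF request (left the card in D3hot in the test above).
    run sh -c 'echo OFF > "$1"' sh "$ctl" &&
    # Reload bbswitch (brought the card back to D0).
    run rmmod bbswitch &&
    run modprobe bbswitch &&
    # Second OFF request (reached D3cold).
    run sh -c 'echo OFF > "$1"' sh "$ctl"
}
```

Setting DRY_RUN=1 before calling gpu_force_off prints the four commands without touching the hardware, which is a safe way to review the sequence first.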

Plugging in the AC adapter from battery powered state (where NVIDIA GPU was powered down in battery state)

Moving into this state the system keeps the NVIDIA GPU powered down and D3cold is reported (as expected):

$ sudo prime-select get-current
Driver configured: intel
[bbswitch] NVIDIA card is OFF

$ cat /sys/bus/pci/devices/0000\:01\:00.0/power_state 
D3cold

sndirsch commented 2 years ago

No idea, maybe bbswitch just doesn't work reliably on your system. That would be comparable to issue #74. I suggest getting rid of the bbswitch packages and switching to "offload" mode.

ryan-ronnander commented 2 years ago

I'll take a look at issue #74

Testing offload on AC power (fresh boot)

With offload selected and bbswitch uninstalled, the NVIDIA drivers initially set the power_state to D0.

$ cat /sys/bus/pci/devices/0000\:01\:00.0/power_state 
D0

The D0 state seems to pull ~16-20 Watts at idle at the wall for reference.

Moments later the state moves from D0 to D3hot:

$ cat /sys/bus/pci/devices/0000\:01\:00.0/power_state 
D3hot

The D3hot state seems to pull ~30 Watts from the wall for reference.

Testing offload on battery power (initial boot on AC power -> battery)

Much to my satisfaction, unplugging the AC adapter had great results. Unplugging the AC adapter changes the power_state from D3hot -> D0, and seconds later the power state changes to D3cold. So that is good news. It appears NVreg_DynamicPowerManagement=0x02 is functioning as expected according to the documentation. The laptop feels colder to the touch after switching to battery power.

Plugging back in the AC adapter (AC -> battery -> AC), the GPU power_state turns to D0 while running on AC power.

One oddity: even after offloading something to the NVIDIA GPU (__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxgears), the state stays at D0.
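
The two offload environment variables can be wrapped in a small helper so they don't have to be retyped for every program. The function name prime_run is an assumption made for this sketch; the variables themselves are the ones used with glxgears above.

```shell
# Run a command with the PRIME render offload variables set.
# Usage: prime_run glxgears
prime_run() {
    env __NV_PRIME_RENDER_OFFLOAD=1 \
        __GLX_VENDOR_LIBRARY_NAME=nvidia \
        "$@"
}
```

The variables are set only for the launched command, so the rest of the session keeps rendering on the Intel GPU.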


I can live with this behavior. I'd still ideally want to keep the NVIDIA GPU powered off at all times, but using offload coupled with NVreg_DynamicPowerManagement=0x02 is acceptable as long as maximum battery life is possible, which appears to be the case.

sndirsch commented 2 years ago

Sounds good. If you really want to have the NVIDIA GPU powered off at all times, I suggest disabling it in the firmware/BIOS, if possible. That could save even more energy. But this then raises the question of why you bought a laptop with an NVIDIA GPU at all. Well, maybe it's just your company laptop after all. ;-)

ryan-ronnander commented 2 years ago

Yeah, this is a new work laptop, hence the NVIDIA surprise. Unfortunately, this laptop does not have the option to disable the NVIDIA GPU in the BIOS.

I went back and performed some more testing after reinstalling bbswitch in intel mode. All tests were done with the 470 series and with /etc/modprobe.d/09-nvidia-modprobe-bbswitch-G04.conf and /etc/udev/rules.d/90-nvidia-udev-pm-G05.rules present.

In the following tests I tried out various NVreg_DynamicPowerManagement options in /etc/modprobe.d/09-nvidia-modprobe-pm-G05.conf.
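
For reference, a module option of this kind is normally written as an "options nvidia ..." line in a modprobe.d file. The helper below only generates that line for the three values tried in this thread; the function name nvidia_pm_option is hypothetical, and the exact contents of the packaged 09-nvidia-modprobe-pm-G05.conf are not shown in this thread, so treat the output as an illustration rather than a copy of that file.

```shell
# Print a modprobe.d line for the given NVreg_DynamicPowerManagement value.
# Only the values tested in this thread are accepted.
nvidia_pm_option() {
    case "$1" in
        0x00|0x01|0x02)
            printf 'options nvidia NVreg_DynamicPowerManagement=%s\n' "$1" ;;
        *)
            echo "unsupported value: $1" >&2
            return 1 ;;
    esac
}
```

The nvidia module has to be reloaded (or the system rebooted) before a changed option takes effect.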

Testing NVreg_DynamicPowerManagement=0x00

The card can be put back into D3cold on AC power by running:

sudo tee /proc/acpi/bbswitch <<<OFF && \
sudo rmmod bbswitch && \
sudo modprobe bbswitch && \
sudo tee /proc/acpi/bbswitch <<<OFF

Testing NVreg_DynamicPowerManagement=0x01

Same behavior as 0x00.

Testing NVreg_DynamicPowerManagement=0x02

Same behavior as 0x00.

Testing no configured NVreg_DynamicPowerManagement option

Same behavior as 0x00.


I think this is how I'll keep the configuration. The laptop boots with the NVIDIA card powered off, the NVIDIA card stays off when switching to battery, and the card can be powered off again after reconnecting AC power by running:

sudo tee /proc/acpi/bbswitch <<<OFF && \
sudo rmmod bbswitch && \
sudo modprobe bbswitch && \
sudo tee /proc/acpi/bbswitch <<<OFF

I'll go ahead and close this issue, but I may open some more specific issues now that I have a better understanding of all the moving parts. I originally opened this issue using the 510 beta driver series, which sent me down a crazy path (the nvidia modules were being modprobe'd when switching to battery power, destroying battery life).

Thanks.