pop-os / system76-power

Power profile management for Linux
GNU General Public License v3.0

Suspend broken on Gazelle 16 because of Nvidia GPU #358

Open JRDetwiler opened 2 years ago

JRDetwiler commented 2 years ago

Distribution (run cat /etc/os-release): Arch Linux (5.19.4 / Nvidia driver 515.65.01)

Related Application and/or Package Version (run apt policy $PACKAGE NAME): Related to using xfce4-session-logout --suspend with the Whisker Menu.

Issue/Bug Description: The system wakes up immediately after suspending. journalctl -p 3 -b reported logs that looked like these.

Steps to reproduce (if you know): You should be able to run the command above to trigger it.

Expected behavior: The machine stays suspended instead of waking up immediately after the screen goes blank.

Other Notes: I ended up finding my fix here.

Known workaround:

I went into /etc/modprobe.d/system76-power.conf and modified this setting to be off instead of on.

options nvidia NVreg_PreserveVideoMemoryAllocations=0
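To confirm which value is actually configured before and after editing the file, something like the following could be used. This is a minimal sketch: the function name `nv_modprobe_option` is hypothetical, and it assumes the option sits on an `options nvidia ...` line in the file mentioned above.

```shell
# Hypothetical helper: print the value of an NVIDIA module option from a
# modprobe.d file, e.g. NVreg_PreserveVideoMemoryAllocations.
# Assumption: the option is set on an "options nvidia ..." line.
nv_modprobe_option() {
    option="$1"
    conf="${2:-/etc/modprobe.d/system76-power.conf}"
    # Scan every key=value pair on "options nvidia" lines; the last match wins.
    awk -v opt="$option" '
        $1 == "options" && $2 == "nvidia" {
            for (i = 3; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == opt) value = kv[2]
            }
        }
        END { if (value != "") print value; else exit 1 }
    ' "$conf"
}
```

For example, `nv_modprobe_option NVreg_PreserveVideoMemoryAllocations` prints the configured value, and returns a non-zero status if the option is not set in that file.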

Proposed fix:

I think something in src/graphics.rs needs to be updated, possibly the S3 exception where this setting is being applied. That's the extent of the debugging I'm ready to put into this, though. I hope it helps someone else.

agherzan commented 1 year ago

Here are the relevant errors (oryx6 on my side):

[   83.615220] NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.
[   86.105347] nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0x30 [nvidia] returns -5
[   86.106085] nvidia 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x160 returns -5
[   86.106102] nvidia 0000:01:00.0: PM: failed to suspend async: error -5
[   86.690666] PM: Some devices failed to suspend, or early wake event detected

agherzan commented 1 year ago

There is also a related known issue on the Nvidia side: https://download.nvidia.com/XFree86/Linux-x86_64/470.86/README/powermanagement.html#KnownIssuesAndWf438e
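The NVRM message in the log above says that PreserveVideoMemoryAllocations was set while suspend was attempted without the driver's procfs suspend interface. A rough way to check the value the loaded driver actually sees is sketched below. It assumes the driver lists its parameters as `Name: value` lines in /proc/driver/nvidia/params, and the function name `nv_preserve_enabled` is hypothetical.

```shell
# Hypothetical helper: succeed if the loaded NVIDIA driver reports
# PreserveVideoMemoryAllocations as enabled.
# Assumption: parameters appear as "Name: value" lines in the params file.
nv_preserve_enabled() {
    params_file="${1:-/proc/driver/nvidia/params}"
    # Match only when the parameter line exists and its value is exactly 1.
    grep -Eq '^PreserveVideoMemoryAllocations:[[:space:]]*1[[:space:]]*$' \
        "$params_file" 2>/dev/null
}
```

Running `nv_preserve_enabled && echo on || echo off` then shows whether the setting took effect after a reboot.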

regulator-g commented 1 year ago

Just want to say I have the same issue, and I think quite a few people do. Thanks for the workaround. It's not perfect, but the laptop is no longer frozen on resume.

JRDetwiler commented 1 year ago

Okay, here's the actual correct fix. Keep GPU memory preservation enabled in /etc/modprobe.d/system76-power.conf:

options nvidia NVreg_PreserveVideoMemoryAllocations=1

Instead, enable these NVIDIA services which are disabled by default. I rebooted and it's finally working as expected with no side effects.

sudo systemctl enable nvidia-suspend.service
sudo systemctl enable nvidia-hibernate.service
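To see whether these units are already enabled on a given machine, a small check along these lines might help. It is only a sketch: the function name `nv_disabled_units` and the `SYSTEMCTL` override variable are hypothetical, and it covers only the two units named above.

```shell
# Hypothetical helper: print the NVIDIA power-management units that are
# not enabled. SYSTEMCTL may be overridden (assumption, useful for dry runs).
nv_disabled_units() {
    for unit in nvidia-suspend.service nvidia-hibernate.service; do
        # "is-enabled --quiet" exits 0 only when the unit is enabled.
        "${SYSTEMCTL:-systemctl}" is-enabled --quiet "$unit" 2>/dev/null \
            || echo "$unit"
    done
}
```

If `nv_disabled_units` prints nothing, both services are enabled. Any unit it prints still needs `sudo systemctl enable`.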

As noted in the Arch wiki and NVIDIA's documentation, this saves GPU memory to a temporary file, in /tmp by default, which is a tmpfs on many systems. If your /tmp is small, this might have been the real issue in the first place. The config can be updated to save to a larger, faster filesystem instead, which will likely resolve that issue:

options nvidia NVreg_PreserveVideoMemoryAllocations=1 NVreg_TemporaryFilePath=/path/to/tmp-nvidia
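Whether a given directory is big enough can be estimated by comparing its free space against the GPU's total video memory. The sketch below is an approximation under stated assumptions: the function name `nv_tmp_has_room` is hypothetical, and using total video memory as the requirement is my assumption, not a figure from this thread.

```shell
# Hypothetical helper: succeed if DIR has at least REQUIRED_MIB MiB free.
nv_tmp_has_room() {
    dir="$1"
    required_mib="$2"
    # With -P and -k, df prints one line per filesystem and column 4 is
    # the available space in KiB.
    avail_kib=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kib" ] && [ "$avail_kib" -ge $((required_mib * 1024)) ]
}

# Example (not run here): use total video memory in MiB from nvidia-smi.
# vram_mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
# nv_tmp_has_room /path/to/tmp-nvidia "$vram_mib" || echo "not enough space"
```

The check is deliberately conservative about errors: a missing directory or unreadable `df` output counts as "not enough room".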

If someone can turn this into a pull request, go ahead. Maybe the system76-power graphics nvidia command isn't enabling these necessary services, or possibly isn't checking that /tmp is big enough.