pop-os / pop

A project for managing all Pop!_OS sources
https://system76.com/pop

Resume from suspend fails with NVIDIA X11 #2465

Closed curiousercreative closed 1 year ago

curiousercreative commented 2 years ago

Distribution (run cat /etc/os-release):

$ cat /etc/os-release
NAME="Pop!_OS"
VERSION="22.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 22.04 LTS"
VERSION_ID="22.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=jammy
UBUNTU_CODENAME=jammy
LOGO=distributor-logo-pop-os

Related Application and/or Package Version (run apt policy $PACKAGE NAME):

$ apt policy system76-driver-nvidia nvidia-driver-510 xorg
system76-driver-nvidia:
  Installed: 20.04.55~1653321476~22.04~09ff50e
  Candidate: 20.04.55~1653321476~22.04~09ff50e
  Version table:
 *** 20.04.55~1653321476~22.04~09ff50e 1001
       1001 http://apt.pop-os.org/release jammy/main amd64 Packages
       1001 http://apt.pop-os.org/release jammy/main i386 Packages
        100 /var/lib/dpkg/status
nvidia-driver-510:
  Installed: 510.73.05-1pop0~1653592955~22.04~4ec3405
  Candidate: 510.73.05-1pop0~1653592955~22.04~4ec3405
  Version table:
 *** 510.73.05-1pop0~1653592955~22.04~4ec3405 1001
       1001 http://apt.pop-os.org/release jammy/main amd64 Packages
        100 /var/lib/dpkg/status
     510.73.05-0ubuntu0.22.04.1 500
        500 http://us.archive.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages
        500 http://us.archive.ubuntu.com/ubuntu jammy-updates/restricted amd64 Packages
     510.60.02-0ubuntu1 500
        500 http://us.archive.ubuntu.com/ubuntu jammy/restricted amd64 Packages
xorg:
  Installed: 1:7.7+23ubuntu2
  Candidate: 1:7.7+23ubuntu2
  Version table:
 *** 1:7.7+23ubuntu2 500
        500 http://us.archive.ubuntu.com/ubuntu jammy/main amd64 Packages
        100 /var/lib/dpkg/status
$ uname -r
5.17.5-76051705-generic

Issue/Bug Description: Graphical environment fails to resume from suspend

Steps to reproduce (if you know): I don't have 100% reliable repro steps unfortunately, but this will repro for me often

  1. Desktop w/ dual monitors connected to NVIDIA GPU (3080 Ti)
  2. Login, open work applications
  3. Suspend (manually, after timeout, whatever)
  4. Resume

Expected behavior: Desktop graphical environment and lock screen should reappear on both monitors in short time.

Observed behavior:

  1. both monitors are on and receiving display output, but only of a blank dark grey screen (not plymouth)
  2. System is fully responsive on the network, accepts SSH connections and can be rebooted
  3. No TTY hotkeys have an effect on blank display (alt+F1-10)
  4. From SSH, sudo killall gdm3 will bring the displays back to life, but logging in again fails and returns to the login screen. Of course, your prior graphical session will have been killed even if login succeeds.
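
Since the machine stays reachable over SSH in this state, the journal can be inspected before gdm3 is killed. The following is a minimal sketch, not a verified diagnostic procedure: the function name `suspend_log_filter` and its keyword list are assumptions chosen for illustration, and the messages that actually matter may differ between driver versions.

```shell
# Keep only log lines that mention the GPU driver, the display manager,
# the suspend/resume path, or ACPI errors. Reads log text on stdin.
# The keyword list is an assumption; widen or narrow it as needed.
suspend_log_filter() {
    grep -E -i 'nvidia|nvrm|gdm|suspend|resume|ACPI Error'
}

# Example, run over SSH while the displays are still blank
# (-b 0 selects the current boot):
#   journalctl -b 0 --no-pager | suspend_log_filter
```

Filtering on stdin keeps the helper usable on a saved log file as well as on live journalctl output.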

Other Notes: This appears to be a workaround (described here):

$ sudo systemctl disable nvidia-suspend.service
$ sudo systemctl disable nvidia-hibernate.service
$ sudo systemctl disable nvidia-resume.service
$ sudo systemctl reboot
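
Because later comments in this thread re-enable the same three services, it can help to keep the workaround reversible. This is a sketch under the assumption that the unit names are exactly the three listed above; `nvidia_pm_plan` and `nvidia_pm_state` are hypothetical helper names, and the first one only prints commands rather than running them.

```shell
# Unit names taken from the workaround above.
NVIDIA_PM_UNITS="nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service"

# Print (do not run) the systemctl command that applies ACTION to each unit.
# ACTION must be "enable" or "disable".
nvidia_pm_plan() {
    action="$1"
    case "$action" in
        enable|disable) ;;
        *) echo "usage: nvidia_pm_plan enable|disable" >&2; return 2 ;;
    esac
    for unit in $NVIDIA_PM_UNITS; do
        echo "sudo systemctl $action $unit"
    done
}

# Report whether each unit is currently enabled.
# systemctl is-enabled exits non-zero for disabled units, hence "|| true".
nvidia_pm_state() {
    for unit in $NVIDIA_PM_UNITS; do
        printf '%s: %s\n' "$unit" "$(systemctl is-enabled "$unit" 2>/dev/null || true)"
    done
}

# Example:
#   nvidia_pm_state
#   nvidia_pm_plan enable    # review the output, then run it by hand
```

Printing the commands first is a deliberate choice here: the change only takes effect around the next suspend cycle, so reviewing it before running costs nothing.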

Seibz commented 2 years ago

Hi there! If you disable bluetooth prior to suspending, is this still happening?

If disabling bluetooth doesn't help, have you experienced more stability in suspend after purging and reinstalling the nvidia drivers?

sudo rm /lib/systemd/system/nvidia-hibernate.service
sudo apt purge ~nnvidia
sudo apt autoremove
sudo apt clean
sudo apt install system76-driver-nvidia

curiousercreative commented 2 years ago

@Seibz I'm aware of the Bluetooth suspend bug, and this reproduces with Bluetooth disabled. I've also tried the reinstall as you describe, to no avail, across two systems. The workaround at the end of my first post works, though, and the source link does a better job of describing what may be the issue in Pop.

curiousercreative commented 2 years ago

After updating NVIDIA driver to 515 via Pop! Shop, the workaround remains in effect (services disabled). I re-enabled them and the issue still reproduces. On my galp5, the resume screen shows this text, but is unresponsive to any keys:

ACPI Error: No handler or method for GPE 68, disabling event (20211217/evgpe-839)

Maybe it reads "GPE 6B", I can't tell from my photo.

ShahanM commented 2 years ago

I can confirm that I have the same problem with a Dell XPS 9500.

$ uname -r
5.18.10-76051810-generic

I have the Nvidia services disabled and also a script to switch off Bluetooth before suspending.

farfromrefug commented 2 years ago

Same issue here.

ShahanM commented 2 years ago

Update: I decided to enable the NVidia services (hibernate, suspend, resume), and my system seems to be working as expected with suspend and resume. Hibernate does not work - it gets stuck in the Dell boot logo when powering on, but it never worked for me.

General System Info:

OS: Pop!_OS 22.04 LTS x86_64
Host: XPS 15 9500
Kernel: 5.18.10-76051810-generic
Uptime: 10 hours, 38 mins
Packages: 3383 (dpkg), 45 (flatpak)
Shell: bash 5.1.16
Resolution: 1920x1200
DE: GNOME
WM: Mutter
WM Theme: Pop
Theme: Pop-dark [GTK2/3]
Icons: candy-icons [GTK2/3]
Terminal: gnome-terminal
CPU: Intel i7-10750H (12) @ 5.000GHz
GPU: NVIDIA GeForce GTX 1650 Ti Mobile
GPU: Intel CometLake-H GT2 [UHD Graphics]
Memory: 9564MiB / 15746MiB

NVidia Drivers

NVIDIA-SMI 515.48.07    Driver Version: 515.48.07    CUDA Version: 11.7 

NVidia Service Status

I also included my last suspend and resume log.

○ nvidia-hibernate.service - NVIDIA system hibernate actions
     Loaded: loaded (/lib/systemd/system/nvidia-hibernate.service; enabled; vendor preset: enabled)
     Active: inactive (dead)

○ nvidia-suspend.service - NVIDIA system suspend actions
     Loaded: loaded (/lib/systemd/system/nvidia-suspend.service; enabled; vendor preset: enabled)
     Active: inactive (dead)

Aug 04 03:12:31 pop-xps systemd[1]: Starting NVIDIA system suspend actions...
Aug 04 03:12:31 pop-xps suspend[43752]: nvidia-suspend.service
Aug 04 03:12:31 pop-xps logger[43752]: <13>Aug  4 03:12:31 suspend: nvidia-suspend.service
Aug 04 03:12:33 pop-xps systemd[1]: nvidia-suspend.service: Deactivated successfully.
Aug 04 03:12:33 pop-xps systemd[1]: Finished NVIDIA system suspend actions.

○ nvidia-resume.service - NVIDIA system resume actions
     Loaded: loaded (/lib/systemd/system/nvidia-resume.service; enabled; vendor preset: enabled)
     Active: inactive (dead)

Aug 04 10:36:55 pop-xps systemd[1]: Starting NVIDIA system resume actions...
Aug 04 10:36:55 pop-xps suspend[44247]: nvidia-resume.service
Aug 04 10:36:55 pop-xps logger[44247]: <13>Aug  4 10:36:55 suspend: nvidia-resume.service
Aug 04 10:36:55 pop-xps systemd[1]: nvidia-resume.service: Deactivated successfully.
Aug 04 10:36:55 pop-xps systemd[1]: Finished NVIDIA system resume actions.

I had a problem if I closed my laptop lid while it was connected to power and the charging cable before opening the lid to resume; it crashed with the following error: psmouse serio1: elantech: failed to query capabilities.

I added psmouse to the blacklist following this info on ArchWiki. So far, I haven't had any problems. It has been about four days now.
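
For readers unfamiliar with module blacklisting, the usual mechanism is a small file under /etc/modprobe.d. The sketch below only generates the fragment; the function name `blacklist_fragment` and the file name in the usage comment are hypothetical, and the comment above does not say which file was actually used. Note that blacklisting psmouse may also disable a PS/2 touchpad or trackpoint that depends on it.

```shell
# Print a modprobe.d blacklist fragment for one kernel module.
# $1 = module name (psmouse in the comment above).
blacklist_fragment() {
    printf '# Prevent automatic loading of %s\nblacklist %s\n' "$1" "$1"
}

# Example (the file name is an assumption; modprobe reads *.conf files
# in /etc/modprobe.d):
#   blacklist_fragment psmouse | sudo tee /etc/modprobe.d/blacklist-psmouse.conf
#   sudo update-initramfs -u   # only relevant if the module is loaded from the initramfs
```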

curiousercreative commented 2 years ago

@ShahanM A default Pop! install does not support hibernate, so unless you created your swap partition and added the resume kernel boot option, it should not be expected to work. On my laptop, I have hibernation configured and it works well. No reason to hibernate my desktop.

Hibernate does not work - it gets stuck in the Dell boot logo when powering on, but it never worked for me.
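
The two hibernate prerequisites mentioned here (a swap area and a resume kernel boot option) can be checked quickly. This is a rough sketch, not a full hibernate setup guide: `cmdline_has_resume` is a hypothetical helper that only tests whether a `resume=` parameter is present, not whether it points at a valid swap device.

```shell
# Return success if the given kernel command line contains a resume= parameter.
# $1 = command line text, typically the contents of /proc/cmdline.
cmdline_has_resume() {
    case " $1 " in
        *" resume="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   swapon --show    # lists active swap areas, if any
#   if cmdline_has_resume "$(cat /proc/cmdline)"; then
#       echo "resume= is set"
#   else
#       echo "no resume= parameter; hibernate is not expected to work"
#   fi
```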

ShahanM commented 2 years ago

@curiousercreative

so, unless you created your swap partition and added the resume kernel boot option, it should not be expected to work.

I did, and I couldn't get it to work reliably. I tried in Winter 2020, so I do not recall what I tried. It wasn't a complaint, though. I was being thorough about the 3 Nvidia services.

curiousercreative commented 1 year ago

On the latest Pop! and related packages and with these services enabled, I don't have this problem anymore, so will close.

drager commented 1 year ago

On the latest Pop! and related packages and with these services enabled, I don't have this problem anymore, so will close.

I still have this issue... It always happens if I have unplugged an external monitor and then try to wake up the laptop.

Running a System 76 Oryx Pro with nvidia graphics. PopOS 22.04 LTS.

Areopagitics commented 1 year ago

On the latest Pop! and related packages and with these services enabled, I don't have this problem anymore, so will close.

I still have this issue... It always happens if I have unplugged an external monitor and then try to wake up the laptop.

Running a System 76 Oryx Pro with nvidia graphics. PopOS 22.04 LTS.

It happens both when plugged in to an external screen and when not.

Areopagitics commented 1 year ago

There are sooooo many related issues on suspend: #449, #2616, #1799. I sometimes wonder if PopOs is more of a hobby for the developers at System76

it looks like it has to do with these error codes (screenshot with Gnome Logs):

i915 0000:00:02.0: [drm] ERROR Failed to write source OUI


regulator-g commented 1 year ago

There are sooooo many related issues on suspend: #449, #2616, #1799. I sometimes wonder if PopOs is more of a hobby for the developers at System76

I think this is more likely an upstream issue that NVIDIA needs to fix.

leviport commented 1 year ago

I sometimes wonder if PopOs is more of a hobby for the developers at System76

Your statement makes no sense. System76 employs developers to work on Pop!_OS, so how could it be a hobby?

Furthermore, I'll remind you that we have a code of conduct. While I understand how frustrating it can be to deal with some of these bugs, I must insist that you remain respectful.

aalvarado commented 1 year ago

https://gitlab.gnome.org/GNOME/gnome-settings-daemon/-/issues/111#note_1254235

The change in the above link solved my laptop suspend issues when using a second screen and closing the lid. Before it, the laptop would wake up to a black screen, and I could only hard power down with the power button or Alt + Print Screen + REISUB.

aalvarado commented 1 year ago

Never mind, I got a black screen again that I had to REISUB out of. Will revisit this if it gets fixed. I tried cleaning up my kernelstub configuration and am checking to see if this has any effect.