pop-os / pop

A project for managing all Pop!_OS sources
https://system76.com/pop

Suspend Not Working as IOMMU Fails to Resume #2118

Open pjkaufman opened 2 years ago

pjkaufman commented 2 years ago

Distribution (run cat /etc/os-release):

NAME="Pop!_OS"
VERSION="21.10"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 21.10"
VERSION_ID="21.10"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=impish
UBUNTU_CODENAME=impish
LOGO=distributor-logo-pop-os

Related Application and/or Package Version (run apt policy $PACKAGE NAME):

The issue seems to reside in the amdgpu driver, though I am not 100% certain of this. Based on what I could find about the log, the amdgpu error correlates with suspend failing and with my laptop not coming back after I shut the lid.

Issue/Bug Description:

When I shut my laptop lid or suspend the OS and then try to resume my previous user session, the screen either goes black and does nothing, or I can see the screen that shows the time but not my user, and it just sits there without responding to keyboard input.

Steps to reproduce (if you know):

If I shut my laptop lid or explicitly select suspend as the action for my computer and then try to resume the user session, the session is not restored properly.

Expected behavior:

I should be able to log back in and pick up where I left off in my session.

Other Notes:

The issue has happened to me on 21.04 and 21.10.

I have more logs if you would like them, but the lines from journalctl -b -1 that seem to relate to the problem are:

Dec 24 08:55:56 pop-os kernel: Freezing user space processes ... (elapsed 0.003 seconds) done.
Dec 24 08:55:56 pop-os kernel: OOM killer disabled.
Dec 24 08:55:56 pop-os kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Dec 24 08:55:56 pop-os kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Dec 24 08:55:56 pop-os kernel: ACPI: EC: interrupt blocked
Dec 24 08:55:56 pop-os kernel: amd_pmc AMD0004:00: SMU cmd unknown. err: 0xfe
Dec 24 08:55:56 pop-os kernel: amd_pmc AMD0004:00: SMU cmd unknown. err: 0xfe
Dec 24 08:55:56 pop-os kernel: amd_pmc AMD0004:00: SMU cmd unknown. err: 0xfe
Dec 24 08:55:56 pop-os kernel: ACPI: EC: interrupt unblocked
Dec 24 08:55:56 pop-os kernel: ------------[ cut here ]------------
Dec 24 08:55:56 pop-os kernel: WARNING: CPU: 3 PID: 12655 at kernel/irq/chip.c:210 irq_startup+0x115/0x130
Dec 24 08:55:56 pop-os kernel: Modules linked in: rfcomm ccm cmac algif_hash algif_skcipher af_alg joydev bnep intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd kvm snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep rapl uvcvideo snd_pcm videobuf2_vmalloc input_leds videobuf2_memops videobuf2_v4l2 videobuf2_common snd_seq_midi btusb rtw88_8822be btrtl snd_seq_midi_event rtw88_8822b hp_wmi videodev rtw88_pci btbcm platform_profile btintel mc serio_raw hid_multitouch snd_rawmidi rtw88_core sparse_keymap wmi_bmof bluetooth snd_seq ecdh_generic ip6t_REJECT ecc snd_seq_device efi_pstore k10temp nf_reject_ipv6 mac80211 nls_iso8859_1 snd_timer xt_hl snd_pci_acp5x ucsi_acpi snd ip6_tables cfg80211 snd_rn_pci_acp3x typec_ucsi snd_pci_acp3x ccp soundcore ip6t_rt libarc4 typec mac_hid ipt_REJECT amd_pmc nf_reject_ipv4 hid_sensor_accel_3d hid_sensor_magn_3d hid_sensor_gyro_3d xt_LOG acpi_tad
Dec 24 08:55:56 pop-os kernel:  hid_sensor_trigger nf_log_syslog hp_accel industrialio_triggered_buffer xt_multiport lis3lv02d wireless_hotkey kfifo_buf hid_sensor_iio_common nft_limit industrialio xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nft_counter sch_fq_codel nf_tables msr nfnetlink parport_pc ppdev lp parport ip_tables x_tables autofs4 dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear system76_io(OE) system76_acpi(OE) hid_logitech_hidpp hid_logitech_dj usbhid amdgpu iommu_v2 gpu_sched i2c_algo_bit drm_ttm_helper ttm drm_kms_helper syscopyarea hid_sensor_hub sysfillrect crct10dif_pclmul sysimgblt fb_sys_fops crc32_pclmul rtsx_pci_sdmmc hid_generic ghash_clmulni_intel cec rc_core aesni_intel crypto_simd nvme cryptd drm amd_sfh nvme_core xhci_pci i2c_piix4 xhci_pci_renesas rtsx_pci wmi i2c_hid_acpi i2c_hid video hid
Dec 24 08:55:56 pop-os kernel: CPU: 3 PID: 12655 Comm: systemd-sleep Tainted: G        W  OE     5.15.8-76051508-generic #202112141040~1639505278~21.10~0ede46a
Dec 24 08:55:56 pop-os kernel: Hardware name: HP HP ENVY x360 Convertible 15m-ds0xxx/85DD, BIOS F.20 05/28/2020
Dec 24 08:55:56 pop-os kernel: RIP: 0010:irq_startup+0x115/0x130
Dec 24 08:55:56 pop-os kernel: Code: f6 4c 89 ef e8 4c 42 00 00 85 c0 75 2b 4c 89 ef 31 d2 4c 89 f6 e8 db c3 ff ff 4c 89 e7 e8 83 fe ff ff 41 89 c5 e9 23 ff ff ff <0f> 0b e9 62 ff ff ff e8 0f c9 ff ff eb 98 0f 0b e9 54 ff ff ff 66
Dec 24 08:55:56 pop-os kernel: RSP: 0018:ffff9ef0c961bb88 EFLAGS: 00010002
Dec 24 08:55:56 pop-os kernel: RAX: 0000000000000010 RBX: 0000000000000001 RCX: ffffffffffffffff
Dec 24 08:55:56 pop-os kernel: RDX: 0000000000000000 RSI: ffffffff8d43a1a0 RDI: ffff92dfc0c771e8
Dec 24 08:55:56 pop-os kernel: RBP: ffff9ef0c961bba8 R08: 0000000000000000 R09: 0000000000000000
Dec 24 08:55:56 pop-os kernel: R10: 0000000000000010 R11: ffffffffffffffff R12: ffff92dfc12d9200
Dec 24 08:55:56 pop-os kernel: R13: 0000000000000001 R14: ffff92dfc0c771e8 R15: ffff92dfc12d92a4
Dec 24 08:55:56 pop-os kernel: FS:  00007fa2f2804fc0(0000) GS:ffff92e0d8ec0000(0000) knlGS:0000000000000000
Dec 24 08:55:56 pop-os kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 24 08:55:56 pop-os kernel: CR2: 00000a092d861d20 CR3: 000000010e99a000 CR4: 00000000003506e0
Dec 24 08:55:56 pop-os kernel: Call Trace:
Dec 24 08:55:56 pop-os kernel:  <TASK>
Dec 24 08:55:56 pop-os kernel:  __enable_irq+0x52/0x60
Dec 24 08:55:56 pop-os kernel:  resume_irqs+0xd3/0x120
Dec 24 08:55:56 pop-os kernel:  resume_device_irqs+0x10/0x20
Dec 24 08:55:56 pop-os kernel:  dpm_resume_noirq+0x13/0x20
Dec 24 08:55:56 pop-os kernel:  suspend_enter+0x13f/0x340
Dec 24 08:55:56 pop-os kernel:  suspend_devices_and_enter+0x12b/0x240
Dec 24 08:55:56 pop-os kernel:  enter_state+0x1d2/0x430
Dec 24 08:55:56 pop-os kernel:  pm_suspend+0x4e/0xc0
Dec 24 08:55:56 pop-os kernel:  state_store+0x81/0xe0
Dec 24 08:55:56 pop-os kernel:  kobj_attr_store+0x12/0x20
Dec 24 08:55:56 pop-os kernel:  sysfs_kf_write+0x3e/0x50
Dec 24 08:55:56 pop-os kernel:  kernfs_fop_write_iter+0x137/0x1c0
Dec 24 08:55:56 pop-os kernel:  new_sync_write+0x117/0x1a0
Dec 24 08:55:56 pop-os kernel:  vfs_write+0x1cd/0x260
Dec 24 08:55:56 pop-os kernel:  ksys_write+0x67/0xe0
Dec 24 08:55:56 pop-os kernel:  __x64_sys_write+0x19/0x20
Dec 24 08:55:56 pop-os kernel:  do_syscall_64+0x5c/0xc0
Dec 24 08:55:56 pop-os kernel:  ? __x64_sys_sendmsg+0x1d/0x20
Dec 24 08:55:56 pop-os kernel:  ? do_syscall_64+0x69/0xc0
Dec 24 08:55:56 pop-os kernel:  ? exit_to_user_mode_prepare+0x37/0xb0
Dec 24 08:55:56 pop-os kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Dec 24 08:55:56 pop-os kernel:  ? __do_sys_gettid+0x1b/0x20
Dec 24 08:55:56 pop-os kernel:  ? do_syscall_64+0x69/0xc0
Dec 24 08:55:56 pop-os kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Dec 24 08:55:56 pop-os kernel:  ? __x64_sys_close+0x11/0x40
Dec 24 08:55:56 pop-os kernel:  ? do_syscall_64+0x69/0xc0
Dec 24 08:55:56 pop-os kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Dec 24 08:55:56 pop-os kernel: RIP: 0033:0x7fa2f307d9b7
Dec 24 08:55:56 pop-os kernel: Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
Dec 24 08:55:56 pop-os kernel: RSP: 002b:00007ffe3f174878 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Dec 24 08:55:56 pop-os kernel: RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fa2f307d9b7
Dec 24 08:55:56 pop-os kernel: RDX: 0000000000000004 RSI: 00007ffe3f174930 RDI: 0000000000000004
Dec 24 08:55:56 pop-os kernel: RBP: 00007ffe3f174930 R08: 0000000000000004 R09: 0000000000000000
Dec 24 08:55:56 pop-os kernel: R10: 00007fa2f313a040 R11: 0000000000000246 R12: 0000000000000004
Dec 24 08:55:56 pop-os kernel: R13: 00005593e6a422d0 R14: 0000000000000004 R15: 00007fa2f317f960
Dec 24 08:55:56 pop-os kernel:  </TASK>
Dec 24 08:55:56 pop-os kernel: ---[ end trace cc5b92bcda275b56 ]---
Dec 24 08:55:56 pop-os kernel: kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8
Dec 24 08:55:56 pop-os kernel: amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_resume failed (-6).
Dec 24 08:55:56 pop-os kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0xf0 returns -6
Dec 24 08:55:56 pop-os kernel: amdgpu 0000:04:00.0: PM: failed to resume async: error -6
Dec 24 08:55:56 pop-os kernel: usb 3-2: reset full-speed USB device number 3 using xhci_hcd
Dec 24 08:55:56 pop-os kernel: OOM killer enabled.
Dec 24 08:55:56 pop-os kernel: Restarting tasks ... done.
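
For anyone comparing their own logs against the excerpt above, here is a minimal sketch that picks out the kernel lines reporting a resume failure and the error code attached to each. The function name `find_resume_failures` and both regular expressions are illustrative assumptions based only on the lines quoted in this report; they are not a complete list of suspend/resume failure messages. On Linux, error -6 corresponds to ENXIO ("No such device or address").

```python
import errno
import re

# Assumption: these patterns cover only the message shapes seen in the
# log excerpt above; other drivers may word their failures differently.
RESUME_FAILURE = re.compile(r"failed to resume|_resume failed|returns -\d+", re.IGNORECASE)
ERROR_CODE = re.compile(r"(?:error|returns|failed \()\s*(-\d+)")


def find_resume_failures(lines):
    """Return (message, error_code_or_None) for kernel lines reporting a resume failure."""
    hits = []
    for line in lines:
        # Keep only the text after the journal's " kernel: " prefix.
        _, sep, message = line.partition(" kernel: ")
        if not sep:
            continue
        if RESUME_FAILURE.search(message):
            match = ERROR_CODE.search(message)
            code = int(match.group(1)) if match else None
            hits.append((message.strip(), code))
    return hits


# A few lines copied from the excerpt above.
sample = [
    "Dec 24 08:55:56 pop-os kernel: amd_pmc AMD0004:00: SMU cmd unknown. err: 0xfe",
    "Dec 24 08:55:56 pop-os kernel: kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8",
    "Dec 24 08:55:56 pop-os kernel: amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_resume failed (-6).",
    "Dec 24 08:55:56 pop-os kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0xf0 returns -6",
    "Dec 24 08:55:56 pop-os kernel: amdgpu 0000:04:00.0: PM: failed to resume async: error -6",
    "Dec 24 08:55:56 pop-os kernel: OOM killer enabled.",
]

failures = find_resume_failures(sample)
for message, code in failures:
    # Translate the negative kernel return value into an errno name where known.
    name = errno.errorcode.get(-code, "?") if code is not None else "n/a"
    print(f"{name:>6}  {message}")
```

To run it on a full boot log, the output of journalctl -b -1 could be saved to a file and its lines passed to the function in place of `sample`.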

Please let me know if you have any suggestions or need more information about this issue. I know it can be hard to reproduce these kinds of issues.

Thanks for the help!

pjkaufman commented 2 years ago

Note: I also seem to get this message occasionally when modifying things via the terminal:

W: Possible missing firmware /lib/firmware/amdgpu/yellow_carp_gpu_info.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/vangogh_gpu_info.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/aldebaran_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/aldebaran_sos.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/aldebaran_rlc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/aldebaran_mec2.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/aldebaran_mec.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/cyan_skillfish2_rlc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/cyan_skillfish2_mec2.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/cyan_skillfish2_mec.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/cyan_skillfish2_me.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/cyan_skillfish2_pfp.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/cyan_skillfish2_ce.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/cyan_skillfish_rlc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/cyan_skillfish_mec2.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/cyan_skillfish_mec.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/cyan_skillfish_me.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/cyan_skillfish_pfp.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/cyan_skillfish_ce.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/aldebaran_sdma.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/cyan_skillfish2_sdma1.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/cyan_skillfish2_sdma.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/cyan_skillfish_sdma1.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/cyan_skillfish_sdma.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/sienna_cichlid_mes.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_mes.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/aldebaran_vcn.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/aldebaran_smc.bin for module amdgpu

This may or may not be an issue. Some links seem to indicate this warning is a non-issue: https://www.linux.org/threads/files-missing-in-lib-firmware-amdgpu-solved.30836/
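
One way to check whether these warnings matter for a given machine is to see if any of the listed files belong to the GPU that is actually installed. The sketch below extracts the file names from the warning lines and filters them by firmware prefix. The helper names are hypothetical, and the prefixes `picasso` and `raven` are an assumption: they are my guess for the firmware families used by the 1002:15d8 device named in the log above, and should be verified against your own hardware.

```python
import re

FIRMWARE_WARNING = re.compile(r"Possible missing firmware (\S+) for module (\S+)")


def missing_firmware_names(lines):
    """Return the file name of every firmware blob reported as possibly missing."""
    names = []
    for line in lines:
        match = FIRMWARE_WARNING.search(line)
        if match:
            # Drop the directory part, keeping e.g. "aldebaran_ta.bin".
            names.append(match.group(1).rsplit("/", 1)[-1])
    return names


def relevant_to_gpu(names, prefixes):
    """Keep only the names that start with one of the given firmware prefixes."""
    return [name for name in names if name.startswith(tuple(prefixes))]


# A few warnings copied from the list above.
warnings = [
    "W: Possible missing firmware /lib/firmware/amdgpu/yellow_carp_gpu_info.bin for module amdgpu",
    "W: Possible missing firmware /lib/firmware/amdgpu/aldebaran_ta.bin for module amdgpu",
    "W: Possible missing firmware /lib/firmware/amdgpu/navi10_mes.bin for module amdgpu",
]

# Assumption: firmware prefixes for the installed GPU; adjust for your hardware.
gpu_prefixes = ("picasso", "raven")

names = missing_firmware_names(warnings)
relevant = relevant_to_gpu(names, gpu_prefixes)
print(f"{len(names)} warnings, {len(relevant)} for the installed GPU: {relevant}")
```

If the filtered list is empty, the warnings refer to firmware for other GPU families, which would be consistent with the linked thread's view that they are harmless.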

pjkaufman commented 2 years ago

By following these two guides, I have been able to get hibernate working:

  1. https://abskmj.github.io/notes/posts/pop-os/enable-hibernate/
  2. https://www.youtube.com/watch?v=XckJEFTmxuM

I did see some odd behavior on my login attempt, though: I could not log in until I went back to the user list and selected my user again (for some reason the password field was filled in with spaces and I could not edit it).

Note: Adding hibernate as an option does not make it appear in the drop-down of actions to take. I will have to see whether it is possible to do so.

Update: I can now see the option in the drop-down after installing a GNOME extension and following the setup instructions found here