pop-os / pop

A project for managing all Pop!_OS sources
https://system76.com/pop

[ASRock A320M-HDV R4.0, BIOS P2.30] Pop!_OS 21.10 freezes on amdgpu randomly #2132

Open Dark-Matter7232 opened 2 years ago

Dark-Matter7232 commented 2 years ago

Distribution (run cat /etc/os-release):
NAME="Pop!_OS"
VERSION="21.10"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 21.10"
VERSION_ID="21.10"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=impish
UBUNTU_CODENAME=impish
LOGO=distributor-logo-pop-os

Related Application and/or Package Version (run apt policy $PACKAGE_NAME): N/A

Issue/Bug Description: The system freezes at random. Everything locks up, including the keyboard. The cursor still moves, but clicking does nothing. A hard reset is the only way to get the system working again.

Steps to reproduce (if you know): None known; the freezes occur at random.

Expected behavior: The system keeps running without freezing.

Other Notes: I have provided both a trimmed and a full version of the journalctl log.

Trimmed log:

Dec 30 07:07:03 pop-os kernel: UBSAN: invalid-load in /build/linux-EAUjmG/linux-5.15.8/drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:5882:84
Dec 30 07:07:03 pop-os kernel: load of value 141 is not a valid value for type '_Bool'
Dec 30 07:07:03 pop-os kernel: CPU: 1 PID: 1700 Comm: Xorg Tainted: G         C OE     5.15.8-76051508-generic #202112141040~1639505278~21.10~0ede46a
Dec 30 07:07:03 pop-os kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./A320M-HDV R4.0, BIOS P2.30 06/26/2019
Dec 30 07:07:03 pop-os kernel: Call Trace:
Dec 30 07:07:03 pop-os kernel:  
Dec 30 07:07:03 pop-os kernel:  show_stack+0x52/0x58
Dec 30 07:07:03 pop-os kernel:  dump_stack_lvl+0x4a/0x5f
Dec 30 07:07:03 pop-os kernel:  dump_stack+0x10/0x12
Dec 30 07:07:03 pop-os kernel:  ubsan_epilogue+0x9/0x45
Dec 30 07:07:03 pop-os kernel:  __ubsan_handle_load_invalid_value.cold+0x44/0x49
Dec 30 07:07:03 pop-os kernel:  create_stream_for_sink.cold+0x5d/0xbb [amdgpu]
Dec 30 07:07:03 pop-os kernel:  ? get_page_from_freelist+0x33d/0x520
Dec 30 07:07:03 pop-os kernel:  create_validate_stream_for_sink+0x59/0x150 [amdgpu]
Dec 30 07:07:03 pop-os kernel:  dm_update_crtc_state+0x235/0x7b0 [amdgpu]
Dec 30 07:07:03 pop-os kernel:  amdgpu_dm_atomic_check+0x596/0xcd0 [amdgpu]
Dec 30 07:07:03 pop-os kernel:  ? dm_plane_format_mod_supported+0x1f/0x100 [amdgpu]
Dec 30 07:07:03 pop-os kernel:  ? drm_plane_check_pixel_format+0x45/0x90 [drm]
Dec 30 07:07:03 pop-os kernel:  ? drm_atomic_plane_check+0x12f/0x360 [drm]
Dec 30 07:07:03 pop-os kernel:  drm_atomic_check_only+0x253/0x4b0 [drm]
Dec 30 07:07:03 pop-os kernel:  ? handle_conflicting_encoders+0x26b/0x2a0 [drm_kms_helper]
Dec 30 07:07:03 pop-os kernel:  drm_atomic_commit+0x18/0x50 [drm]
Dec 30 07:07:03 pop-os kernel:  drm_atomic_helper_set_config+0x7c/0xc0 [drm_kms_helper]
Dec 30 07:07:03 pop-os kernel:  drm_mode_setcrtc+0x1f9/0x7c0 [drm]
Dec 30 07:07:03 pop-os kernel:  ? drm_mode_getcrtc+0x1c0/0x1c0 [drm]
Dec 30 07:07:03 pop-os kernel:  drm_ioctl_kernel+0xae/0xf0 [drm]
Dec 30 07:07:03 pop-os kernel:  drm_ioctl+0x264/0x4b0 [drm]
Dec 30 07:07:03 pop-os kernel:  ? drm_mode_getcrtc+0x1c0/0x1c0 [drm]
Dec 30 07:07:03 pop-os kernel:  amdgpu_drm_ioctl+0x4e/0x80 [amdgpu]
Dec 30 07:07:03 pop-os kernel:  __x64_sys_ioctl+0x91/0xc0
Dec 30 07:07:03 pop-os kernel:  do_syscall_64+0x5c/0xc0
Dec 30 07:07:03 pop-os kernel:  ? do_syscall_64+0x69/0xc0
Dec 30 07:07:03 pop-os kernel:  ? fput+0x13/0x20
Dec 30 07:07:03 pop-os kernel:  ? exit_to_user_mode_prepare+0x37/0xb0
Dec 30 07:07:03 pop-os kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Dec 30 07:07:03 pop-os kernel:  ? do_syscall_64+0x69/0xc0
Dec 30 07:07:03 pop-os kernel:  ? exit_to_user_mode_prepare+0x37/0xb0
Dec 30 07:07:03 pop-os kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Dec 30 07:07:03 pop-os kernel:  ? do_syscall_64+0x69/0xc0
Dec 30 07:07:03 pop-os kernel:  ? __do_sys_getpid+0x1e/0x30
Dec 30 07:07:03 pop-os kernel:  ? do_syscall_64+0x69/0xc0
Dec 30 07:07:03 pop-os kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Dec 30 07:07:03 pop-os kernel: RIP: 0033:0x7f2601d2e9cb
Dec 30 07:07:03 pop-os kernel: Code: ff ff ff 85 c0 79 8b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 a4 0f 00 f7 d8 64 89 01 48
Dec 30 07:07:03 pop-os kernel: RSP: 002b:00007ffdbb3f8998 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Dec 30 07:07:03 pop-os kernel: RAX: ffffffffffffffda RBX: 00007ffdbb3f89d0 RCX: 00007f2601d2e9cb
Dec 30 07:07:03 pop-os kernel: RDX: 00007ffdbb3f89d0 RSI: 00000000c06864a2 RDI: 000000000000000f
Dec 30 07:07:03 pop-os kernel: RBP: 00000000c06864a2 R08: 0000000000000000 R09: 000055a161457c30
Dec 30 07:07:03 pop-os kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Dec 30 07:07:03 pop-os kernel: R13: 000000000000000f R14: 000055a161457c30 R15: 0000000000000000
Dec 30 07:07:03 pop-os kernel:

Dec 30 07:07:03 pop-os kernel: ================================================================================
Dec 30 08:26:34 pop-os kernel: BUG: unable to handle page fault for address: fffffffffffffff8
Dec 30 08:26:34 pop-os kernel: #PF: supervisor read access in kernel mode
Dec 30 08:26:34 pop-os kernel: #PF: error_code(0x0000) - not-present page
Dec 30 08:26:34 pop-os kernel: PGD 132615067 P4D 132615067 PUD 132617067 PMD 0 
Dec 30 08:26:34 pop-os kernel: Oops: 0000 [#1] SMP NOPTI
Dec 30 08:26:34 pop-os kernel: CPU: 3 PID: 248 Comm: uvd Tainted: G         C OE     5.15.8-76051508-generic #202112141040~1639505278~21.10~0ede46a
Dec 30 08:26:34 pop-os kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./A320M-HDV R4.0, BIOS P2.30 06/26/2019
Dec 30 08:26:34 pop-os kernel: RIP: 0010:swake_up_locked+0x1b/0x40
Dec 30 08:26:34 pop-os kernel: Code: 0f 84 58 ff ff ff eb ad 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48 8b 57 08 48 8d 47 08 48 39 c2 74 2d 55 48 89 e5 53 48 8b 5f 08 <48> 8b 7b f8 e8 8c db fd ff 48 8b 13 48 8b 43 08 48 89 42 08 48 89
Dec 30 08:26:34 pop-os kernel: RSP: 0018:ffffbc7880fefe58 EFLAGS: 00010007
Dec 30 08:26:34 pop-os kernel: RAX: ffffa035ea6bdeb0 RBX: 0000000000000000 RCX: 0000000017e00003
Dec 30 08:26:34 pop-os kernel: RDX: 0000000000000000 RSI: ffffa035ee095a30 RDI: ffffa035ea6bdea8
Dec 30 08:26:34 pop-os kernel: RBP: ffffbc7880fefe60 R08: 0000000000000001 R09: ffffa0351922fff0
Dec 30 08:26:34 pop-os kernel: R10: ffffa03519017000 R11: ffffa03519017000 R12: ffffa035ea6bdea8
Dec 30 08:26:34 pop-os kernel: R13: 0000000000000282 R14: ffffa035ee095800 R15: ffffa035ea6bdea0
Dec 30 08:26:34 pop-os kernel: FS:  0000000000000000(0000) GS:ffffa0380eac0000(0000) knlGS:0000000000000000
Dec 30 08:26:34 pop-os kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 30 08:26:34 pop-os kernel: CR2: fffffffffffffff8 CR3: 0000000132610000 CR4: 00000000003506e0
Dec 30 08:26:34 pop-os kernel: Call Trace:
Dec 30 08:26:34 pop-os kernel:  
Dec 30 08:26:34 pop-os kernel:  complete+0x34/0x50
Dec 30 08:26:34 pop-os kernel:  drm_sched_main+0x1be/0x420 [gpu_sched]
Dec 30 08:26:34 pop-os kernel:  ? wait_woken+0x70/0x70
Dec 30 08:26:34 pop-os kernel:  kthread+0x11e/0x140
Dec 30 08:26:34 pop-os kernel:  ? drm_sched_select_entity+0xf0/0xf0 [gpu_sched]
Dec 30 08:26:34 pop-os kernel:  ? set_kthread_struct+0x50/0x50
Dec 30 08:26:34 pop-os kernel:  ret_from_fork+0x22/0x30
Dec 30 08:26:34 pop-os kernel:  
Dec 30 08:26:34 pop-os kernel: Modules linked in: ntfs3 cfg80211 snd_hda_codec_realtek snd_hda_codec_generic nls_iso8859_1 ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi intel_rapl_msr intel_rapl_common snd_hda_codec snd_hda_core edac_mce_amd snd_hwdep kvm_amd snd_pcm kvm snd_seq_midi snd_seq_midi_event snd_rawmidi rapl snd_seq efi_pstore wmi_bmof joydev r8188eu(C) snd_seq_device input_leds k10temp snd_timer ccp snd soundcore mac_hid sch_fq_codel msr parport_pc ppdev lp parport binfmt_misc ip_tables x_tables autofs4 dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear system76_io(OE) system76_acpi(OE) hid_generic usbhid hid amdgpu iommu_v2 crct10dif_pclmul gpu_sched i2c_algo_bit crc32_pclmul ghash_clmulni_intel drm_ttm_helper ttm aesni_intel crypto_simd drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec cryptd rc_core drm xhci_pci i2c_piix4 r8169 realtek xhci_pci_renesas ahci
Dec 30 08:26:34 pop-os kernel:  libahci wmi gpio_amdpt video gpio_generic
Dec 30 08:26:34 pop-os kernel: CR2: fffffffffffffff8
Dec 30 08:26:34 pop-os kernel: ---[ end trace ebd42068923ce1dc ]---
Dec 30 08:26:34 pop-os kernel: BUG: kernel NULL pointer dereference, address: 0000000000000259
Dec 30 08:26:34 pop-os kernel: #PF: supervisor read access in kernel mode
Dec 30 08:26:35 pop-os kernel: #PF: error_code(0x0000) - not-present page
Dec 30 08:26:35 pop-os kernel: PGD 10c44b067 P4D 10c44b067 PUD 1f70de067 PMD 0 
Dec 30 08:26:35 pop-os kernel: Oops: 0000 [#2] SMP NOPTI
Dec 30 08:26:35 pop-os kernel: CPU: 0 PID: 9404 Comm: kworker/u64:5 Tainted: G      D  C OE     5.15.8-76051508-generic #202112141040~1639505278~21.10~0ede46a
Dec 30 08:26:35 pop-os kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./A320M-HDV R4.0, BIOS P2.30 06/26/2019
Dec 30 08:26:35 pop-os kernel: Workqueue: events_unbound commit_work [drm_kms_helper]
Dec 30 08:26:35 pop-os kernel: RIP: 0010:drm_atomic_helper_cleanup_planes+0x37/0x70 [drm_kms_helper]
Dec 30 08:26:35 pop-os kernel: Code: 00 00 85 c0 7e 5a 55 48 89 e5 41 54 49 89 f4 53 31 db 48 63 c3 48 c1 e0 05 49 03 44 24 18 48 8b 38 48 85 ff 74 2a 48 8b 70 10 <48> 39 b7 58 02 00 00 48 0f 44 70 18 48 8b 87 50 02 00 00 48 8b 40
Dec 30 08:26:35 pop-os kernel: RSP: 0018:ffffbc788361bb40 EFLAGS: 00010202
Dec 30 08:26:35 pop-os kernel: RAX: ffffa035ea6bdea0 RBX: 0000000000000005 RCX: ffffa03509c4e901
Dec 30 08:26:35 pop-os kernel: RDX: ffffa03519220010 RSI: 0000000000000000 RDI: 0000000000000001
Dec 30 08:26:35 pop-os kernel: RBP: ffffbc788361bb50 R08: ffffa03509c4e9b8 R09: 0000000000000000
Dec 30 08:26:35 pop-os kernel: R10: 0000000000000002 R11: 0000000000000001 R12: ffffa03507615580
Dec 30 08:26:35 pop-os kernel: R13: 0000000000000001 R14: ffffa03519220010 R15: 0000000000000005
Dec 30 08:26:35 pop-os kernel: FS:  0000000000000000(0000) GS:ffffa0380ea00000(0000) knlGS:0000000000000000
Dec 30 08:26:35 pop-os kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 30 08:26:35 pop-os kernel: CR2: 0000000000000259 CR3: 000000010e06c000 CR4: 00000000003506f0
Dec 30 08:26:35 pop-os kernel: Call Trace:
Dec 30 08:26:35 pop-os kernel:  
Dec 30 08:26:35 pop-os kernel:  amdgpu_dm_atomic_commit_tail+0xee2/0x1440 [amdgpu]
Dec 30 08:26:35 pop-os kernel:  ? load_balance+0x154/0x7e0
Dec 30 08:26:35 pop-os kernel:  ? sched_clock+0x9/0x10
Dec 30 08:26:35 pop-os kernel:  ? sched_clock_cpu+0x12/0xf0
Dec 30 08:26:35 pop-os kernel:  ? raw_spin_rq_lock_nested+0x17/0x70
Dec 30 08:26:35 pop-os kernel:  ? newidle_balance+0x319/0x470
Dec 30 08:26:35 pop-os kernel:  ? __cond_resched+0x1a/0x50
Dec 30 08:26:35 pop-os kernel:  ? __wait_for_common+0x3e/0x150
Dec 30 08:26:35 pop-os kernel:  ? usleep_range_state+0x90/0x90
Dec 30 08:26:35 pop-os kernel:  ? wait_for_completion_timeout+0x1d/0x20
Dec 30 08:26:35 pop-os kernel:  commit_tail+0xc5/0x170 [drm_kms_helper]
Dec 30 08:26:35 pop-os kernel:  commit_work+0x12/0x20 [drm_kms_helper]
Dec 30 08:26:35 pop-os kernel:  process_one_work+0x22b/0x3d0
Dec 30 08:26:35 pop-os kernel:  worker_thread+0x53/0x420
Dec 30 08:26:35 pop-os kernel:  kthread+0x11e/0x140
Dec 30 08:26:35 pop-os kernel:  ? process_one_work+0x3d0/0x3d0
Dec 30 08:26:35 pop-os kernel:  ? set_kthread_struct+0x50/0x50
Dec 30 08:26:35 pop-os kernel:  ret_from_fork+0x22/0x30
Dec 30 08:26:35 pop-os kernel:  
Dec 30 08:26:35 pop-os kernel: Modules linked in: ntfs3 cfg80211 snd_hda_codec_realtek snd_hda_codec_generic nls_iso8859_1 ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi intel_rapl_msr intel_rapl_common snd_hda_codec snd_hda_core edac_mce_amd snd_hwdep kvm_amd snd_pcm kvm snd_seq_midi snd_seq_midi_event snd_rawmidi rapl snd_seq efi_pstore wmi_bmof joydev r8188eu(C) snd_seq_device input_leds k10temp snd_timer ccp snd soundcore mac_hid sch_fq_codel msr parport_pc ppdev lp parport binfmt_misc ip_tables x_tables autofs4 dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear system76_io(OE) system76_acpi(OE) hid_generic usbhid hid amdgpu iommu_v2 crct10dif_pclmul gpu_sched i2c_algo_bit crc32_pclmul ghash_clmulni_intel drm_ttm_helper ttm aesni_intel crypto_simd drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec cryptd rc_core drm xhci_pci i2c_piix4 r8169 realtek xhci_pci_renesas ahci
Dec 30 08:26:35 pop-os kernel:  libahci wmi gpio_amdpt video gpio_generic
Dec 30 08:26:35 pop-os kernel: CR2: 0000000000000259
Dec 30 08:26:35 pop-os kernel: ---[ end trace ebd42068923ce1dd ]---
Dec 30 08:26:35 pop-os kernel: RIP: 0010:swake_up_locked+0x1b/0x40
Dec 30 08:26:35 pop-os kernel: Code: 0f 84 58 ff ff ff eb ad 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48 8b 57 08 48 8d 47 08 48 39 c2 74 2d 55 48 89 e5 53 48 8b 5f 08 <48> 8b 7b f8 e8 8c db fd ff 48 8b 13 48 8b 43 08 48 89 42 08 48 89
Dec 30 08:26:35 pop-os kernel: RSP: 0018:ffffbc7880fefe58 EFLAGS: 00010007
Dec 30 08:26:35 pop-os kernel: RAX: ffffa035ea6bdeb0 RBX: 0000000000000000 RCX: 0000000017e00003
Dec 30 08:26:35 pop-os kernel: RDX: 0000000000000000 RSI: ffffa035ee095a30 RDI: ffffa035ea6bdea8
Dec 30 08:26:35 pop-os kernel: RBP: ffffbc7880fefe60 R08: 0000000000000001 R09: ffffa0351922fff0
Dec 30 08:26:35 pop-os kernel: R10: ffffa03519017000 R11: ffffa03519017000 R12: ffffa035ea6bdea8
Dec 30 08:26:35 pop-os kernel: R13: 0000000000000282 R14: ffffa035ee095800 R15: ffffa035ea6bdea0
Dec 30 08:26:35 pop-os kernel: FS:  0000000000000000(0000) GS:ffffa0380eac0000(0000) knlGS:0000000000000000
Dec 30 08:26:35 pop-os kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 30 08:26:35 pop-os kernel: CR2: fffffffffffffff8 CR3: 00000001f2258000 CR4: 00000000003506e0
Dec 30 08:26:35 pop-os kernel: RIP: 0010:swake_up_locked+0x1b/0x40
Dec 30 08:26:35 pop-os kernel: Code: 0f 84 58 ff ff ff eb ad 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48 8b 57 08 48 8d 47 08 48 39 c2 74 2d 55 48 89 e5 53 48 8b 5f 08 <48> 8b 7b f8 e8 8c db fd ff 48 8b 13 48 8b 43 08 48 89 42 08 48 89
Dec 30 08:26:35 pop-os kernel: RSP: 0018:ffffbc7880fefe58 EFLAGS: 00010007
Dec 30 08:26:35 pop-os kernel: RAX: ffffa035ea6bdeb0 RBX: 0000000000000000 RCX: 0000000017e00003
Dec 30 08:26:35 pop-os kernel: RDX: 0000000000000000 RSI: ffffa035ee095a30 RDI: ffffa035ea6bdea8
Dec 30 08:26:35 pop-os kernel: RBP: ffffbc7880fefe60 R08: 0000000000000001 R09: ffffa0351922fff0
Dec 30 08:26:35 pop-os kernel: R10: ffffa03519017000 R11: ffffa03519017000 R12: ffffa035ea6bdea8
Dec 30 08:26:35 pop-os kernel: R13: 0000000000000282 R14: ffffa035ee095800 R15: ffffa035ea6bdea0
Dec 30 08:26:35 pop-os kernel: FS:  0000000000000000(0000) GS:ffffa0380ea00000(0000) knlGS:0000000000000000
Dec 30 08:26:35 pop-os kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 30 08:26:35 pop-os kernel: CR2: 00007f9014c6a000 CR3: 00000001ee080000 CR4: 00000000003506f0
Dec 30 08:26:44 pop-os kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring uvd timeout, signaled seq=43247, emitted seq=43247
Dec 30 08:26:44 pop-os kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process brave pid 7138 thread brave:cs0 pid 7158
Dec 30 08:26:44 pop-os kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset begin!
Dec 30 08:26:44 pop-os kernel: ------------[ cut here ]------------
Dec 30 08:26:44 pop-os kernel: WARNING: CPU: 3 PID: 6583 at kernel/kthread.c:596 kthread_park+0x79/0xa0
Dec 30 08:26:44 pop-os kernel: Modules linked in: ntfs3 cfg80211 snd_hda_codec_realtek snd_hda_codec_generic nls_iso8859_1 ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi intel_rapl_msr intel_rapl_common snd_hda_codec snd_hda_core edac_mce_amd snd_hwdep kvm_amd snd_pcm kvm snd_seq_midi snd_seq_midi_event snd_rawmidi rapl snd_seq efi_pstore wmi_bmof joydev r8188eu(C) snd_seq_device input_leds k10temp snd_timer ccp snd soundcore mac_hid sch_fq_codel msr parport_pc ppdev lp parport binfmt_misc ip_tables x_tables autofs4 dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear system76_io(OE) system76_acpi(OE) hid_generic usbhid hid amdgpu iommu_v2 crct10dif_pclmul gpu_sched i2c_algo_bit crc32_pclmul ghash_clmulni_intel drm_ttm_helper ttm aesni_intel crypto_simd drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec cryptd rc_core drm xhci_pci i2c_piix4 r8169 realtek xhci_pci_renesas ahci
Dec 30 08:26:44 pop-os kernel:  libahci wmi gpio_amdpt video gpio_generic
Dec 30 08:26:44 pop-os kernel: CPU: 3 PID: 6583 Comm: kworker/3:0 Tainted: G      D  C OE     5.15.8-76051508-generic #202112141040~1639505278~21.10~0ede46a
Dec 30 08:26:44 pop-os kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./A320M-HDV R4.0, BIOS P2.30 06/26/2019
Dec 30 08:26:44 pop-os kernel: Workqueue: events drm_sched_job_timedout [gpu_sched]
Dec 30 08:26:44 pop-os kernel: RIP: 0010:kthread_park+0x79/0xa0
Dec 30 08:26:44 pop-os kernel: Code: be 40 00 00 00 4c 89 e7 e8 b4 9f 01 00 48 85 c0 74 2d 31 c0 5b 41 5c 5d c3 0f 0b 8b 47 2c 49 8b 9c 24 28 0a 00 00 a8 04 74 ac <0f> 0b b8 da ff ff ff 5b 41 5c 5d c3 0f 0b b8 f0 ff ff ff eb d5 0f
Dec 30 08:26:44 pop-os kernel: RSP: 0018:ffffbc7883353cc0 EFLAGS: 00010202
Dec 30 08:26:44 pop-os kernel: RAX: 0000000000208044 RBX: ffffa03500a35580 RCX: 0000000000000000
Dec 30 08:26:44 pop-os kernel: RDX: 0000000000000000 RSI: ffffa035ee095800 RDI: ffffa03518ba0000
Dec 30 08:26:44 pop-os kernel: RBP: ffffbc7883353cd0 R08: 0000000000000000 R09: 0000000000000000
Dec 30 08:26:44 pop-os kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffffa03518ba0000
Dec 30 08:26:44 pop-os kernel: R13: 000000000000000c R14: ffffa03519220000 R15: ffffa035ee095800
Dec 30 08:26:44 pop-os kernel: FS:  0000000000000000(0000) GS:ffffa0380eac0000(0000) knlGS:0000000000000000
Dec 30 08:26:44 pop-os kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 30 08:26:44 pop-os kernel: CR2: 00000c4a02dd0000 CR3: 00000001169bc000 CR4: 00000000003506e0
Dec 30 08:26:44 pop-os kernel: Call Trace:
Dec 30 08:26:44 pop-os kernel:  
Dec 30 08:26:44 pop-os kernel:  drm_sched_stop+0x2c/0x170 [gpu_sched]
Dec 30 08:26:44 pop-os kernel:  ? drm_fb_helper_set_suspend_unlocked+0x8d/0xa0 [drm_kms_helper]
Dec 30 08:26:44 pop-os kernel:  amdgpu_device_gpu_recover.cold+0x84b/0x8df [amdgpu]
Dec 30 08:26:44 pop-os kernel:  amdgpu_job_timedout+0x14f/0x170 [amdgpu]
Dec 30 08:26:44 pop-os kernel:  drm_sched_job_timedout+0x6f/0x110 [gpu_sched]
Dec 30 08:26:44 pop-os kernel:  process_one_work+0x22b/0x3d0
Dec 30 08:26:44 pop-os kernel:  worker_thread+0x53/0x420
Dec 30 08:26:44 pop-os kernel:  kthread+0x11e/0x140
Dec 30 08:26:44 pop-os kernel:  ? process_one_work+0x3d0/0x3d0
Dec 30 08:26:44 pop-os kernel:  ? set_kthread_struct+0x50/0x50
Dec 30 08:26:44 pop-os kernel:  ret_from_fork+0x22/0x30
Dec 30 08:26:44 pop-os kernel:  
Dec 30 08:26:44 pop-os kernel: ---[ end trace ebd42068923ce1de ]---

Dec 30 08:27:00 pop-os gnome-shell[7101]: [1230/082700.256592:WARNING:exception_snapshot_linux.cc(427)] Unhandled signal -1
Dec 30 08:27:02 pop-os gnome-shell[7101]: [1230/082702.011389:ERROR:file_io_posix.cc(152)] open /home/const/.config/BraveSoftware/Brave-Browser-Beta/Crash Reports/pending/1e1c8f4b-b804-44e7-aed2-508c0c191afd.lock: File exists (17)
Dec 30 08:28:56 pop-os kernel: [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
Dec 30 08:28:56 pop-os kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Dec 30 08:28:56 pop-os gsd-media-keys[2738]: Unable to get default sink
Dec 30 08:28:56 pop-os kernel: snd_hda_codec_hdmi hdaudioC0D0: Unable to sync register 0x2f0d00. -5
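
The two fault addresses in the trimmed log above can be cross-checked against the register dumps. The sketch below is a worked bit of arithmetic, not a diagnosis. It assumes my own decoding of the marked bytes in the Code: lines: `<48> 8b 7b f8` read as x86-64 `mov -0x8(%rbx),%rdi` and `<48> 39 b7 58 02 00 00` read as `cmp %rsi,0x258(%rdi)`. The helper names are hypothetical. With RBX = 0 in the first Oops and RDI = 1 in the second, the computed addresses match the reported CR2 values, which suggests both faults come from dereferencing a near-NULL pointer.

```python
MASK64 = (1 << 64) - 1


def effective_address(base: int, disp: int) -> int:
    """Return base + disp wrapped to 64 bits, as address arithmetic does."""
    return (base + disp) & MASK64


def as_signed64(value: int) -> int:
    """Interpret an unsigned 64-bit value as a two's-complement integer."""
    return value - (1 << 64) if value & (1 << 63) else value


# Oops #1 (swake_up_locked): RBX = 0, assumed displacement -0x8.
first = effective_address(0x0, -0x8)

# Oops #2 (drm_atomic_helper_cleanup_planes): RDI = 1, assumed displacement 0x258.
second = effective_address(0x1, 0x258)

print(f"{first:016x}")   # compare with the first CR2 value in the log
print(f"{second:016x}")  # compare with the second CR2 value in the log
print(as_signed64(first))
```

If the decoding assumption holds, the first address is simply NULL minus 8 and the second is the value 1 used as a pointer plus a field offset of 0x258.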

Full log

Dark-Matter7232 commented 2 years ago

The system froze again. This time audio playback kept working and the cursor could still be moved.
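
To compare this freeze with the earlier one, it may help to reduce each journalctl dump to its fault events. The following is a minimal sketch, assuming the line layout seen in the trimmed logs of this issue (timestamp, host, `kernel:`, then the message). The function name `summarize_faults` and the regular expressions are my own and hypothetical; the sample lines are copied from the two trimmed logs.

```python
import re

# Common prefix of every kernel line in the trimmed logs.
STAMP = r"(?P<time>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
BUG_RE = re.compile(STAMP + r"BUG: (?P<msg>.+)")
COMM_RE = re.compile(
    STAMP + r"CPU: \d+ PID: \d+ Comm: (?P<comm>\S+) .*?\s(?P<kernel>\S+) #"
)
RIP_RE = re.compile(STAMP + r"RIP: 0010:(?P<rip>\S+)")


def summarize_faults(lines):
    """Collect one record per 'BUG:' line with its task, kernel, and RIP."""
    events = []
    for line in lines:
        if (m := BUG_RE.match(line)):
            events.append({"time": m["time"], "msg": m["msg"].strip()})
        elif events and (m := COMM_RE.match(line)):
            # Keep only the first Comm line that follows each BUG line.
            events[-1].setdefault("comm", m["comm"])
            events[-1].setdefault("kernel", m["kernel"])
        elif events and (m := RIP_RE.match(line)):
            # Later register dumps repeat older RIP lines; keep the first.
            events[-1].setdefault("rip", m["rip"])
    return events


SAMPLE = [
    "Dec 30 08:26:34 pop-os kernel: BUG: unable to handle page fault for address: fffffffffffffff8",
    "Dec 30 08:26:34 pop-os kernel: CPU: 3 PID: 248 Comm: uvd Tainted: G         C OE     5.15.8-76051508-generic #202112141040~1639505278~21.10~0ede46a",
    "Dec 30 08:26:34 pop-os kernel: RIP: 0010:swake_up_locked+0x1b/0x40",
    "Jan 03 11:21:56 pop-os kernel: BUG: kernel NULL pointer dereference, address: 0000000000000259",
    "Jan 03 11:21:56 pop-os kernel: CPU: 1 PID: 80095 Comm: kworker/u64:1 Tainted: G      D  C OE     5.15.12-xanmod1 #0~git20211229.8293471",
    "Jan 03 11:21:56 pop-os kernel: RIP: 0010:drm_atomic_helper_cleanup_planes+0x2c/0x60 [drm_kms_helper]",
]

events = summarize_faults(SAMPLE)
for event in events:
    print(event["time"], event["kernel"], event["comm"], event["rip"], "|", event["msg"])
```

In practice the input would be the lines of a saved log file rather than `SAMPLE`. A summary like this makes it easier to see whether the same functions fault under both kernel builds.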

Trimmed log:

Jan 03 11:21:04 pop-os ModemManager[982]: <info>  [base-manager] couldn't check support for device '/sys/devices/pci0000:00/0000:00:01.2/0000:02:00.0/usb1/1-2': not supported by any plugin
Jan 03 11:21:05 pop-os gvfsd[133497]: Error 1: Get Storage information failed.
Jan 03 11:21:05 pop-os dbus-daemon[1100]: [session uid=1000 pid=1100] Activating service name='org.gnome.Shell.HotplugSniffer' requested by ':1.33' (uid=1000 pid=1280 comm="/usr/bin/gnome-shell ")
Jan 03 11:21:05 pop-os dbus-daemon[1100]: [session uid=1000 pid=1100] Successfully activated service 'org.gnome.Shell.HotplugSniffer'
Jan 03 11:21:17 pop-os gnome-shell[1280]: JS ERROR: Error: Expected an object of type ClutterActor for argument 'sibling' but got type undefined
                                          _syncStacking@resource:///org/gnome/shell/ui/workspaceAnimation.js:80:18
Jan 03 11:21:17 pop-os gnome-shell[1280]: JS ERROR: Error: Expected an object of type ClutterActor for argument 'sibling' but got type undefined
                                          _syncStacking@resource:///org/gnome/shell/ui/workspaceAnimation.js:80:18
Jan 03 11:21:20 pop-os gnome-shell[7093]: [7094:7094:0103/112120.755463:ERROR:brave_new_tab_message_handler.cc(195)] Ads service is not initialized!
Jan 03 11:21:20 pop-os gnome-shell[1280]: Can't update stage views actor MetaWindowGroup is on because it needs an allocation.
Jan 03 11:21:20 pop-os gnome-shell[1280]: Can't update stage views actor MetaWindowActorX11 is on because it needs an allocation.
Jan 03 11:21:20 pop-os gnome-shell[1280]: Can't update stage views actor MetaSurfaceActorX11 is on because it needs an allocation.
Jan 03 11:21:20 pop-os gnome-shell[1280]: Can't update stage views actor MetaWindowActorX11 is on because it needs an allocation.
Jan 03 11:21:20 pop-os gnome-shell[1280]: Can't update stage views actor MetaSurfaceActorX11 is on because it needs an allocation.
Jan 03 11:21:20 pop-os gnome-shell[7093]: [7094:7094:0103/112120.965013:ERROR:CONSOLE(0)] "Unchecked runtime.lastError: Not available in Tor/incognito/guest profile", source: chrome://newtab/ (0)
Jan 03 11:21:20 pop-os gnome-shell[7093]: [7094:7094:0103/112120.965965:ERROR:CONSOLE(0)] "Unchecked runtime.lastError: Not available in Tor/incognito/guest profile", source: chrome://newtab/ (0)
Jan 03 11:21:56 pop-os kernel: BUG: unable to handle page fault for address: fffffffffffffff8
Jan 03 11:21:56 pop-os kernel: #PF: supervisor read access in kernel mode
Jan 03 11:21:56 pop-os kernel: #PF: error_code(0x0000) - not-present page
Jan 03 11:21:56 pop-os kernel: PGD 1ce415067 P4D 1ce415067 PUD 1ce417067 PMD 0 
Jan 03 11:21:56 pop-os kernel: Oops: 0000 [#1] SMP NOPTI
Jan 03 11:21:56 pop-os kernel: CPU: 1 PID: 250 Comm: uvd Tainted: G         C OE     5.15.12-xanmod1 #0~git20211229.8293471
Jan 03 11:21:56 pop-os kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./A320M-HDV R4.0, BIOS P2.30 06/26/2019
Jan 03 11:21:56 pop-os kernel: RIP: 0010:swake_up_locked+0x12/0x40
Jan 03 11:21:56 pop-os kernel: Code: 10 48 89 02 eb 83 f6 80 f9 07 00 00 01 0f 84 5a ff ff ff eb ad 0f 1f 00 48 8b 57 08 48 8d 47 08 48 39 c2 74 25 53 48 8b 5f 08 <48> 8b 7b f8 e8 05 3e fe ff 48 8b 13 48 8b 43 08 48 89 42 08 48 89
Jan 03 11:21:56 pop-os kernel: RSP: 0018:ffffae3a8111fe80 EFLAGS: 00010007
Jan 03 11:21:56 pop-os kernel: RAX: ffff948bad7018b0 RBX: 0000000000000000 RCX: 00000001004fea90
Jan 03 11:21:56 pop-os kernel: RDX: 0000000000000000 RSI: ffff9489f25eaa30 RDI: ffff948bad7018a8
Jan 03 11:21:56 pop-os kernel: RBP: ffff948bad7018a8 R08: 0000000000000001 R09: 0000000000000052
Jan 03 11:21:56 pop-os kernel: R10: ffff948ad6d97000 R11: ffff948ad6d97000 R12: 0000000000000286
Jan 03 11:21:56 pop-os kernel: R13: ffff948ad7c4ecc8 R14: ffff948bad7018a0 R15: ffff948acf3f4e40
Jan 03 11:21:56 pop-os kernel: FS:  0000000000000000(0000) GS:ffff948dcea40000(0000) knlGS:0000000000000000
Jan 03 11:21:56 pop-os kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 03 11:21:56 pop-os kernel: CR2: fffffffffffffff8 CR3: 000000010e23c000 CR4: 00000000003506e0
Jan 03 11:21:56 pop-os kernel: Call Trace:
Jan 03 11:21:56 pop-os kernel:  <TASK>
Jan 03 11:21:56 pop-os kernel:  complete+0x2a/0x40
Jan 03 11:21:56 pop-os kernel:  drm_sched_main+0x1ab/0x3d0 [gpu_sched]
Jan 03 11:21:56 pop-os kernel:  ? __wake_up_pollfree+0x30/0x30
Jan 03 11:21:56 pop-os kernel:  ? drm_sched_select_entity+0xc0/0xc0 [gpu_sched]
Jan 03 11:21:56 pop-os kernel:  kthread+0x11f/0x140
Jan 03 11:21:56 pop-os kernel:  ? set_kthread_struct+0x30/0x30
Jan 03 11:21:56 pop-os kernel:  ret_from_fork+0x1f/0x30
Jan 03 11:21:56 pop-os kernel:  </TASK>
Jan 03 11:21:56 pop-os kernel: Modules linked in: ses enclosure scsi_transport_sas uas usb_storage cdc_acm ntfs3 cfg80211 ntfs snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio nls_iso8859_1 snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi intel_rapl_msr snd_hda_codec intel_rapl_common snd_hda_core snd_hwdep snd_pcm edac_mce_amd snd_seq_midi snd_seq_midi_event snd_rawmidi kvm_amd kvm r8188eu(C) snd_seq rapl joydev input_leds snd_seq_device snd_timer efi_pstore wmi_bmof snd ccp k10temp soundcore mac_hid sch_fq_codel msr parport_pc ppdev lp parport binfmt_misc ip_tables x_tables autofs4 dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear system76_io(OE) system76_acpi(OE) hid_generic usbhid hid amdgpu r8169 realtek mdio_devres crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd iommu_v2 gpu_sched i2c_algo_bit drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect
Jan 03 11:21:56 pop-os kernel:  sysimgblt fb_sys_fops cec rc_core libphy ahci xhci_pci drm libahci xhci_pci_renesas wmi i2c_piix4 video gpio_amdpt gpio_generic
Jan 03 11:21:56 pop-os kernel: CR2: fffffffffffffff8
Jan 03 11:21:56 pop-os kernel: ---[ end trace a8b2cf5824589357 ]---
Jan 03 11:21:56 pop-os kernel: [drm] Fence fallback timer expired on ring gfx
Jan 03 11:21:56 pop-os kernel: RIP: 0010:swake_up_locked+0x12/0x40
Jan 03 11:21:56 pop-os kernel: Code: 10 48 89 02 eb 83 f6 80 f9 07 00 00 01 0f 84 5a ff ff ff eb ad 0f 1f 00 48 8b 57 08 48 8d 47 08 48 39 c2 74 25 53 48 8b 5f 08 <48> 8b 7b f8 e8 05 3e fe ff 48 8b 13 48 8b 43 08 48 89 42 08 48 89
Jan 03 11:21:56 pop-os kernel: RSP: 0018:ffffae3a8111fe80 EFLAGS: 00010007
Jan 03 11:21:56 pop-os kernel: RAX: ffff948bad7018b0 RBX: 0000000000000000 RCX: 00000001004fea90
Jan 03 11:21:56 pop-os kernel: RDX: 0000000000000000 RSI: ffff9489f25eaa30 RDI: ffff948bad7018a8
Jan 03 11:21:56 pop-os kernel: RBP: ffff948bad7018a8 R08: 0000000000000001 R09: 0000000000000052
Jan 03 11:21:56 pop-os kernel: R10: ffff948ad6d97000 R11: ffff948ad6d97000 R12: 0000000000000286
Jan 03 11:21:56 pop-os kernel: R13: ffff948ad7c4ecc8 R14: ffff948bad7018a0 R15: ffff948acf3f4e40
Jan 03 11:21:56 pop-os kernel: FS:  0000000000000000(0000) GS:ffff948dcea40000(0000) knlGS:0000000000000000
Jan 03 11:21:56 pop-os kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 03 11:21:56 pop-os kernel: CR2: fffffffffffffff8 CR3: 000000010e23c000 CR4: 00000000003506e0
Jan 03 11:21:56 pop-os kernel: sched: RT throttling activated
Jan 03 11:21:56 pop-os kernel: BUG: kernel NULL pointer dereference, address: 0000000000000259
Jan 03 11:21:56 pop-os kernel: #PF: supervisor read access in kernel mode
Jan 03 11:21:56 pop-os kernel: #PF: error_code(0x0000) - not-present page
Jan 03 11:21:56 pop-os kernel: PGD 0 P4D 0 
Jan 03 11:21:56 pop-os kernel: Oops: 0000 [#2] SMP NOPTI
Jan 03 11:21:56 pop-os kernel: CPU: 1 PID: 80095 Comm: kworker/u64:1 Tainted: G      D  C OE     5.15.12-xanmod1 #0~git20211229.8293471
Jan 03 11:21:56 pop-os kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./A320M-HDV R4.0, BIOS P2.30 06/26/2019
Jan 03 11:21:56 pop-os kernel: Workqueue: events_unbound commit_work [drm_kms_helper]
Jan 03 11:21:56 pop-os kernel: RIP: 0010:drm_atomic_helper_cleanup_planes+0x2c/0x60 [drm_kms_helper]
Jan 03 11:21:56 pop-os kernel: Code: 56 08 8b 82 b8 02 00 00 85 c0 7e 51 55 48 89 f5 53 31 db 48 63 c3 48 c1 e0 05 48 03 45 18 48 8b 38 48 85 ff 74 29 48 8b 70 10 <48> 39 b7 58 02 00 00 48 0f 44 70 18 48 8b 87 50 02 00 00 48 8b 40
Jan 03 11:21:56 pop-os kernel: RSP: 0018:ffffae3a8e7bfb38 EFLAGS: 00010202
Jan 03 11:21:56 pop-os kernel: RAX: ffff948bad7018a0 RBX: 0000000000000005 RCX: ffff948ad7c45b01
Jan 03 11:21:56 pop-os kernel: RDX: ffff948ad7c40010 RSI: 0000000000000000 RDI: 0000000000000001
Jan 03 11:21:56 pop-os kernel: RBP: ffff948acc6df500 R08: ffff948accf8fdb8 R09: 0000000000000001
Jan 03 11:21:56 pop-os kernel: R10: 000000000000000c R11: 000000000000022c R12: 0000000000000246
Jan 03 11:21:56 pop-os kernel: R13: ffff948ad7c40170 R14: ffff948ad7c40010 R15: ffff948acc6df500
Jan 03 11:21:56 pop-os kernel: FS:  0000000000000000(0000) GS:ffff948dcea40000(0000) knlGS:0000000000000000
Jan 03 11:21:56 pop-os kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 03 11:21:56 pop-os kernel: CR2: 0000000000000259 CR3: 000000010d05a000 CR4: 00000000003506e0
Jan 03 11:21:56 pop-os kernel: Call Trace:
Jan 03 11:21:56 pop-os kernel:  <TASK>
Jan 03 11:21:56 pop-os kernel:  amdgpu_dm_atomic_commit_tail+0x19f8/0x25c0 [amdgpu]
Jan 03 11:21:56 pop-os kernel:  ? try_to_wake_up+0x1a7/0x430
Jan 03 11:21:56 pop-os kernel:  ? __ext4_handle_dirty_metadata+0x58/0x1a0
Jan 03 11:21:56 pop-os kernel:  ? lock_timer_base+0x5c/0x80
Jan 03 11:21:56 pop-os kernel:  ? __mod_timer+0x20f/0x3b0
Jan 03 11:21:56 pop-os kernel:  ? update_load_avg+0x7a/0x530
Jan 03 11:21:56 pop-os kernel:  ? newidle_balance+0x11b/0x3f0
Jan 03 11:21:56 pop-os kernel:  ? __cond_resched+0x11/0x40
Jan 03 11:21:56 pop-os kernel:  ? __wait_for_common+0x3b/0x160
Jan 03 11:21:56 pop-os kernel:  ? finish_task_switch.isra.0+0xa2/0x280
Jan 03 11:21:56 pop-os kernel:  commit_tail+0x8c/0x120 [drm_kms_helper]
Jan 03 11:21:56 pop-os kernel:  process_one_work+0x1f7/0x360
Jan 03 11:21:56 pop-os kernel:  worker_thread+0x4b/0x400
Jan 03 11:21:56 pop-os kernel:  ? process_one_work+0x360/0x360
Jan 03 11:21:56 pop-os kernel:  kthread+0x11f/0x140
Jan 03 11:21:56 pop-os kernel:  ? set_kthread_struct+0x30/0x30
Jan 03 11:21:56 pop-os kernel:  ret_from_fork+0x1f/0x30
Jan 03 11:21:56 pop-os kernel:  </TASK>
Jan 03 11:21:56 pop-os kernel: Modules linked in: ses enclosure scsi_transport_sas uas usb_storage cdc_acm ntfs3 cfg80211 ntfs snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio nls_iso8859_1 snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi intel_rapl_msr snd_hda_codec intel_rapl_common snd_hda_core snd_hwdep snd_pcm edac_mce_amd snd_seq_midi snd_seq_midi_event snd_rawmidi kvm_amd kvm r8188eu(C) snd_seq rapl joydev input_leds snd_seq_device snd_timer efi_pstore wmi_bmof snd ccp k10temp soundcore mac_hid sch_fq_codel msr parport_pc ppdev lp parport binfmt_misc ip_tables x_tables autofs4 dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear system76_io(OE) system76_acpi(OE) hid_generic usbhid hid amdgpu r8169 realtek mdio_devres crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd iommu_v2 gpu_sched i2c_algo_bit drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect
Jan 03 11:21:56 pop-os kernel:  sysimgblt fb_sys_fops cec rc_core libphy ahci xhci_pci drm libahci xhci_pci_renesas wmi i2c_piix4 video gpio_amdpt gpio_generic
Jan 03 11:21:56 pop-os kernel: CR2: 0000000000000259
Jan 03 11:21:56 pop-os kernel: ---[ end trace a8b2cf5824589358 ]---
Jan 03 11:21:56 pop-os kernel: RIP: 0010:swake_up_locked+0x12/0x40
Jan 03 11:21:56 pop-os kernel: Code: 10 48 89 02 eb 83 f6 80 f9 07 00 00 01 0f 84 5a ff ff ff eb ad 0f 1f 00 48 8b 57 08 48 8d 47 08 48 39 c2 74 25 53 48 8b 5f 08 <48> 8b 7b f8 e8 05 3e fe ff 48 8b 13 48 8b 43 08 48 89 42 08 48 89
Jan 03 11:21:56 pop-os kernel: RSP: 0018:ffffae3a8111fe80 EFLAGS: 00010007
Jan 03 11:21:56 pop-os kernel: RAX: ffff948bad7018b0 RBX: 0000000000000000 RCX: 00000001004fea90
Jan 03 11:21:56 pop-os kernel: RDX: 0000000000000000 RSI: ffff9489f25eaa30 RDI: ffff948bad7018a8
Jan 03 11:21:56 pop-os kernel: RBP: ffff948bad7018a8 R08: 0000000000000001 R09: 0000000000000052
Jan 03 11:21:56 pop-os kernel: R10: ffff948ad6d97000 R11: ffff948ad6d97000 R12: 0000000000000286
Jan 03 11:21:56 pop-os kernel: R13: ffff948ad7c4ecc8 R14: ffff948bad7018a0 R15: ffff948acf3f4e40
Jan 03 11:21:56 pop-os kernel: FS:  0000000000000000(0000) GS:ffff948dcea40000(0000) knlGS:0000000000000000
Jan 03 11:21:56 pop-os kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 03 11:21:56 pop-os kernel: CR2: 0000000000000259 CR3: 000000010d05a000 CR4: 00000000003506e0
Jan 03 11:21:57 pop-os spotify.desktop[98989]: [+] cef_urlrequest_create:         https://spclient.wg.spotify.com/connect-state/v1/devices/7b26dfe4a089446f295af30a40e760de4d531544
Jan 03 11:21:57 pop-os spotify.desktop[98989]: [+] cef_urlrequest_create:         https://spclient.wg.spotify.com/connect-state/v1/devices/7b26dfe4a089446f295af30a40e760de4d531544
Jan 03 11:22:04 pop-os kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Jan 03 11:22:04 pop-os kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring uvd timeout, signaled seq=216, emitted seq=216
Jan 03 11:22:04 pop-os kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process brave pid 7135 thread brave:cs0 pid 7140
Jan 03 11:22:04 pop-os kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset begin!
Jan 03 11:22:04 pop-os kernel: ------------[ cut here ]------------
Jan 03 11:22:04 pop-os kernel: WARNING: CPU: 1 PID: 100306 at kthread_park+0x68/0x80
Jan 03 11:22:04 pop-os kernel: Modules linked in: ses enclosure scsi_transport_sas uas usb_storage cdc_acm ntfs3 cfg80211 ntfs snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio nls_iso8859_1 snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi intel_rapl_msr snd_hda_codec intel_rapl_common snd_hda_core snd_hwdep snd_pcm edac_mce_amd snd_seq_midi snd_seq_midi_event snd_rawmidi kvm_amd kvm r8188eu(C) snd_seq rapl joydev input_leds snd_seq_device snd_timer efi_pstore wmi_bmof snd ccp k10temp soundcore mac_hid sch_fq_codel msr parport_pc ppdev lp parport binfmt_misc ip_tables x_tables autofs4 dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear system76_io(OE) system76_acpi(OE) hid_generic usbhid hid amdgpu r8169 realtek mdio_devres crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd iommu_v2 gpu_sched i2c_algo_bit drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect
Jan 03 11:22:04 pop-os kernel:  sysimgblt fb_sys_fops cec rc_core libphy ahci xhci_pci drm libahci xhci_pci_renesas wmi i2c_piix4 video gpio_amdpt gpio_generic
Jan 03 11:22:04 pop-os kernel: CPU: 1 PID: 100306 Comm: kworker/1:1 Tainted: G      D  C OE     5.15.12-xanmod1 #0~git20211229.8293471
Jan 03 11:22:04 pop-os kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./A320M-HDV R4.0, BIOS P2.30 06/26/2019
Jan 03 11:22:04 pop-os kernel: Workqueue: events drm_sched_job_timedout [gpu_sched]
Jan 03 11:22:04 pop-os kernel: RIP: 0010:kthread_park+0x68/0x80
Jan 03 11:22:04 pop-os kernel: Code: 20 e8 1c 00 ab 00 be 40 00 00 00 48 89 ef e8 df 06 01 00 48 85 c0 74 25 31 c0 5b 5d c3 0f 0b 48 8b 9d 60 06 00 00 a8 04 74 b2 <0f> 0b b8 da ff ff ff 5b 5d c3 0f 0b b8 f0 ff ff ff eb dd 0f 0b eb
Jan 03 11:22:04 pop-os kernel: RSP: 0018:ffffae3a881cfce8 EFLAGS: 00010202
Jan 03 11:22:04 pop-os kernel: RAX: 0000000000208044 RBX: ffff948ad7204900 RCX: 0000000000000000
Jan 03 11:22:04 pop-os kernel: RDX: 0000000000000000 RSI: ffff9489f25ea800 RDI: ffff948ad7c78000
Jan 03 11:22:04 pop-os kernel: RBP: ffff948ad7c78000 R08: 0000000000000000 R09: 0000000000000001
Jan 03 11:22:04 pop-os kernel: R10: 0000000000000001 R11: 00000000000002e9 R12: ffff948ad7c4eb50
Jan 03 11:22:04 pop-os kernel: R13: 0000000000000000 R14: ffff948ad7c4ecb8 R15: 0000000000000060
Jan 03 11:22:04 pop-os kernel: FS:  0000000000000000(0000) GS:ffff948dcea40000(0000) knlGS:0000000000000000
Jan 03 11:22:04 pop-os kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 03 11:22:04 pop-os kernel: CR2: 000021e80055f000 CR3: 0000000115d4a000 CR4: 00000000003506e0
Jan 03 11:22:04 pop-os kernel: Call Trace:
Jan 03 11:22:04 pop-os kernel:  <TASK>
Jan 03 11:22:04 pop-os kernel:  drm_sched_stop+0x2d/0x160 [gpu_sched]
Jan 03 11:22:04 pop-os kernel:  ? down+0x15/0x50
Jan 03 11:22:04 pop-os kernel:  amdgpu_device_gpu_recover.cold+0xa18/0xa50 [amdgpu]
Jan 03 11:22:04 pop-os kernel:  amdgpu_job_timedout+0x14a/0x170 [amdgpu]
Jan 03 11:22:04 pop-os kernel:  drm_sched_job_timedout+0x60/0xf0 [gpu_sched]
Jan 03 11:22:04 pop-os kernel:  process_one_work+0x1f7/0x360
Jan 03 11:22:04 pop-os kernel:  worker_thread+0x4b/0x400
Jan 03 11:22:04 pop-os kernel:  ? process_one_work+0x360/0x360
Jan 03 11:22:04 pop-os kernel:  kthread+0x11f/0x140
Jan 03 11:22:04 pop-os kernel:  ? set_kthread_struct+0x30/0x30
Jan 03 11:22:04 pop-os kernel:  ret_from_fork+0x1f/0x30
Jan 03 11:22:04 pop-os kernel:  </TASK>
Jan 03 11:22:04 pop-os kernel: ---[ end trace a8b2cf5824589359 ]---
Jan 03 11:22:16 pop-os gnome-shell[7093]: [0103/112216.050373:WARNING:exception_snapshot_linux.cc(427)] Unhandled signal -1
Jan 03 11:22:53 pop-os spotify.desktop[98989]: [+] cef_urlrequest_create:         https://i.scdn.co/image/ab67616d00001e02bca27e89b082e7fa21a6b1e9
Jan 03 11:22:53 pop-os spotify.desktop[98989]: [+] cef_urlrequest_create:         https://spclient.wg.spotify.com/extended-metadata/v0/extended-metadata
Jan 03 11:22:53 pop-os spotify.desktop[98989]: [+] cef_urlrequest_create:         https://spclient.wg.spotify.com/connect-state/v1/devices/7b26dfe4a089446f295af30a40e760de4d531544
Jan 03 11:22:53 pop-os spotify.desktop[98989]: [+] cef_urlrequest_create:         https://spclient.wg.spotify.com/net-fortune/v2/fortune
Jan 03 11:22:58 pop-os spotify.desktop[98989]: [+] cef_urlrequest_create:         https://spclient.wg.spotify.com/storage-resolve/v2/files/audio/interactive_prefetch/1/e4e56fdfed08e82a66c205ca3f056975bb16bad3?product=0
Jan 03 11:22:58 pop-os spotify.desktop[98989]: [+] cef_urlrequest_create:         https://spclient.wg.spotify.com/playplay/v1/key/58de44c133b247800e0f7b15c408a05e8a5a5f35
Jan 03 11:23:18 pop-os /usr/libexec/gdm-x-session[1069]: (EE) event6  - Gaming Keyboard: client bug: event processing lagging behind by 25ms, your system is too slow
Jan 03 11:23:18 pop-os /usr/libexec/gdm-x-session[1069]: (EE) event6  - Gaming Keyboard: WARNING: log rate limit exceeded (5 msgs per 60min). Discarding future messages.
Jan 03 11:23:56 pop-os kernel: usb 1-2: USB disconnect, device number 13
Jan 03 11:23:56 pop-os gvfsd[133497]: PTP: reading event an error 0x05 occurred
Jan 03 11:23:57 pop-os gvfsd[133497]: Device 0 (VID=04e8 and PID=6860) is a Samsung Galaxy models (MTP).
Jan 03 11:23:57 pop-os gvfsd[133497]: Android device detected, assigning default bug flags
Jan 03 11:23:57 pop-os gvfsd[133497]: Received event PTP_EC_DevicePropChanged in session 1
Jan 03 11:23:57 pop-os gvfsd[133497]: Received event PTP_EC_StoreAdded in session 1
Jan 03 11:23:57 pop-os gvfsd[133497]: Received event PTP_EC_StoreAdded in session 1
Jan 03 11:23:57 pop-os gvfsd[133497]: Received event PTP_EC_StoreAdded in session 1
Jan 03 11:23:57 pop-os gvfsd[133497]: Received event PTP_EC_DevicePropChanged in session 1

Full log

cjfgraff commented 2 years ago

Finally found an issue similar to the one I'm running into. I'll add some info if it helps speed things along with a resolution.

I've run into this problem many times, and it's always been with my Radeon RX 6700 XT. It appears in Xubuntu as well, and I switched to Pop!_OS in the hope that the drivers and firmware would already be present (I didn't have to install anything). Crashes ended up occurring frequently enough that I had to switch to an older NVIDIA GT 1030 card, which runs without issues. The 6700 XT isn't a dud either: it works fine in Win 10, with the same hardware setup, running reasonably graphics-intensive games.

To summarize, the symptoms are the same as what @Dark-Matter7232 described:

Reproduction and Mitigation

It's certainly an intermittent issue, and I haven't yet figured out how to trigger it consistently.

I've tried mitigating as best I could with the following actions to no avail:

I've not yet tried the following:

Pop!_OS Distribution: cat /etc/os-release

NAME="Pop!_OS"
VERSION="21.10"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 21.10"
VERSION_ID="21.10"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=impish
UBUNTU_CODENAME=impish
LOGO=distributor-logo-pop-os

Hardware Info: inxi -Fxxxrz

System:
  Kernel: 5.15.15-76051515-generic x86_64 bits: 64 compiler: gcc v: 11.2.0 
  Desktop: GNOME 40.5 tk: GTK 3.24.30 wm: gnome-shell dm: GDM3 41.rc 
  Distro: Pop!_OS 21.10 base: Ubuntu Impish 
Machine:
  Type: Desktop System: ASUS product: N/A v: N/A serial: <filter> 
  Mobo: ASUSTeK model: ROG STRIX Z490-E GAMING v: Rev 1.xx serial: <filter> 
  UEFI: American Megatrends v: 2004 date: 01/13/2021 
CPU:
  Info: 8-Core model: Intel Core i7-10700F bits: 64 type: MT MCP 
  arch: Comet Lake rev: 5 cache: L2: 16 MiB 
  flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 
  bogomips: 92796 
  Speed: 800 MHz min/max: 800/4800 MHz Core speeds (MHz): 1: 800 2: 4417 
  3: 867 4: 3222 5: 1897 6: 1114 7: 800 8: 800 9: 800 10: 800 11: 800 
  12: 800 13: 800 14: 800 15: 800 16: 800 
Graphics:
  Device-1: AMD Navi 22 [Radeon RX 6700/6700 XT / 6800M] vendor: ASUSTeK 
  driver: amdgpu v: kernel bus-ID: 03:00.0 chip-ID: 1002:73df class-ID: 0300 
  Display: x11 server: X.Org 1.20.13 compositor: gnome-shell driver: 
  loaded: amdgpu,ati unloaded: fbdev,modesetting,radeon,vesa 
  resolution: 1680x1050~60Hz s-dpi: 96 
  OpenGL: renderer: AMD Radeon RX 6700 XT (NAVY_FLOUNDER DRM 3.42.0 
  5.15.15-76051515-generic LLVM 12.0.1) 
  v: 4.6 Mesa 21.2.2 direct render: Yes 
Audio:
  Device-1: Intel Comet Lake PCH cAVS vendor: ASUSTeK driver: snd_hda_intel 
  v: kernel bus-ID: 00:1f.3 chip-ID: 8086:06c8 class-ID: 0403 
  Device-2: AMD Navi 21 HDMI Audio [Radeon RX 6800/6800 XT / 6900 XT] 
  driver: snd_hda_intel v: kernel bus-ID: 03:00.1 chip-ID: 1002:ab28 
  class-ID: 0403 
  Sound Server-1: ALSA v: k5.15.15-76051515-generic running: yes 
  Sound Server-2: PulseAudio v: 15.0 running: yes 
  Sound Server-3: PipeWire v: 0.3.32 running: yes 
Network:
  Device-1: Intel Comet Lake PCH CNVi WiFi driver: iwlwifi v: kernel 
  bus-ID: 00:14.3 chip-ID: 8086:06f0 class-ID: 0280 
  IF: wlo1 state: down mac: <filter> 
  Device-2: Intel Ethernet I225-V vendor: ASUSTeK driver: igc v: kernel 
  port: 3000 bus-ID: 06:00.0 chip-ID: 8086:15f3 class-ID: 0200 
  IF: enp6s0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
Bluetooth:
  Device-1: Intel type: USB driver: btusb v: 0.8 bus-ID: 1-14:7 
  chip-ID: 8087:0026 class-ID: e001 
  Report: hciconfig ID: hci0 rfk-id: 0 state: up address: <filter> bt-v: 3.0 
  lmp-v: 5.2 sub-v: 27a4 hci-v: 5.2 rev: 27a4 
Drives:
  Local Storage: total: 2.73 TiB used: 490.47 GiB (17.6%) 
  ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 970 EVO Plus 2TB 
  size: 1.82 TiB speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter> 
  rev: 2B2QEXM7 temp: 35.9 C scheme: GPT 
  ID-2: /dev/nvme1n1 vendor: Samsung model: SSD 970 EVO Plus 1TB 
  size: 931.51 GiB speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter> 
  rev: 2B2QEXM7 temp: 38.9 C scheme: GPT 
Partition:
  ID-1: / size: 63.67 GiB used: 14.33 GiB (22.5%) fs: ext4 
  dev: /dev/nvme1n1p6 
  ID-2: /boot/efi size: 487 MiB used: 184.7 MiB (37.9%) fs: vfat 
  dev: /dev/nvme1n1p5 
  ID-3: /home size: 562.29 GiB used: 39.66 GiB (7.1%) fs: ext4 
  dev: /dev/nvme1n1p7 
Swap:
  Alert: No swap data was found. 
Sensors:
  System Temperatures: cpu: 27.8 C mobo: N/A gpu: amdgpu temp: 34.0 C 
  mem: 30.0 C 
  Fan Speeds (RPM): N/A gpu: amdgpu fan: 0 
Repos:
  Packages: 1977 apt: 1934 flatpak: 40 snap: 3 
  No active apt repos in: /etc/apt/sources.list 
  Active apt repos in: /etc/apt/sources.list.d/pop-os-apps.sources 
  1: deb http://apt.pop-os.org/proprietary impish main
  Active apt repos in: /etc/apt/sources.list.d/pop-os-ppa.sources 
  1: deb deb-src http://apt.pop-os.org/release impish main
  Active apt repos in: /etc/apt/sources.list.d/system.sources 
  1: deb deb-src http://us.archive.ubuntu.com/ubuntu/ impish impish-security impish-updates impish-backports main restricted universe multiverse
  2: deb deb-src X-Repolib-Default-Mirror: http://us.archive.ubuntu.com/ubuntu/ impish impish-security impish-updates impish-backports main restricted universe multiverse
Info:
  Processes: 427 Uptime: 33m wakeups: 0 Memory: 31.25 GiB 
  used: 4.5 GiB (14.4%) Init: systemd v: 248 runlevel: 5 Compilers: 
  gcc: 11.2.0 alt: 10/11 Shell: Bash v: 5.1.8 running-in: gnome-terminal 
  inxi: 3.3.06

Mesa Info

ii  libegl-mesa0:amd64                      21.2.2-1ubuntu1pop0~1634226723~21.10~b715ae2           amd64        free implementation of the EGL API -- Mesa vendor library
ii  libgl1-mesa-dri:amd64                   21.2.2-1ubuntu1pop0~1634226723~21.10~b715ae2           amd64        free implementation of the OpenGL API -- DRI modules
ii  libglapi-mesa:amd64                     21.2.2-1ubuntu1pop0~1634226723~21.10~b715ae2           amd64        free implementation of the GL API -- shared library
ii  libglu1-mesa:amd64                      9.0.1-1build1                                          amd64        Mesa OpenGL utility library (GLU)
ii  libglx-mesa0:amd64                      21.2.2-1ubuntu1pop0~1634226723~21.10~b715ae2           amd64        free implementation of the OpenGL API -- GLX vendor library
ii  mesa-utils                              8.4.0-1build1                                          amd64        Miscellaneous Mesa GL utilities
ii  mesa-va-drivers:amd64                   21.2.2-1ubuntu1pop0~1634226723~21.10~b715ae2           amd64        Mesa VA-API video acceleration drivers
ii  mesa-vdpau-drivers:amd64                21.2.2-1ubuntu1pop0~1634226723~21.10~b715ae2           amd64        Mesa VDPAU video acceleration drivers
ii  mesa-vulkan-drivers:amd64               21.2.2-1ubuntu1pop0~1634226723~21.10~b715ae2           amd64        Mesa Vulkan graphics drivers

AMDGPU Firmware Info: sudo cat /sys/kernel/debug/dri/0/amdgpu_firmware_info

VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 38, firmware version: 0x0000003e
PFP feature version: 38, firmware version: 0x00000056
CE feature version: 38, firmware version: 0x00000024
RLC feature version: 1, firmware version: 0x00000042
RLC SRLC feature version: 0, firmware version: 0x00000000
RLC SRLG feature version: 0, firmware version: 0x00000000
RLC SRLS feature version: 0, firmware version: 0x00000000
MEC feature version: 38, firmware version: 0x00000058
MEC2 feature version: 38, firmware version: 0x00000058
SOS feature version: 0, firmware version: 0x0022020a
ASD feature version: 553648218, firmware version: 0x2100005a
TA XGMI feature version: 0x00000000, firmware version: 0x00000000
TA RAS feature version: 0x00000000, firmware version: 0x00000000
TA HDCP feature version: 0x1700001f, firmware version: 0x00000000
TA DTM feature version: 0x12000009, firmware version: 0x00000000
TA RAP feature version: 0x0700000e, firmware version: 0x00000000
TA SECUREDISPLAY feature version: 0x00000000, firmware version: 0x00000000
SMC feature version: 0, firmware version: 0x00412a00
SDMA0 feature version: 52, firmware version: 0x00000045
SDMA1 feature version: 52, firmware version: 0x00000045
VCN feature version: 0, firmware version: 0x02110001
DMCU feature version: 0, firmware version: 0x00000000
DMCUB feature version: 0, firmware version: 0x02020003
TOC feature version: 0, firmware version: 0x00000000
VBIOS version: 115-D512BS0-100

Last Failure (2/2/2022, 08:03:20), trimmed from /var/log/syslog

Feb  2 08:03:20 kernel: [37932.038809] [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
Feb  2 08:03:20 kernel: [37937.168911] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=228661, emitted seq=228662
Feb  2 08:03:20 kernel: [37937.169061] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 2184 thread Xorg:cs0 pid 2284
Feb  2 08:03:20 kernel: [37937.169164] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Feb  2 08:03:20 kernel: [37937.396684] amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Feb  2 08:03:20 kernel: [37937.396771] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Feb  2 08:03:20 kernel: [37937.591477] amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Feb  2 08:03:20 kernel: [37937.591549] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Feb  2 08:03:20 kernel: [37937.786506] [drm:gfx_v10_0_cp_gfx_enable.isra.0 [amdgpu]] *ERROR* failed to halt cp gfx
Feb  2 08:03:20 kernel: [37937.800525] [drm] free PSP TMR buffer
Feb  2 08:03:20 kernel: [37937.845178] amdgpu 0000:03:00.0: amdgpu: MODE1 reset
Feb  2 08:03:20 kernel: [37937.845180] amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
Feb  2 08:03:20 kernel: [37937.845228] amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
Feb  2 08:03:21 kernel: [37938.375075] amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
Feb  2 08:03:21 kernel: [37938.375271] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
Feb  2 08:03:21 kernel: [37938.375308] [drm] VRAM is lost due to GPU reset!
Feb  2 08:03:21 kernel: [37938.376232] [drm] PSP is resuming...
Feb  2 08:03:21 kernel: [37938.567818] [drm] reserve 0xa00000 from 0x82fe000000 for PSP TMR
Feb  2 08:03:21 kernel: [37938.646867] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
Feb  2 08:03:21 kernel: [37938.657253] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Feb  2 08:03:21 kernel: [37938.657254] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
Feb  2 08:03:21 kernel: [37938.712547] amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully!
Feb  2 08:03:21 kernel: [37938.713886] [drm] DMUB hardware initialized: version=0x02020003
Feb  2 08:03:22 kernel: [37939.025119] [drm] kiq ring mec 2 pipe 1 q 0
Feb  2 08:03:22 kernel: [37939.028387] [drm] VCN decode and encode initialized successfully(under DPG Mode).
Feb  2 08:03:22 kernel: [37939.028729] [drm] JPEG decode initialized successfully.
Feb  2 08:03:22 kernel: [37939.028754] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Feb  2 08:03:22 kernel: [37939.028756] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Feb  2 08:03:22 kernel: [37939.028756] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Feb  2 08:03:22 kernel: [37939.028757] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Feb  2 08:03:22 kernel: [37939.028758] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Feb  2 08:03:22 kernel: [37939.028758] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Feb  2 08:03:22 kernel: [37939.028759] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Feb  2 08:03:22 kernel: [37939.028759] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Feb  2 08:03:22 kernel: [37939.028760] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Feb  2 08:03:22 kernel: [37939.028760] amdgpu 0000:03:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Feb  2 08:03:22 kernel: [37939.028761] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Feb  2 08:03:22 kernel: [37939.028762] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
Feb  2 08:03:22 kernel: [37939.028762] amdgpu 0000:03:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
Feb  2 08:03:22 kernel: [37939.028763] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
Feb  2 08:03:22 kernel: [37939.028764] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
Feb  2 08:03:22 kernel: [37939.028764] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
Feb  2 08:03:22 kernel: [37939.033836] amdgpu 0000:03:00.0: amdgpu: recover vram bo from shadow start
Feb  2 08:03:22 kernel: [37939.033849] amdgpu 0000:03:00.0: amdgpu: recover vram bo from shadow done
Feb  2 08:03:22 kernel: [37939.033850] [drm] Skip scheduling IBs!
Feb  2 08:03:22 kernel: [37939.033884] amdgpu 0000:03:00.0: amdgpu: GPU reset(2) succeeded!
Feb  2 08:03:22 kernel: [37939.033893] [drm] Skip scheduling IBs!
Feb  2 08:03:22 gnome-shell[2521]: amdgpu: amdgpu_cs_query_fence_status failed.
Feb  2 08:03:22 kernel: [37939.033901] [drm] Skip scheduling IBs!
Feb  2 08:03:22 kernel: [37939.033924] [drm] Skip scheduling IBs!
Feb  2 08:03:22 kernel: [37939.034818] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Feb  2 08:03:22 /usr/libexec/gdm-x-session[2184]: amdgpu: The CS has been cancelled because the context is lost.
Feb  2 08:03:22 gnome-shell[2521]: amdgpu: The CS has been cancelled because the context is lost.
Feb  2 08:03:22 gnome-shell[2521]: amdgpu: amdgpu_cs_query_fence_status failed.
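
For anyone comparing logs across reports (the ring is uvd in the original report and gfx_0.0.0 here), this is a minimal Python sketch that pulls the ring name and sequence numbers out of the amdgpu_job_timedout lines. It is not part of any amdgpu tooling: the regex is an assumption based only on the message format seen in this thread, and ring_timeouts is a hypothetical helper name.

```python
import re

# Assumed format, taken from the log lines in this thread:
#   "... *ERROR* ring <name> timeout, signaled seq=<n>, emitted seq=<m>"
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def ring_timeouts(lines):
    """Yield (ring, signaled, emitted) for every ring-timeout line found."""
    for line in lines:
        match = RING_TIMEOUT.search(line)
        if match:
            yield (
                match["ring"],
                int(match["signaled"]),
                int(match["emitted"]),
            )
```

Passing the lines of a saved journalctl or syslog excerpt (for example, an open file object) gives one tuple per timeout. For the gfx_0.0.0 line above it yields ("gfx_0.0.0", 228661, 228662). A signaled value below the emitted value probably means at least one submitted job never completed, though that reading is my interpretation rather than something the log states.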

wertig0n commented 2 years ago

I started having this problem in Linux a few days ago, too. I installed a Sapphire RX 6600 Pulse and run stock Ubuntu 22.04. This may already have been fixed in the kernel; see this discussion:

https://lore.kernel.org/all/dbadfe41-24bf-5811-cf38-74973df45214@badpenguin.co.uk/

As for my $.02, $ uname -a yields:

Linux buttran 5.15.0-33-generic #34-Ubuntu SMP Wed May 18 13:34:26 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Relevant output from command $ journalctl -o short-precise -k -b -1 (slightly edited due to snap apparmor vomit):


maj 29 14:30:45.472809 buttran kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
maj 29 14:30:47.780810 buttran kernel: snd_hda_intel 0000:03:00.1: refused to change power state from D3hot to D0
maj 29 14:30:47.881686 buttran kernel: snd_hda_intel 0000:03:00.1: CORB reset timeout#2, CORBRP = 65535
maj 29 14:30:48.164848 buttran kernel: snd_hda_codec_hdmi hdaudioC0D0: Unable to sync register 0x2f0d00. -5
maj 29 14:30:53.040809 buttran kernel: [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
maj 29 14:30:53.040950 buttran kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
maj 29 14:30:55.592803 buttran kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=8985, emitted seq=8987
maj 29 14:30:55.592917 buttran kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
maj 29 14:30:55.592945 buttran kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
maj 29 14:31:00.248815 buttran kernel: amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
maj 29 14:31:05.396825 buttran kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
maj 29 14:31:05.676846 buttran kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
maj 29 14:31:10.572822 buttran kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
maj 29 14:31:10.573170 buttran kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
maj 29 14:31:10.844809 buttran kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
maj 29 14:31:10.845113 buttran kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
maj 29 14:31:15.504813 buttran kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command!
maj 29 14:31:15.505278 buttran kernel: amdgpu 0000:03:00.0: amdgpu: Failed to disable smu features.
maj 29 14:31:15.505567 buttran kernel: amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features!
maj 29 14:31:15.505834 buttran kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
maj 29 14:31:15.528826 buttran kernel: [drm] free PSP TMR buffer
maj 29 14:31:16.624812 buttran kernel: [drm] psp gfx command DESTROY_TMR(0x7) failed and response status is (0x80000306)
maj 29 14:31:16.644825 buttran kernel: amdgpu 0000:03:00.0: amdgpu: MODE1 reset
maj 29 14:31:16.645321 buttran kernel: amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
maj 29 14:31:16.645556 buttran kernel: amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
maj 29 14:31:21.324822 buttran kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command!
maj 29 14:31:21.325175 buttran kernel: amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset failed
maj 29 14:31:21.325412 buttran kernel: amdgpu 0000:03:00.0: amdgpu: ASIC reset failed with error, -62 for drm dev, 0000:03:00.0
maj 29 14:31:32.313042 buttran kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
maj 29 14:31:32.313591 buttran kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
maj 29 14:31:32.313647 buttran kernel: [drm] VRAM is lost due to GPU reset!
maj 29 14:31:32.313696 buttran kernel: [drm] PSP is resuming...
maj 29 14:31:33.432819 buttran kernel: [drm] failed to load ucode SMC(0x18) 
maj 29 14:31:33.433024 buttran kernel: [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x80000306)
maj 29 14:31:33.433094 buttran kernel: [drm] reserve 0xa00000 from 0x81fe000000 for PSP TMR
maj 29 14:31:33.704818 buttran kernel: amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
maj 29 14:31:33.724824 buttran kernel: amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
maj 29 14:31:33.725379 buttran kernel: amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
maj 29 14:31:33.725741 buttran kernel: amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw version = 0x003b2800 (59.40.0)
maj 29 14:31:33.726087 buttran kernel: amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
maj 29 14:31:38.420820 buttran kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command!
maj 29 14:31:38.421166 buttran kernel: amdgpu 0000:03:00.0: amdgpu: Failed to SetDriverDramAddr!
maj 29 14:31:38.421336 buttran kernel: amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
maj 29 14:31:38.421499 buttran kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
maj 29 14:31:38.421526 buttran kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset(1) failed
maj 29 14:31:38.440804 buttran kernel: snd_hda_intel 0000:03:00.1: refused to change power state from D3hot to D0
maj 29 14:31:38.541097 buttran kernel: snd_hda_intel 0000:03:00.1: CORB reset timeout#2, CORBRP = 65535
maj 29 14:31:38.541573 buttran kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset end with ret = -62
maj 29 14:31:48.584822 buttran kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=8987, emitted seq=8987
maj 29 14:31:48.585020 buttran kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
maj 29 14:31:48.585079 buttran kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
maj 29 14:35:31.556820 buttran kernel: INFO: task kworker/1:0:41074 blocked for more than 120 seconds.
maj 29 14:35:31.557014 buttran kernel:       Not tainted 5.15.0-33-generic #34-Ubuntu
maj 29 14:35:31.557062 buttran kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
maj 29 14:35:31.557102 buttran kernel: task:kworker/1:0     state:D stack:    0 pid:41074 ppid:     2 flags:0x00004000
maj 29 14:35:31.557152 buttran kernel: Workqueue: events drm_sched_job_timedout [gpu_sched]
maj 29 14:35:31.557185 buttran kernel: Call Trace:
maj 29 14:35:31.557218 buttran kernel:  <TASK>
maj 29 14:35:31.557248 buttran kernel:  __schedule+0x23d/0x590
maj 29 14:35:31.557278 buttran kernel:  schedule+0x4e/0xb0
maj 29 14:35:31.557308 buttran kernel:  schedule_timeout+0xfb/0x140
maj 29 14:35:31.557344 buttran kernel:  ? task_rq_lock+0x5f/0x150
maj 29 14:35:31.557377 buttran kernel:  dma_fence_default_wait+0x1c4/0x1f0
maj 29 14:35:31.557417 buttran kernel:  ? dma_fence_free+0x20/0x20
maj 29 14:35:31.557447 buttran kernel:  dma_fence_wait_timeout+0xb7/0xd0
maj 29 14:35:31.557483 buttran kernel:  drm_sched_stop+0xfc/0x170 [gpu_sched]
maj 29 14:35:31.557520 buttran kernel:  amdgpu_device_gpu_recover.cold+0x85a/0x8f8 [amdgpu]
maj 29 14:35:31.557555 buttran kernel:  amdgpu_job_timedout+0x14f/0x170 [amdgpu]
maj 29 14:35:31.557584 buttran kernel:  drm_sched_job_timedout+0x6f/0x110 [gpu_sched]
maj 29 14:35:31.557614 buttran kernel:  process_one_work+0x22b/0x3d0
maj 29 14:35:31.557712 buttran kernel:  worker_thread+0x53/0x410
maj 29 14:35:31.557747 buttran kernel:  ? process_one_work+0x3d0/0x3d0
maj 29 14:35:31.557783 buttran kernel:  kthread+0x12a/0x150
maj 29 14:35:31.557823 buttran kernel:  ? set_kthread_struct+0x50/0x50
maj 29 14:35:31.557857 buttran kernel:  ret_from_fork+0x22/0x30
maj 29 14:35:31.557890 buttran kernel:  </TASK>

KCCat commented 1 year ago

I have a similar situation:

[drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!

I'm using a 6700 XT. I think the source of the problem is probably here: https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c#L9292

It looks like deadlock-prevention code with a hardcoded timeout of 5000 ms. So I guess setting the kernel parameter amdgpu.lockup_timeout=4990 (just under 5000 ms) might get around this problem. I've been testing this for 3 days and it hasn't happened again, so I hope it works for you :) @cjfgraff
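
If you try this workaround, it is worth confirming that the parameter actually reached the running kernel. Here is a minimal Python sketch under a few assumptions: kernel_param is a hypothetical helper name, the command line is split on whitespace only (quoted values containing spaces are not handled), and when a parameter appears more than once the last occurrence is taken as the effective one.

```python
def kernel_param(cmdline, name):
    """Return the value of `name` in a kernel command line string.

    Returns None if the parameter is absent, and "" if it is present
    without an "=value" part. The last occurrence wins (assumption).
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()
```

After a reboot, kernel_param(read_cmdline(), "amdgpu.lockup_timeout") should return "4990" if the parameter was applied, and None if it never made it onto the command line.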

pktiuk commented 7 months ago

I also had a similar problem. My laptop froze, and even REISUB did not help.

I have a similar report in the logs:

14:16:27.894416 pop-os kernel: INFO: task kworker/u32:11:8569 blocked for more than 120 seconds.
14:16:27.902946 pop-os kernel:       Tainted: P           OE      6.5.6-76060506-generic #202310061235~1697396945~22.04~9283e32
14:16:27.903001 pop-os kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
14:16:27.903034 pop-os kernel: task:kworker/u32:11  state:D stack:0     pid:8569  ppid:2      flags:0x00004000
14:16:27.903069 pop-os kernel: Workqueue: events_unbound commit_work [drm_kms_helper]
14:16:27.903098 pop-os kernel: Call Trace:
14:16:27.903124 pop-os kernel:  <TASK>
14:16:27.903153 pop-os kernel:  __schedule+0x2cc/0x750
14:16:27.903181 pop-os kernel:  schedule+0x63/0x110
14:16:27.903209 pop-os kernel:  schedule_timeout+0x157/0x170
14:16:27.903238 pop-os kernel:  dma_fence_default_wait+0x13d/0x210
14:16:27.903272 pop-os kernel:  ? __pfx_dma_fence_default_wait_cb+0x10/0x10
14:16:27.903301 pop-os kernel:  dma_fence_wait_timeout+0x116/0x140
14:16:27.903330 pop-os kernel:  drm_atomic_helper_wait_for_fences+0x172/0x200 [drm_kms_helper]
14:16:27.903359 pop-os kernel:  ? srso_alias_return_thunk+0x5/0x7f
14:16:27.903387 pop-os kernel:  commit_tail+0x3c/0x190 [drm_kms_helper]
14:16:27.903415 pop-os kernel:  ? __schedule+0x2d4/0x750
14:16:27.903443 pop-os kernel:  commit_work+0x12/0x20 [drm_kms_helper]
14:16:27.903471 pop-os kernel:  process_one_work+0x240/0x450
14:16:27.903499 pop-os kernel:  worker_thread+0x50/0x3f0
14:16:27.903528 pop-os kernel:  ? __pfx_worker_thread+0x10/0x10
14:16:27.903561 pop-os kernel:  kthread+0xf2/0x120
14:16:27.903590 pop-os kernel:  ? __pfx_kthread+0x10/0x10
14:16:27.903618 pop-os kernel:  ret_from_fork+0x47/0x70
14:16:27.903646 pop-os kernel:  ? __pfx_kthread+0x10/0x10
14:16:27.903669 pop-os kernel:  ret_from_fork_asm+0x1b/0x30
14:16:27.903698 pop-os kernel:  </TASK>
14:16:27.903723 pop-os kernel: INFO: task kworker/u32:13:8570 blocked for more than 120 seconds.
14:16:27.903754 pop-os kernel:       Tainted: P           OE      6.5.6-76060506-generic #202310061235~1697396945~22.04~9283e32
14:16:27.903778 pop-os kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
14:16:27.903803 pop-os kernel: task:kworker/u32:13  state:D stack:0     pid:8570  ppid:2      flags:0x00004000
14:16:27.903832 pop-os kernel: Workqueue: events_unbound commit_work [drm_kms_helper]
14:16:27.903859 pop-os kernel: Call Trace:
14:16:27.903883 pop-os kernel:  <TASK>
14:16:27.903908 pop-os kernel:  __schedule+0x2cc/0x750
14:16:27.903931 pop-os kernel:  schedule+0x63/0x110
14:16:27.903951 pop-os kernel:  schedule_timeout+0x157/0x170
14:16:27.903975 pop-os kernel:  dma_fence_default_wait+0x13d/0x210
14:16:27.904000 pop-os kernel:  ? __pfx_dma_fence_default_wait_cb+0x10/0x10
14:16:27.904028 pop-os kernel:  dma_fence_wait_timeout+0x116/0x140
14:16:27.904051 pop-os kernel:  drm_atomic_helper_wait_for_fences+0x172/0x200 [drm_kms_helper]
14:16:27.904074 pop-os kernel:  ? srso_alias_return_thunk+0x5/0x7f
14:16:27.904103 pop-os kernel:  commit_tail+0x3c/0x190 [drm_kms_helper]
14:16:27.904127 pop-os kernel:  ? __schedule+0x2d4/0x750
14:16:27.904152 pop-os kernel:  commit_work+0x12/0x20 [drm_kms_helper]
14:16:27.904174 pop-os kernel:  process_one_work+0x240/0x450
14:16:27.904197 pop-os kernel:  worker_thread+0x50/0x3f0
14:16:27.904220 pop-os kernel:  ? srso_alias_return_thunk+0x5/0x7f
14:16:27.904243 pop-os kernel:  ? __pfx_worker_thread+0x10/0x10
14:16:27.904267 pop-os kernel:  kthread+0xf2/0x120
14:16:27.904291 pop-os kernel:  ? __pfx_kthread+0x10/0x10
14:16:27.904310 pop-os kernel:  ret_from_fork+0x47/0x70
14:16:27.904333 pop-os kernel:  ? __pfx_kthread+0x10/0x10
14:16:27.904362 pop-os kernel:  ret_from_fork_asm+0x1b/0x30
14:16:27.904387 pop-os kernel:  </TASK>

My machine: ASUS ROG Flow X13 2022, AMD Ryzen™ 7 6800HS, RTX 3050 Ti.

Should I report this problem anywhere else?