jackpot51 closed this 9 months ago
I'm occasionally seeing UBSAN shift-out-of-bounds messages when resuming from suspend:
Sep 26 15:42:19 pop-os kernel: ACPI: PM: Preparing to enter system sleep state S3
Sep 26 15:42:19 pop-os kernel: ACPI: EC: event blocked
Sep 26 15:42:19 pop-os kernel: ACPI: EC: EC stopped
Sep 26 15:42:19 pop-os kernel: ACPI: PM: Saving platform NVS memory
Sep 26 15:42:19 pop-os kernel: Disabling non-boot CPUs ...
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 1 is now offline
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 2 is now offline
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 3 is now offline
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 4 is now offline
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 5 is now offline
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 6 is now offline
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 7 is now offline
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 8 is now offline
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 9 is now offline
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 10 is now offline
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 11 is now offline
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 12 is now offline
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 13 is now offline
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 14 is now offline
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 15 is now offline
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 16 is now offline
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 17 is now offline
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 18 is now offline
Sep 26 15:42:19 pop-os kernel: smpboot: CPU 19 is now offline
Sep 26 15:42:19 pop-os kernel: ACPI: PM: Low-level resume complete
Sep 26 15:42:19 pop-os kernel: ACPI: EC: EC started
Sep 26 15:42:19 pop-os kernel: ACPI: PM: Restoring platform NVS memory
Sep 26 15:42:19 pop-os kernel: Enabling non-boot CPUs ...
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 1 APIC 0x1
Sep 26 15:42:19 pop-os kernel: CPU1 is up
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 2 APIC 0x8
Sep 26 15:42:19 pop-os kernel: CPU2 is up
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 3 APIC 0x9
Sep 26 15:42:19 pop-os kernel: CPU3 is up
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 4 APIC 0x10
Sep 26 15:42:19 pop-os kernel: CPU4 is up
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 5 APIC 0x11
Sep 26 15:42:19 pop-os kernel: CPU5 is up
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 6 APIC 0x18
Sep 26 15:42:19 pop-os kernel: CPU6 is up
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 7 APIC 0x19
Sep 26 15:42:19 pop-os kernel: CPU7 is up
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 8 APIC 0x20
Sep 26 15:42:19 pop-os kernel: CPU8 is up
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 9 APIC 0x21
Sep 26 15:42:19 pop-os kernel: CPU9 is up
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 10 APIC 0x28
Sep 26 15:42:19 pop-os kernel: CPU10 is up
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 11 APIC 0x29
Sep 26 15:42:19 pop-os kernel: CPU11 is up
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 12 APIC 0x30
Sep 26 15:42:19 pop-os kernel: core: cpu_atom PMU driver: PEBS-via-PT
Sep 26 15:42:19 pop-os kernel: ... version: 5
Sep 26 15:42:19 pop-os kernel: ... bit width: 48
Sep 26 15:42:19 pop-os kernel: ... generic registers: 6
Sep 26 15:42:19 pop-os kernel: ... value mask: 0000ffffffffffff
Sep 26 15:42:19 pop-os kernel: ... max period: 00007fffffffffff
Sep 26 15:42:19 pop-os kernel: ... fixed-purpose events: 3
Sep 26 15:42:19 pop-os kernel: ... event mask: 000000070000003f
Sep 26 15:42:19 pop-os kernel: CPU12 is up
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 13 APIC 0x32
Sep 26 15:42:19 pop-os kernel: CPU13 is up
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 14 APIC 0x34
Sep 26 15:42:19 pop-os kernel: CPU14 is up
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 15 APIC 0x36
Sep 26 15:42:19 pop-os kernel: CPU15 is up
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 16 APIC 0x38
Sep 26 15:42:19 pop-os kernel: CPU16 is up
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 17 APIC 0x3a
Sep 26 15:42:19 pop-os kernel: CPU17 is up
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 18 APIC 0x3c
Sep 26 15:42:19 pop-os kernel: CPU18 is up
Sep 26 15:42:19 pop-os kernel: smpboot: Booting Node 0 Processor 19 APIC 0x3e
Sep 26 15:42:19 pop-os kernel: CPU19 is up
Sep 26 15:42:19 pop-os kernel: ACPI: PM: Waking up from system sleep state S3
Sep 26 15:42:19 pop-os kernel: ACPI: EC: interrupt unblocked
Sep 26 15:42:19 pop-os kernel: ACPI: EC: event unblocked
Sep 26 15:42:19 pop-os kernel: pcieport 0000:00:06.2: can't derive routing for PCI INT A
Sep 26 15:42:19 pop-os kernel: nvme 0000:03:00.0: PCI INT A: no GSI - using ISA IRQ 11
Sep 26 15:42:19 pop-os kernel: i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.5.1
Sep 26 15:42:19 pop-os kernel: i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
Sep 26 15:42:19 pop-os kernel: nvme nvme0: 20/0/0 default/read/poll queues
Sep 26 15:42:19 pop-os kernel: i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
Sep 26 15:42:19 pop-os kernel: i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
Sep 26 15:42:19 pop-os kernel: i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
Sep 26 15:42:19 pop-os kernel: i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
Sep 26 15:42:19 pop-os kernel: usb 3-8: reset high-speed USB device number 2 using xhci_hcd
Sep 26 15:42:19 pop-os kernel: ================================================================================
Sep 26 15:42:19 pop-os kernel: UBSAN: shift-out-of-bounds in /build/linux-gELEGM/linux-6.5.4/drivers/gpu/drm/display/drm_dp_mst_topology.c:4416:36
Sep 26 15:42:19 pop-os kernel: shift exponent -1 is negative
Sep 26 15:42:19 pop-os kernel: CPU: 9 PID: 47552 Comm: kworker/9:4 Tainted: P OE 6.5.4-76060504-generic #202309191142~1695387248~22.04~8154eec
Sep 26 15:42:19 pop-os kernel: Hardware name: System76 Oryx Pro/Oryx Pro, BIOS 2023-06-08_36c78ea 06/08/2023
Sep 26 15:42:19 pop-os kernel: Workqueue: i915-unordered intel_fbdev_suspend_worker [i915]
Sep 26 15:42:19 pop-os kernel: Call Trace:
Sep 26 15:42:19 pop-os kernel: <TASK>
Sep 26 15:42:19 pop-os kernel: dump_stack_lvl+0x48/0x70
Sep 26 15:42:19 pop-os kernel: dump_stack+0x10/0x20
Sep 26 15:42:19 pop-os kernel: __ubsan_handle_shift_out_of_bounds+0x1ac/0x360
Sep 26 15:42:19 pop-os kernel: drm_dp_atomic_release_time_slots.cold+0x17/0x3d [drm_display_helper]
Sep 26 15:42:19 pop-os kernel: intel_dp_mst_atomic_check+0xaa/0x180 [i915]
Sep 26 15:42:19 pop-os kernel: ? update_connector_routing+0x2f1/0x3e0 [drm_kms_helper]
Sep 26 15:42:19 pop-os kernel: drm_atomic_helper_check_modeset+0x293/0x5a0 [drm_kms_helper]
Sep 26 15:42:19 pop-os kernel: intel_atomic_check+0xfe/0xb80 [i915]
Sep 26 15:42:19 pop-os kernel: ? drm_plane_check_pixel_format+0x53/0xe0 [drm]
Sep 26 15:42:19 pop-os kernel: drm_atomic_check_only+0x1ac/0x400 [drm]
Sep 26 15:42:19 pop-os kernel: ? update_output_state+0x184/0x1a0 [drm]
Sep 26 15:42:19 pop-os kernel: drm_atomic_commit+0x58/0xd0 [drm]
Sep 26 15:42:19 pop-os kernel: ? __pfx___drm_printfn_info+0x10/0x10 [drm]
Sep 26 15:42:19 pop-os kernel: drm_client_modeset_commit_atomic+0x203/0x240 [drm]
Sep 26 15:42:19 pop-os kernel: drm_client_modeset_commit_locked+0x5b/0x170 [drm]
Sep 26 15:42:19 pop-os kernel: drm_client_modeset_commit+0x26/0x50 [drm]
Sep 26 15:42:19 pop-os kernel: __drm_fb_helper_restore_fbdev_mode_unlocked+0xc2/0x100 [drm_kms_helper]
Sep 26 15:42:19 pop-os kernel: drm_fb_helper_hotplug_event+0x10b/0x120 [drm_kms_helper]
Sep 26 15:42:19 pop-os kernel: intel_fbdev_set_suspend+0x10a/0x220 [i915]
Sep 26 15:42:19 pop-os kernel: intel_fbdev_suspend_worker+0x1c/0x30 [i915]
Sep 26 15:42:19 pop-os kernel: process_one_work+0x23d/0x450
Sep 26 15:42:19 pop-os kernel: worker_thread+0x50/0x3f0
Sep 26 15:42:19 pop-os kernel: ? __pfx_worker_thread+0x10/0x10
Sep 26 15:42:19 pop-os kernel: kthread+0xef/0x120
Sep 26 15:42:19 pop-os kernel: ? __pfx_kthread+0x10/0x10
Sep 26 15:42:19 pop-os kernel: ret_from_fork+0x44/0x70
Sep 26 15:42:19 pop-os kernel: ? __pfx_kthread+0x10/0x10
Sep 26 15:42:19 pop-os kernel: ret_from_fork_asm+0x1b/0x30
Sep 26 15:42:19 pop-os kernel: </TASK>
Sep 26 15:42:19 pop-os kernel: ================================================================================
Sep 26 15:42:19 pop-os kernel: mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_ops [i915])
Sep 26 15:42:19 pop-os kernel: mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: bound 0000:00:02.0 (ops i915_pxp_tee_component_ops [i915])
Sep 26 15:42:19 pop-os kernel: OOM killer enabled.
Sep 26 15:42:19 pop-os kernel: Restarting tasks ... done.
This output was on oryp9, but I've also seen this on my Dev One, as well as on a Pangolin. I'm not really seeing any bad behavior accompanying the messages though, so I'm not sure how much of a cause for concern this is.
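To compare these reports across machines, it may help to pull just the UBSAN header and its detail line out of a saved kernel log. The following is a minimal sketch, not an existing tool: the function name `find_ubsan_reports` and the `SAMPLE` constant are made up for illustration, and it assumes the report layout shown in the log above (a `UBSAN: <kind> in <path>:<line>:<col>` line followed by one detail line).

```python
import re

# Matches the "UBSAN: <kind> in <path>:<line>:<col>" header line of a report.
# Assumption: the header layout is the one seen in the log above.
UBSAN_HEADER = re.compile(
    r"UBSAN: (?P<kind>[\w-]+) in (?P<path>\S+):(?P<line>\d+):(?P<col>\d+)"
)


def find_ubsan_reports(log_text):
    """Return one dict per UBSAN report header found in kernel log text."""
    reports = []
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        match = UBSAN_HEADER.search(line)
        if match is None:
            continue
        # The line after the header carries the detail,
        # e.g. "shift exponent -1 is negative".
        detail = ""
        if i + 1 < len(lines):
            detail = lines[i + 1].split("kernel: ", 1)[-1].strip()
        reports.append({
            "kind": match.group("kind"),
            "path": match.group("path"),
            "line": int(match.group("line")),
            "column": int(match.group("col")),
            "detail": detail,
        })
    return reports


# Two lines copied from the log above, used as sample input.
SAMPLE = """\
Sep 26 15:42:19 pop-os kernel: UBSAN: shift-out-of-bounds in /build/linux-gELEGM/linux-6.5.4/drivers/gpu/drm/display/drm_dp_mst_topology.c:4416:36
Sep 26 15:42:19 pop-os kernel: shift exponent -1 is negative
"""

if __name__ == "__main__":
    for report in find_ubsan_reports(SAMPLE):
        print(report)
```

In practice the input could be the saved output of `journalctl -k -b` from each machine, so the same source location can be confirmed on oryp9, Dev One, and Pangolin.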
I'm also seeing issues with the NVIDIA 470 and ZFS DKMS modules. The bcmwl, hp-vendor, VirtualBox, NVIDIA 535, and system76 DKMS modules all seem fine, though.
ZFS needs a new release, sadly: https://github.com/openzfs/zfs/pull/15268
This may fix NVIDIA 470: https://github.com/pop-os/nvidia-graphics-drivers-470/pull/17
I got through the rest of my checklist, and the DKMS items for NVIDIA 470 and ZFS are the only boxes I can't check off.
- [x] `system76-power` still operates as expected (across Intel/NVIDIA/switchable machines)
- [x] Updating to new kernel works with `apt update && apt upgrade`
- [x] No new dependencies are required for kernel update
- [x] Mic in
- [x] Audio out:
  - [x] Laptop's built-in speakers
  - [x] Headphones
  - [x] DisplayPort
  - [x] HDMI
- [x] Video out via:
  - [x] DisplayPort
    - [x] Daisy-chain over DP
  - [x] HDMI
- [x] Thunderbolt docking station:
  - [x] HDMI and DisplayPort
  - [x] External storage device
  - [x] Networking
- [x] Suspend and resume:
  - [x] Suspend and resume works with a bluetooth device paired
  - [x] Switchable graphics laptops
    - [x] In Hybrid graphics mode
    - [x] In Nvidia graphics mode
    - [x] In Integrated graphics mode
  - [ ] Nvidia desktop
    - [x] Current Nvidia driver works (i.e. 525)
    - [ ] Legacy Nvidia driver works (i.e. 470)
- [x] Plymouth decrypt prompt appears as expected on multiple machines (mira-r2 and mega-r2 are good examples)
- [x] Discrete AMD desktop
- [x] Integrated Intel desktop
- [x] Integrated AMD desktop
- [x] 150 suspend/resume cycles (`fwts s3 --s3-multiple 150`)
- [x] Graphics drivers included in the kernel (Intel/AMD)
- [x] Switchable/hybrid graphics and graphics switching
- [x] Steam
  - [x] Steam installs via the Pop!_OS .deb in Pop!_Shop
  - [x] Steam launches from launcher
  - [x] Linux native game installs and runs
  - [x] Proton game installs and runs
- [x] VirtualBox installs and works as expected
  - `virtualbox-ext-pack` installs as expected
- [x] Broadcom wireless driver (`bcmwl-kernel-source`) installs without DKMS errors
- [x] On Dev One, fan speed is shown in `sensors`
- [ ] All of our maintained DKMS packages are installable:
  - [x] NVIDIA
  - [ ] ZFS
  - [x] System76
  - [x] hp-vendor
Patch for ZFS: https://github.com/pop-os/zfs-linux/pull/18
Merging; we can decide when to release on the repo-release project.
Includes all patches from master and a revert that may fix the ADL-P graphics hang.
Make sure DKMS modules are working: NVIDIA 470, NVIDIA 535 (I tested this already), VirtualBox (I also tested this), and ZFS.
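One way to make the DKMS check repeatable is to compare saved `dkms status` output against the modules expected for the new kernel. This is only a sketch under assumptions: the helper name `modules_not_installed` is hypothetical, the module names and versions in `SAMPLE` are invented, and the regular expression assumes status lines look like either `name/version, kernel, arch: state` or `name, version, kernel, arch: state`, which may not match every DKMS release.

```python
import re

# Assumed line shapes (may differ between DKMS releases):
#   "name/version, kernel, arch: state"
#   "name, version, kernel, arch: state"
STATUS_LINE = re.compile(
    r"^(?P<name>[^,/\s]+)[/,]\s*(?P<version>[^,\s]+),\s*"
    r"(?P<kernel>[^,\s]+),\s*(?P<arch>[^:\s]+):\s*(?P<state>\w+)"
)


def modules_not_installed(status_text, kernel, expected):
    """Return expected module names with no 'installed' entry for `kernel`."""
    installed = set()
    for line in status_text.splitlines():
        match = STATUS_LINE.match(line.strip())
        if (
            match
            and match.group("kernel") == kernel
            and match.group("state") == "installed"
        ):
            installed.add(match.group("name"))
    return sorted(set(expected) - installed)


# Invented sample: module versions here are placeholders, not real results.
SAMPLE = """\
nvidia/535.113.01, 6.5.4-76060504-generic, x86_64: installed
virtualbox/6.1.38, 6.5.4-76060504-generic, x86_64: installed
zfs, 2.1.5, 6.2.6-76060206-generic, x86_64: installed
"""

if __name__ == "__main__":
    missing = modules_not_installed(
        SAMPLE, "6.5.4-76060504-generic", ["nvidia", "virtualbox", "zfs"]
    )
    print("not installed for this kernel:", missing)
```

A module built only for an older kernel, like the invented `zfs` entry here, is reported as missing for the kernel under test.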