pop-os / mesa

Forked from https://salsa.debian.org/xorg-team/lib/mesa

System hangs since 22.2.0 #9

Open akdor1154 opened 1 year ago

akdor1154 commented 1 year ago

Hey, recently my Pop 22.04 system upgraded Mesa libraries from 22.0.5-0ubuntu0.1 to 22.2.0-1pop0~1664294850~22.04~4e1b64f.

Since then, my system nearly hangs on certain GPU operations: gnome-shell frames start taking about 2 minutes to refresh and eventually stop completely, at which point I need a SysRq reset. I can trigger it reliably by visiting a site with a kepler.gl map (e.g. https://kepler.gl/demo) in either Firefox or MS Edge.

I got kernel and linux-firmware upgrades at the same time. Rolling those back makes no difference, but rolling back the Mesa libs fixes the system hang.

journalctl log leading up to hang:

Oct 26 10:28:01 PF3FYVGK-jwhitaker kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:849f7c04, in CanvasRenderer [5692]
Oct 26 10:28:01 PF3FYVGK-jwhitaker kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Oct 26 10:28:01 PF3FYVGK-jwhitaker kernel: i915 0000:00:02.0: [drm] CanvasRenderer[5692] context reset due to GPU hang
Oct 26 10:28:01 PF3FYVGK-jwhitaker firefox.desktop[5582]: WebGL(0x7f89a77f8d00)::LoseContext(2)
Oct 26 10:28:01 PF3FYVGK-jwhitaker kernel: i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_62.0.3.bin version 62.0 submission:enabled
Oct 26 10:28:01 PF3FYVGK-jwhitaker kernel: i915 0000:00:02.0: [drm] GuC SLPC: enabled
Oct 26 10:28:01 PF3FYVGK-jwhitaker kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.9.3.bin version 7.9 authenticated:yes
Oct 26 10:28:01 PF3FYVGK-jwhitaker gnome-shell[2379]: Removing a network device that was not added
Oct 26 10:28:01 PF3FYVGK-jwhitaker gnome-shell[2379]: Removing a network device that was not added
Oct 26 10:28:01 PF3FYVGK-jwhitaker gnome-shell[2379]: Removing a network device that was not added
Oct 26 10:28:01 PF3FYVGK-jwhitaker gnome-shell[2379]: Removing a network device that was not added
Oct 26 10:28:01 PF3FYVGK-jwhitaker gnome-shell[2379]: Removing a network device that was not added
Oct 26 10:28:01 PF3FYVGK-jwhitaker gnome-shell[2379]: Removing a network device that was not added
Oct 26 10:28:01 PF3FYVGK-jwhitaker gnome-shell[2379]: Removing a network device that was not added
Oct 26 10:28:01 PF3FYVGK-jwhitaker gnome-shell[2379]: Removing a network device that was not added
Oct 26 10:28:01 PF3FYVGK-jwhitaker gnome-shell[2379]: Removing a network device that was not added
Oct 26 10:28:01 PF3FYVGK-jwhitaker firefox.desktop[5582]: [GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
Oct 26 10:28:02 PF3FYVGK-jwhitaker avahi-daemon[1200]: Joining mDNS multicast group on interface veth73815a7.IPv6 with address fe80::e800:a9ff:fedc:85ab.
Oct 26 10:28:02 PF3FYVGK-jwhitaker avahi-daemon[1200]: New relevant interface veth73815a7.IPv6 for mDNS.
Oct 26 10:28:02 PF3FYVGK-jwhitaker avahi-daemon[1200]: Registering new address record for fe80::e800:a9ff:fedc:85ab on veth73815a7.*.
Oct 26 10:28:08 PF3FYVGK-jwhitaker systemd[1]: cbagentd.service: Scheduled restart job, restart counter is at 5.
Oct 26 10:28:08 PF3FYVGK-jwhitaker systemd[1]: Stopped Carbon Black Predictive Security Cloud Endpoint Agent..
Oct 26 10:28:08 PF3FYVGK-jwhitaker systemd[1]: Started Carbon Black Predictive Security Cloud Endpoint Agent..
Oct 26 10:28:08 PF3FYVGK-jwhitaker systemd[1]: cbagentd.service: Main process exited, code=exited, status=1/FAILURE
Oct 26 10:28:08 PF3FYVGK-jwhitaker systemd[1]: cbagentd.service: Failed with result 'exit-code'.
Oct 26 10:28:08 PF3FYVGK-jwhitaker kernel: i915 0000:00:02.0: [drm] *ERROR* mstb 000000006865ab0c port 1: DPCD read on addr 0x4b0 for 1 bytes NAKed
Oct 26 10:28:08 PF3FYVGK-jwhitaker kernel: i915 0000:00:02.0: [drm] *ERROR* mstb 000000006865ab0c port 3: DPCD read on addr 0x4b0 for 1 bytes NAKed

If I hold the packages at amd64=22.0.5-0ubuntu0.1 then my system becomes stable again.

I'm using Intel 12th gen (yeah :( ) i5-1240P on a Lenovo X1 Yoga gen 7.
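For anyone triaging a similar journal, the `GPU HANG` lines can be picked out of the surrounding noise with a short script. This is only a rough sketch: the regular expression is written against the single hang line quoted in the log above and may not match other kernel versions, and the names `HANG_RE` and `find_gpu_hangs` are made up for illustration.

```python
import re

# Pattern based on the one i915 "GPU HANG" line quoted in this issue.
# Assumption: other kernels may format this message differently.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+), in (?P<process>.+) \[(?P<pid>\d+)\]"
)


def find_gpu_hangs(lines):
    """Return (ecode, process, pid) tuples for every GPU HANG line found."""
    hangs = []
    for line in lines:
        match = HANG_RE.search(line)
        if match:
            hangs.append(
                (match.group("ecode"), match.group("process"), int(match.group("pid")))
            )
    return hangs


# The hang line from the log above, used here as sample input.
sample = [
    "Oct 26 10:28:01 PF3FYVGK-jwhitaker kernel: i915 0000:00:02.0: [drm] "
    "GPU HANG: ecode 12:1:849f7c04, in CanvasRenderer [5692]",
    "Oct 26 10:28:01 PF3FYVGK-jwhitaker gnome-shell[2379]: "
    "Removing a network device that was not added",
]

for ecode, process, pid in find_gpu_hangs(sample):
    print(f"{process} (pid {pid}): ecode {ecode}")
```

The same function could be fed the lines of `journalctl -k` output instead of the sample list.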

akdor1154 commented 1 year ago

seems to be this guy: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7755

As per the linked issue, running with INTEL_DEBUG=no32 works as a workaround. The Mesa release notes indicate this is fixed in 22.3.1 (https://docs.mesa3d.org/relnotes/22.3.1.html). Any chance of packaging this?
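In case it helps others apply the `INTEL_DEBUG=no32` workaround to a single program rather than the whole session, here is a minimal sketch of a launcher. It assumes `INTEL_DEBUG` takes a comma-separated list of flags, so an existing value is extended rather than overwritten. The helper names `env_with_intel_debug` and `run_with_workaround` are hypothetical, not part of Mesa.

```python
import os
import subprocess


def env_with_intel_debug(flag="no32", base_env=None):
    """Return a copy of the environment with `flag` added to INTEL_DEBUG.

    Assumption: INTEL_DEBUG is a comma-separated flag list, so any flags
    already set are kept and `flag` is appended only if missing.
    """
    env = dict(os.environ if base_env is None else base_env)
    flags = [f for f in env.get("INTEL_DEBUG", "").split(",") if f]
    if flag not in flags:
        flags.append(flag)
    env["INTEL_DEBUG"] = ",".join(flags)
    return env


def run_with_workaround(command):
    """Run `command` (a list of arguments) with the workaround applied."""
    return subprocess.run(command, env=env_with_intel_debug()).returncode
```

For example, `run_with_workaround(["firefox"])` would start Firefox with the flag set for that process only, leaving the rest of the desktop session untouched.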

13r0ck commented 1 year ago

Please let me know if #11 improves the situation for you. (After it builds, which should be in the next few days at the latest.)