pop-os / pop

A project for managing all Pop!_OS sources
https://system76.com/pop

RTX 3060 crashing under load #3093

Open eldomtom2 opened 1 year ago

eldomtom2 commented 1 year ago

Distribution (run cat /etc/os-release): Pop!_OS 22.04 LTS

Issue/Bug Description: My GPU keeps crashing, which freezes the computer until I force a shutdown with the power button. It seems to happen only when the GPU is under load and I alt-tab.

Here's the full log of the last time it happened: https://pastebin.com/UZtExT98

The important section seems to be:

Aug 01 14:58:57 pop-os kernel: NVRM: GPU at PCI:0000:01:00: GPU-9d52038e-1176-961b-5e7d-1af4c1f5c9d1

Aug 01 14:58:57 pop-os kernel: NVRM: Xid (PCI:0000:01:00): 62, pid='', name=, 0000(0000) 00000000 00000000

Aug 01 14:58:57 pop-os kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=322973, name=-w, Ch 00000043

Aug 01 14:58:57 pop-os kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=322973, name=-w, Ch 0000004c

Aug 01 14:58:57 pop-os kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=322973, name=-w, Ch 0000004d

Aug 01 14:58:57 pop-os kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=322973, name=-w, Ch 0000004f

Aug 01 14:59:02 pop-os /usr/libexec/gdm-x-session[1755]: (WW) NVIDIA: Wait for channel idle timed out.

Aug 01 15:00:13 pop-os pipewire[1639]: spa.alsa: front:2: (1 missed) impossible htimestamp diff:398

Aug 01 15:02:54 pop-os /usr/libexec/gdm-x-session[1755]: (WW) NVIDIA: Wait for channel idle timed out.

Aug 01 15:03:07 pop-os gnome-shell[2228]: Window manager warning: Failed to start restart helper: Failed to execute child process “/usr/libexec/mutter-restart-helper” (No such file or directory)

Aug 01 15:03:12 pop-os /usr/libexec/gdm-x-session[1755]: (WW) NVIDIA: Wait for channel idle timed out.

Aug 01 15:03:12 pop-os gnome-shell[2228]: Window manager warning: META_CURRENT_TIME used to choose focus window; focus window may not be correct.

Aug 01 15:03:17 pop-os /usr/libexec/gdm-x-session[1755]: (WW) NVIDIA: Wait for channel idle timed out.

Aug 01 15:03:17 pop-os gnome-shell[2228]: clutter_actor_iter_next: assertion 'ri->age == ri->root->priv->age' failed
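To make the NVRM lines in a log like this easier to compare between crashes, here is a minimal Python sketch that pulls out the Xid events and counts them per code. The regular expression assumes the default journalctl line layout shown above (timestamp, hostname, `kernel:`), and the function names `parse_xid_events` and `summarize` are hypothetical helpers made up for illustration; nothing here comes from NVIDIA or Pop!_OS tooling.

```python
import re
from collections import Counter

# Matches lines such as:
#   Aug 01 14:58:57 pop-os kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=322973, name=-w, Ch 00000043
# Assumption: the default journalctl short format, as seen in the log above.
XID_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d+\s[\d:]{8})\s\S+\skernel:\sNVRM:\sXid\s"
    r"\((?P<pci>[^)]+)\):\s(?P<xid>\d+),\s*(?P<detail>.*)$"
)


def parse_xid_events(lines):
    """Return one dict per NVRM Xid line; every other line is ignored."""
    events = []
    for line in lines:
        match = XID_RE.match(line.strip())
        if match:
            events.append(
                {
                    "time": match["time"],
                    "pci": match["pci"],
                    "xid": int(match["xid"]),
                    "detail": match["detail"],
                }
            )
    return events


def summarize(events):
    """Count how many times each Xid code appears."""
    return Counter(event["xid"] for event in events)


# Sample lines copied from the log in this issue.
SAMPLE = [
    "Aug 01 14:58:57 pop-os kernel: NVRM: GPU at PCI:0000:01:00: GPU-9d52038e-1176-961b-5e7d-1af4c1f5c9d1",
    "Aug 01 14:58:57 pop-os kernel: NVRM: Xid (PCI:0000:01:00): 62, pid='', name=, 0000(0000) 00000000 00000000",
    "Aug 01 14:58:57 pop-os kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=322973, name=-w, Ch 00000043",
    "Aug 01 14:58:57 pop-os kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=322973, name=-w, Ch 0000004c",
]

if __name__ == "__main__":
    for xid, count in sorted(summarize(parse_xid_events(SAMPLE)).items()):
        print(f"Xid {xid}: {count} event(s)")
```

The same functions accept any iterable of lines, so the saved output of `journalctl -k -b -1` (kernel messages from the previous boot) could be passed in as an open file instead of `SAMPLE`.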

Here is my hardware:

OS: Pop!_OS 22.04 LTS x86_64

Host: B660M DS3H AX DDR4

Kernel: 6.2.6-76060206-generic

Packages: 2387 (dpkg), 48 (flatpak)

Shell: bash 5.1.16

Resolution: 1920x1080

DE: GNOME 42.5

Terminal: gnome-terminal

CPU: 13th Gen Intel i5-13500 (20) @ 4

GPU: Intel AlderLake-S GT1

GPU: NVIDIA GeForce RTX 3060

Memory: 14624MiB / 31860MiB

Steps to reproduce (if you know): Get the GPU under heavy load and then try to alt-tab, perhaps? I haven't actively tried to reproduce it.

Expected behavior: The GPU does not crash.

jacobgkau commented 1 year ago

A crash/freeze only under load would typically indicate a hardware issue. If this is System76 hardware, please open a support ticket to start a repair or replacement. If it's third-party hardware, you may need to contact your hardware manufacturer.

jacobgkau commented 1 year ago

If you have reason to believe this is not a hardware issue, knowing whether you can also recreate the issue on Ubuntu and/or any other distros and what exact workload/steps you're using to generate the load would be useful.

yangchenyun commented 1 year ago

I am on an oryx7 model with a 3060 GPU, and it also crashes when the GPU load reaches 100%. I have tried reinstalling the OS, but the problem remains.

jacobgkau commented 10 months ago

@yangchenyun I'm sorry to hear about the crashes. If you have System76 hardware, please open a support ticket. The team there can assist with hardware issues as well as software.

Once again, if you think it's not hardware for some reason, testing in a different OS such as Ubuntu would help determine whether this is a Pop!_OS-specific issue. More information about how you're generating the load might also be helpful.
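One possible way to record the load leading up to a freeze is to sample `nvidia-smi` once per second and flush each row to disk, so the last readings survive a hard power-off. The following is only a sketch under assumptions: it presumes the proprietary NVIDIA driver's `nvidia-smi` is on the PATH, it reads only the first GPU that tool reports, and the helper names (`build_command`, `parse_row`, `log_samples`) and the output file name are hypothetical.

```python
import os
import subprocess
import time

# Query fields passed to nvidia-smi's --query-gpu option.
FIELDS = ["timestamp", "utilization.gpu", "temperature.gpu", "power.draw"]


def build_command(fields=FIELDS):
    """Build the nvidia-smi command line for one CSV sample."""
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(fields),
        "--format=csv,noheader,nounits",
    ]


def parse_row(row, fields=FIELDS):
    """Turn one CSV row into a dict; non-numeric readings become None."""
    values = [value.strip() for value in row.split(",")]
    sample = {}
    for name, value in zip(fields, values):
        if name == "timestamp":
            sample[name] = value
            continue
        try:
            sample[name] = float(value)
        except ValueError:
            # Some boards report e.g. "[N/A]" for fields they do not expose.
            sample[name] = None
    return sample


def log_samples(path, interval=1.0, count=None):
    """Append raw samples to `path`, syncing after each write.

    Runs until interrupted when `count` is None.
    """
    taken = 0
    with open(path, "a") as log:
        while count is None or taken < count:
            result = subprocess.run(
                build_command(), capture_output=True, text=True, check=True
            )
            # Assumption: only the first GPU listed is of interest.
            log.write(result.stdout.splitlines()[0] + "\n")
            log.flush()
            os.fsync(log.fileno())  # make sure the row reaches the disk
            taken += 1
            time.sleep(interval)
```

Calling `log_samples("gpu_load.csv")` in a terminal before starting the workload, then attaching the resulting file after the next crash, would show the utilization, temperature, and power draw in the seconds before the freeze; `parse_row` can be used afterwards to read the rows back.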