pop-os / pop

A project for managing all Pop!_OS sources
https://system76.com/pop

System Eventually Freezes Completely #1172

Closed Lobosque closed 2 months ago

Lobosque commented 4 years ago

Distribution (run cat /etc/os-release):

pop-os% cat /etc/os-release
NAME="Pop!_OS"
VERSION="20.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
LOGO=distributor-logo-pop-os

Related Application and/or Package Version (run apt policy $PACKAGE NAME): Installation is pretty fresh. Not sure which package could be related but the first freeze happened without anything installed or updated.

Issue/Bug Description: System freezes completely, even after updating the kernel and all packages (including NVIDIA drivers) to the newest version. I installed Pop!_OS this morning and have had about 5 freezes already.

Steps to reproduce (if you know): I'm trying to monitor CPU/RAM usage but couldn't find any correlation between increased resource usage and the freezes.

Expected behavior: Not freeze

Other Notes: Fresh installation using the ISO with NVIDIA drivers. My laptop is a UX430UN.

Oymate commented 4 years ago

This one is greatly annoying for me too.

mmstick commented 4 years ago

Does it happen only when Firefox or Chrome are open?

Oymate commented 4 years ago

@mmstick It happened while opening tabs, audio editing, and video editing (and rendering). Anything even slightly demanding does this.

Oymate commented 4 years ago

@mmstick Could this be an overload in syslog? Especially in journal.

mmstick commented 4 years ago

It could be excessive logging.

caiohenrique12 commented 4 years ago

I have the same issue :(

Oymate commented 4 years ago

There are a lot of posts about this issue on Reddit too, e.g. https://www.reddit.com/r/pop_os/comments/ivgeyd/pop_os_keeps_freezing/

ibrahimovnijat commented 4 years ago

I have this issue as well. In the last 2-3 days, once a day, OS freezes. It does not respond to keyboard or mouse, so I have to hard-reboot. Happens when I am not even doing anything computationally intensive.

Adam-Kadmon commented 4 years ago

I've had the same problem for a couple of weeks or more. Severity is increasing, freezes now require multiple hard restarts to clear. fsck found and fixed many errors on affected nvme, SMART log shows no problems. Everything is up to date and running nvidia-driver-450 as of yesterday (didn't resolve issue). No apparent correlation with load, most lock ups occur at minimal load and ~30% RAM usage (i.e. during office tasks).

More posts are showing up on Reddit (this isn't mine but it describes my problem exactly): https://www.reddit.com/r/pop_os/comments/ixqjel/pop_os_freezes/

mmstick commented 4 years ago

Does it improve if you kill io.elementary.appcenter?

Oymate commented 4 years ago

@mmstick As soon as I killed it, the fan stopped whining, so that's probably the culprit.

Oymate commented 4 years ago

https://www.reddit.com/r/pop_os/search/?q=freez&sort=new&restrict_sr=on

jenabaivab commented 4 years ago

Facing the same issue. I thought it was due to VLC, because it usually starts freezing around 30 minutes after I start watching a movie.

yogthos commented 4 years ago

I've noticed that for me the issue reliably occurs after I put the laptop to sleep and wake it up. However, it doesn't occur immediately after. It can be a few minutes of working normally before the UI freezes after it wakes up. Switching from the proprietary nvidia driver appears to help, however I saw a freeze with nouveau driver as well one time. If I turn the machine off completely and start it up after, I don't experience the freezes. My system info is:

NAME="Pop!_OS"
VERSION="20.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
LOGO=distributor-logo-pop-os

The hardware is a Dell Precision 5520 with an external monitor over USB-C.

Adam-Kadmon commented 3 years ago

@mmstick I don't think you were talking to me, but since I can't seem to stop these lock-ups, I took what you suggested re: the io.elementary.appcenter and very much ran with it, doing:

sudo rm -rfv /usr/share/applications/io.elementary.appcenter-daemon.desktop
sudo rm -rfv /etc/xdg/autostart/io.elementary.appcenter-daemon.desktop

Based on something I read on reddit (yeah, I'm over these crashes, they're incredibly stressful on a work machine).

My NVMe got so messed up by a freeze a week ago that I had to clean it with diskpart before a Linux system could even look at it without locking up the controller (and necessitating a re-seat before even Windows could see it).

I've been passing a parameter to the kernel, setting max latency on the NVMe to zero in order to disable APST. I've also been confirming that APST is in fact disabled using nvme-cli. For six days I thought this was working, but then two days ago and today I experienced further lock-ups.
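For anyone wanting to try the same APST workaround, here is a minimal sketch. It assumes the parameter meant above is `nvme_core.default_ps_max_latency_us=0` (the comment does not name it) and that the drive is `/dev/nvme0`. The helper name `apst_disabled_in_cmdline` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel command line string contains
# the (assumed) APST-disabling parameter as a whole word.
apst_disabled_in_cmdline() {
    case " $1 " in
        *" nvme_core.default_ps_max_latency_us=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a live system (not run here):
#   sudo kernelstub -a "nvme_core.default_ps_max_latency_us=0"   # Pop!_OS; reboot afterwards
#   apst_disabled_in_cmdline "$(cat /proc/cmdline)" && echo "parameter active"
#   sudo nvme get-feature /dev/nvme0 -f 0x0c -H                  # feature 0x0c is APST
```

Checking `/proc/cmdline` only confirms that the parameter reached the kernel; the `nvme get-feature` output is what shows whether the controller reports APST as disabled.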

Just a couple of minutes ago I removed those appcenter daemons. We'll see how that goes. I'll report back.

Adam-Kadmon commented 3 years ago

Another report on Reddit - this seems to be exactly what I'm dealing with:

https://www.reddit.com/r/pop_os/comments/j7la5n/laptop_randomly_freezing/

NicholasMamo commented 3 years ago

Hey, I'm the guy from the Reddit thread that @Adam-Kadmon posted. Adding more info here.

My laptop is practically brand new (Acer Nitro 5); I got it last month. I have an NVIDIA graphics card (1660 Ti) and am running on NVMe. The NVIDIA graphics card driver is 455.28.

For 2 weeks now, my laptop (not System76, but running Pop!_OS) has been freezing completely at random points. It usually happens once every 3 or 4 days, so it's not that frequent. When I say freeze, I mean complete freeze: the keyboard backlight, which usually turns off after 30 seconds of inactivity, remains on indefinitely.

After it happened twice, I started keeping a sort of log of what I was doing. I am using Hybrid Graphics. All of these crashes happened at the end of the day (between 20:00 and 23:00). I mention this because during that time I'm usually playing video games (no crashes then), watching videos on VLC, or watching football matches online, which is strange: I do more computationally intensive work during work hours.

4 October: I had been using the laptop all day. I had suspended it for around an hour. After 30 minutes of being awake, I had 2 Firefox windows open: on one there were Messenger and Twitter open (pinned tabs), and on the other was a football stream I had just opened.

8 October: The laptop had been on for around 3 hours. I had played some video games, and then started watching a couple of episodes on VLC (around an hour). I think it was a few seconds after I closed VLC (and switched to Firefox: Messenger and Twitter) that the laptop froze.

Last time, I waited 5 minutes and the laptop remained frozen. I had been using it for around 3 hours. These are the last entries in /var/log/syslog before the crash, and they seem pretty innocuous:

Oct  8 22:36:06 pop-os systemd[1]: Starting Daily apt download activities...
Oct  8 22:36:06 pop-os systemd[1]: apt-daily.service: Succeeded.
Oct  8 22:36:06 pop-os systemd[1]: Finished Daily apt download activities.
Oct  8 22:37:07 pop-os xdg-desktop-por[8948]: Failed to create foreign window for XID 0
Oct  8 22:37:07 pop-os xdg-desktop-por[8948]: Failed to associate portal window with parent window x11:0
Oct  8 22:37:07 pop-os dbus-daemon[791]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service' requested by ':1.228' (uid=1000 pid=8948 comm="/usr/libexec/xdg-desktop-portal-gtk " label="unconfined")
Oct  8 22:37:07 pop-os systemd[1]: Starting Hostname Service...
Oct  8 22:37:07 pop-os dbus-daemon[791]: [system] Successfully activated service 'org.freedesktop.hostname1'
Oct  8 22:37:07 pop-os systemd[1]: Started Hostname Service.
Oct  8 22:37:13 pop-os org.gnome.Nautilus[8926]: [00007f187cc10660] avcodec decoder: Using Intel iHD driver - 1.0.0 for hardware decoding
Oct  8 22:37:17 pop-os gnome-shell[1866]: Window manager warning: WM_TRANSIENT_FOR window 0x440646a for 0x4409466 window override-redirect is an override-redirect window and this is not correct according to the standard, so we'll fallback to the first non-override-redirect window 0x4400006.
Oct  8 22:37:25 pop-os org.gnome.Nautilus[8926]: QObject::~QObject: Timers cannot be stopped from another thread
Oct  8 22:37:25 pop-os systemd[1588]: flatpak-org.videolan.VLC-8917.scope: Succeeded.
Oct  8 22:37:26 pop-os dbus-daemon[1613]: [session uid=1000 pid=1613] Activating service name='org.gnome.Nautilus' requested by ':1.47' (uid=1000 pid=1866 comm="/usr/bin/gnome-shell " label="unconfined")
Oct  8 22:37:26 pop-os dbus-daemon[1613]: [session uid=1000 pid=1613] Successfully activated service 'org.gnome.Nautilus'

The last lines in journalctl are the same, I think:

Oct 08 22:35:47 pop-os wpa_supplicant[833]: wlp0s20f3: WPA: Group rekeying completed with a6:91:b1:89:9c:b4 [GTK=TKIP]
Oct 08 22:36:06 pop-os systemd[1]: Starting Daily apt download activities...
Oct 08 22:36:06 pop-os systemd[1]: apt-daily.service: Succeeded.
Oct 08 22:36:06 pop-os systemd[1]: Finished Daily apt download activities.
Oct 08 22:37:07 pop-os xdg-desktop-por[8948]: Failed to create foreign window for XID 0
Oct 08 22:37:07 pop-os xdg-desktop-por[8948]: Failed to associate portal window with parent window x11:0
Oct 08 22:37:07 pop-os dbus-daemon[791]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service' requested by ':1.228' (uid=1000 pid=8948 comm="/usr/libexec/xdg-desktop-portal-gtk " label="unconfined")
Oct 08 22:37:07 pop-os systemd[1]: Starting Hostname Service...
Oct 08 22:37:07 pop-os dbus-daemon[791]: [system] Successfully activated service 'org.freedesktop.hostname1'
Oct 08 22:37:07 pop-os systemd[1]: Started Hostname Service.
Oct 08 22:37:13 pop-os org.gnome.Nautilus[8926]: [00007f187cc10660] avcodec decoder: Using Intel iHD driver - 1.0.0 for hardware decoding
Oct 08 22:37:17 pop-os gnome-shell[1866]: Window manager warning: WM_TRANSIENT_FOR window 0x440646a for 0x4409466 window override-redirect is an override-redirect window and this is not correct according to the standard, so we'll fallback to the first non-override-redirect window 0x4400006.
Oct 08 22:37:25 pop-os org.gnome.Nautilus[8926]: QObject::~QObject: Timers cannot be stopped from another thread
Oct 08 22:37:25 pop-os systemd[1588]: flatpak-org.videolan.VLC-8917.scope: Succeeded.
Oct 08 22:37:26 pop-os dbus-daemon[1613]: [session uid=1000 pid=1613] Activating service name='org.gnome.Nautilus' requested by ':1.47' (uid=1000 pid=1866 comm="/usr/bin/gnome-shell " label="unconfined")
Oct 08 22:37:26 pop-os dbus-daemon[1613]: [session uid=1000 pid=1613] Successfully activated service 'org.gnome.Nautilus'
-- Reboot --
Oct 08 22:42:05 pop-os kernel: Linux version 5.4.0-7642-generic (buildd@lcy01-amd64-007) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #46~1598628707~20.04~040157c-Ubuntu SMP Fri Aug 28 18:02:16 UTC  (Ubuntu 5.4.0-7642.46~1598628707~20.04~040157c-generic 5.4.44)

I'm not including anything before that timestamp because the previous log is around 10 minutes before the freeze.
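For others collecting the same evidence, a rough sketch of how the final entries of a frozen boot could be pulled out of the journal. It assumes the journal prints a `-- Reboot --` marker between boots, as in the excerpt above; the helper name `last_before_reboot` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read journal text on stdin and print the last line
# logged before each "-- Reboot --" marker, i.e. the final entry of a boot.
last_before_reboot() {
    awk '/^-- Reboot --$/ { if (prev != "") print prev; prev = ""; next }
         { prev = $0 }'
}

# Usage on a live system (not run here):
#   journalctl --list-boots                      # index of recorded boots
#   journalctl -b -1 -n 20 --no-pager            # last 20 lines of the previous boot
#   journalctl --since "2020-10-08 22:30" --no-pager | last_before_reboot
```

Comparing the timestamp of that last line with the time the freeze was noticed gives a rough idea of whether logging stopped before, at, or after the freeze.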

What I'm doing now:

Since the appcenter daemons are not a confirmed solution, I didn't remove them, but moved them:

sudo mv /usr/share/applications/io.elementary.appcenter-daemon.desktop /usr/share/applications/io.elementary.appcenter-daemon.desktop.bk
sudo mv /etc/xdg/autostart/io.elementary.appcenter-daemon.desktop /etc/xdg/autostart/io.elementary.appcenter-daemon.desktop.bk

So far, the freezes have been without consequences: I ran fsck and it came clean. If none of the above works to solve the problem, however, I will probably switch to Integrated Graphics instead of Hybrid Graphics. Please let me know if I can be of any more help.

EDIT: More info from last night's crash. This time, it's the logs from journalctl --boot -5. Note that there are many logs from Steam that I'm skipping. I'm breaking the log down with my comments. Is there something that seems likely to be the culprit?

Oct 08 20:34:27 pop-os steam.desktop[4780]: /data/src/common/enum_names.cpp (2194) : Assertion Failed: Missing String for EOSType (-185)
Oct 08 20:34:27 pop-os steam.desktop[4780]: [2020-10-08 20:34:27] uninstalled manifest found in /home/memonick/.local/share/Steam/package/steam_client_ubuntu12 (1).
Oct 08 20:34:27 pop-os steam.desktop[4780]: /data/src/common/enum_names.cpp (2194) : Assertion Failed: Missing String for EOSType (-185)

[ there are many errors like this, well before the crash, and after the restart, I'm not including all these logs for clarity ]

Oct 08 20:35:46 pop-os wpa_supplicant[833]: wlp0s20f3: WPA: Group rekeying completed with a6:91:b1:89:9c:b4 [GTK=TKIP]
Oct 08 20:36:19 pop-os kernel: wlp0s20f3: AP a6:91:b1:89:9c:b4 changed bandwidth, new config is 5200 MHz, width 1 (5200/0 MHz)
Oct 08 20:36:20 pop-os wpa_supplicant[833]: wlp0s20f3: CTRL-EVENT-BEACON-LOSS
Oct 08 20:36:20 pop-os geoclue[1264]: Failed to query location: Error resolving “location.services.mozilla.com”: Temporary failure in name resolution
Oct 08 20:36:20 pop-os wpa_supplicant[833]: wlp0s20f3: CTRL-EVENT-SIGNAL-CHANGE above=0 signal=0 noise=9999 txrate=0
Oct 08 20:36:20 pop-os wpa_supplicant[833]: wlp0s20f3: CTRL-EVENT-DISCONNECTED bssid=a6:91:b1:89:9c:b4 reason=4 locally_generated=1
Oct 08 20:36:20 pop-os NetworkManager[793]: <warn>  [1602182180.7699] sup-iface[0x55982a9f1900,wlp0s20f3]: connection disconnected (reason -4)
Oct 08 20:36:20 pop-os NetworkManager[793]: <info>  [1602182180.7752] device (wlp0s20f3): supplicant interface state: completed -> disconnected
Oct 08 20:36:20 pop-os NetworkManager[793]: <info>  [1602182180.7752] device (p2p-dev-wlp0s20f3): supplicant management interface state: completed -> disconnected
Oct 08 20:36:20 pop-os wpa_supplicant[833]: wlp0s20f3: Reject scan trigger since one is already pending
Oct 08 20:36:20 pop-os wpa_supplicant[833]: wlp0s20f3: Failed to initiate AP scan
Oct 08 20:36:21 pop-os wpa_supplicant[833]: wlp0s20f3: Reject scan trigger since one is already pending
Oct 08 20:36:21 pop-os wpa_supplicant[833]: wlp0s20f3: Failed to initiate AP scan
Oct 08 20:36:22 pop-os wpa_supplicant[833]: wlp0s20f3: Reject scan trigger since one is already pending
Oct 08 20:36:22 pop-os wpa_supplicant[833]: wlp0s20f3: Failed to initiate AP scan
Oct 08 20:36:23 pop-os wpa_supplicant[833]: wlp0s20f3: SME: Trying to authenticate with a4:91:b1:89:9c:ac (SSID='GOINTERNET-899CAC' freq=2437 MHz)
Oct 08 20:36:23 pop-os geoclue[1264]: Failed to query location: Error resolving “location.services.mozilla.com”: Temporary failure in name resolution
Oct 08 20:36:23 pop-os kernel: wlp0s20f3: authenticate with a4:91:b1:89:9c:ac
Oct 08 20:36:23 pop-os kernel: wlp0s20f3: send auth to a4:91:b1:89:9c:ac (try 1/3)
Oct 08 20:36:23 pop-os NetworkManager[793]: <info>  [1602182183.0705] device (wlp0s20f3): supplicant interface state: disconnected -> authenticating
Oct 08 20:36:23 pop-os NetworkManager[793]: <info>  [1602182183.0705] device (p2p-dev-wlp0s20f3): supplicant management interface state: disconnected -> authenticating
Oct 08 20:36:23 pop-os wpa_supplicant[833]: wlp0s20f3: Trying to associate with a4:91:b1:89:9c:ac (SSID='GOINTERNET-899CAC' freq=2437 MHz)
Oct 08 20:36:23 pop-os kernel: wlp0s20f3: authenticated
Oct 08 20:36:23 pop-os kernel: wlp0s20f3: associate with a4:91:b1:89:9c:ac (try 1/3)
Oct 08 20:36:23 pop-os NetworkManager[793]: <info>  [1602182183.0948] device (wlp0s20f3): supplicant interface state: authenticating -> associating
Oct 08 20:36:23 pop-os NetworkManager[793]: <info>  [1602182183.0949] device (p2p-dev-wlp0s20f3): supplicant management interface state: authenticating -> associating
Oct 08 20:36:23 pop-os kernel: wlp0s20f3: RX AssocResp from a4:91:b1:89:9c:ac (capab=0x1411 status=0 aid=9)
Oct 08 20:36:23 pop-os wpa_supplicant[833]: wlp0s20f3: Associated with a4:91:b1:89:9c:ac
Oct 08 20:36:23 pop-os wpa_supplicant[833]: wlp0s20f3: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0
Oct 08 20:36:23 pop-os kernel: wlp0s20f3: associated
Oct 08 20:36:23 pop-os NetworkManager[793]: <info>  [1602182183.1099] device (wlp0s20f3): supplicant interface state: associating -> associated
Oct 08 20:36:23 pop-os NetworkManager[793]: <info>  [1602182183.1099] device (p2p-dev-wlp0s20f3): supplicant management interface state: associating -> associated
Oct 08 20:36:23 pop-os gnome-shell[1866]: An active wireless connection, in infrastructure mode, involves no access point?
Oct 08 20:36:24 pop-os NetworkManager[793]: <info>  [1602182184.1457] device (wlp0s20f3): supplicant interface state: associated -> 4-way handshake
Oct 08 20:36:24 pop-os NetworkManager[793]: <info>  [1602182184.1457] device (p2p-dev-wlp0s20f3): supplicant management interface state: associated -> 4-way handshake
Oct 08 20:36:24 pop-os wpa_supplicant[833]: wlp0s20f3: WPA: Key negotiation completed with a4:91:b1:89:9c:ac [PTK=CCMP GTK=TKIP]
Oct 08 20:36:24 pop-os wpa_supplicant[833]: wlp0s20f3: CTRL-EVENT-CONNECTED - Connection to a4:91:b1:89:9c:ac completed [id=0 id_str=]
Oct 08 20:36:24 pop-os wpa_supplicant[833]: wlp0s20f3: CTRL-EVENT-SIGNAL-CHANGE above=0 signal=-34 noise=9999 txrate=1000
Oct 08 20:36:24 pop-os NetworkManager[793]: <info>  [1602182184.1693] device (wlp0s20f3): supplicant interface state: 4-way handshake -> completed
Oct 08 20:36:24 pop-os NetworkManager[793]: <info>  [1602182184.1697] device (p2p-dev-wlp0s20f3): supplicant management interface state: 4-way handshake -> completed
Oct 08 20:36:26 pop-os wpa_supplicant[833]: wlp0s20f3: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-37 noise=9999 txrate=52000
Oct 08 20:36:54 pop-os kernel: wlp0s20f3: deauthenticated from a4:91:b1:89:9c:ac (Reason: 2=PREV_AUTH_NOT_VALID)
Oct 08 20:36:54 pop-os wpa_supplicant[833]: wlp0s20f3: CTRL-EVENT-SIGNAL-CHANGE above=0 signal=0 noise=9999 txrate=0
Oct 08 20:36:54 pop-os wpa_supplicant[833]: wlp0s20f3: CTRL-EVENT-DISCONNECTED bssid=a4:91:b1:89:9c:ac reason=2
Oct 08 20:36:54 pop-os NetworkManager[793]: <warn>  [1602182214.3426] sup-iface[0x55982a9f1900,wlp0s20f3]: connection disconnected (reason 2)
Oct 08 20:36:54 pop-os NetworkManager[793]: <info>  [1602182214.3478] device (wlp0s20f3): supplicant interface state: completed -> disconnected
Oct 08 20:36:54 pop-os NetworkManager[793]: <info>  [1602182214.3478] device (p2p-dev-wlp0s20f3): supplicant management interface state: completed -> disconnected
Oct 08 20:36:54 pop-os gnome-shell[1866]: An active wireless connection, in infrastructure mode, involves no access point?
Oct 08 20:36:54 pop-os wpa_supplicant[833]: wlp0s20f3: Reject scan trigger since one is already pending
Oct 08 20:36:54 pop-os wpa_supplicant[833]: wlp0s20f3: Failed to initiate AP scan
Oct 08 20:36:55 pop-os wpa_supplicant[833]: wlp0s20f3: Reject scan trigger since one is already pending
Oct 08 20:36:55 pop-os wpa_supplicant[833]: wlp0s20f3: Failed to initiate AP scan
Oct 08 20:36:56 pop-os wpa_supplicant[833]: wlp0s20f3: Reject scan trigger since one is already pending
Oct 08 20:36:56 pop-os wpa_supplicant[833]: wlp0s20f3: Failed to initiate AP scan
Oct 08 20:36:57 pop-os wpa_supplicant[833]: wlp0s20f3: SME: Trying to authenticate with a6:91:b1:89:9c:b4 (SSID='GOINTERNET-899CAC' freq=5200 MHz)
Oct 08 20:36:57 pop-os geoclue[1264]: Failed to query location: Error resolving “location.services.mozilla.com”: Temporary failure in name resolution
Oct 08 20:36:57 pop-os kernel: wlp0s20f3: authenticate with a6:91:b1:89:9c:b4
Oct 08 20:36:57 pop-os kernel: wlp0s20f3: send auth to a6:91:b1:89:9c:b4 (try 1/3)
Oct 08 20:36:57 pop-os NetworkManager[793]: <info>  [1602182217.1569] device (wlp0s20f3): supplicant interface state: disconnected -> authenticating
Oct 08 20:36:57 pop-os NetworkManager[793]: <info>  [1602182217.1569] device (p2p-dev-wlp0s20f3): supplicant management interface state: disconnected -> authenticating
Oct 08 20:36:57 pop-os wpa_supplicant[833]: wlp0s20f3: Trying to associate with a6:91:b1:89:9c:b4 (SSID='GOINTERNET-899CAC' freq=5200 MHz)
Oct 08 20:36:57 pop-os NetworkManager[793]: <info>  [1602182217.1759] device (wlp0s20f3): supplicant interface state: authenticating -> associating
Oct 08 20:36:57 pop-os NetworkManager[793]: <info>  [1602182217.1759] device (p2p-dev-wlp0s20f3): supplicant management interface state: authenticating -> associating
Oct 08 20:36:57 pop-os kernel: wlp0s20f3: authenticated
Oct 08 20:36:57 pop-os kernel: wlp0s20f3: associate with a6:91:b1:89:9c:b4 (try 1/3)
Oct 08 20:36:57 pop-os kernel: wlp0s20f3: RX AssocResp from a6:91:b1:89:9c:b4 (capab=0x1011 status=0 aid=3)
Oct 08 20:36:57 pop-os kernel: wlp0s20f3: associated
Oct 08 20:36:57 pop-os wpa_supplicant[833]: wlp0s20f3: Associated with a6:91:b1:89:9c:b4
Oct 08 20:36:57 pop-os wpa_supplicant[833]: wlp0s20f3: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0
Oct 08 20:36:57 pop-os NetworkManager[793]: <info>  [1602182217.1883] device (wlp0s20f3): supplicant interface state: associating -> associated
Oct 08 20:36:57 pop-os NetworkManager[793]: <info>  [1602182217.1883] device (p2p-dev-wlp0s20f3): supplicant management interface state: associating -> associated
Oct 08 20:36:57 pop-os NetworkManager[793]: <info>  [1602182217.1965] device (wlp0s20f3): supplicant interface state: associated -> 4-way handshake
Oct 08 20:36:57 pop-os NetworkManager[793]: <info>  [1602182217.1966] device (p2p-dev-wlp0s20f3): supplicant management interface state: associated -> 4-way handshake
Oct 08 20:36:57 pop-os wpa_supplicant[833]: wlp0s20f3: WPA: Key negotiation completed with a6:91:b1:89:9c:b4 [PTK=CCMP GTK=TKIP]
Oct 08 20:36:57 pop-os wpa_supplicant[833]: wlp0s20f3: CTRL-EVENT-CONNECTED - Connection to a6:91:b1:89:9c:b4 completed [id=0 id_str=]
Oct 08 20:36:57 pop-os wpa_supplicant[833]: wlp0s20f3: CTRL-EVENT-SIGNAL-CHANGE above=0 signal=-33 noise=9999 txrate=6000
Oct 08 20:36:57 pop-os NetworkManager[793]: <info>  [1602182217.2082] device (wlp0s20f3): supplicant interface state: 4-way handshake -> completed
Oct 08 20:36:57 pop-os NetworkManager[793]: <info>  [1602182217.2087] device (p2p-dev-wlp0s20f3): supplicant management interface state: 4-way handshake -> completed
Oct 08 20:36:57 pop-os wpa_supplicant[833]: wlp0s20f3: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-33 noise=9999 txrate=6000
Oct 08 20:36:57 pop-os kernel: wlp0s20f3: Limiting TX power to 23 (23 - 0) dBm as advertised by a6:91:b1:89:9c:b4

[ this is the moment that the laptop froze ]

Oct 08 20:39:27 pop-os steam.desktop[4780]: /data/src/common/enum_names.cpp (2194) : Assertion Failed: Missing String for EOSType (-185)
Oct 08 20:39:27 pop-os steam.desktop[4780]: /data/src/common/enum_names.cpp (2194) : Assertion Failed: Missing String for EOSType (-185)

[ the laptop is still frozen and unresponsive at this point, and there are more errors from steam that I'm skipping ]

Oct 08 20:40:39 pop-os gnome-keyring-daemon[1618]: asked to register item /org/freedesktop/secrets/collection/login/18, but it's already registered
Oct 08 20:40:40 pop-os gnome-keyring-daemon[1618]: asked to register item /org/freedesktop/secrets/collection/login/19, but it's already registered

[ I restarted in-between this gap (I waited 5 minutes, so from 20:37 to 20:42) ]

Oct 08 20:44:27 pop-os steam.desktop[4780]: /data/src/common/enum_names.cpp (2194) : Assertion Failed: Missing String for EOSType (-185)
Oct 08 20:44:27 pop-os steam.desktop[4780]: /data/src/common/enum_names.cpp (2194) : Assertion Failed: Missing String for EOSType (-185)

Adam-Kadmon commented 3 years ago

Just to add to what @NicholasMamo has shared: I too have a laptop (Legion 5) with a GTX 1660 Ti (& i5-10300H CPU).

I rarely play games but I have never had a freeze happen under load; it's always 'at idle' (reading online or thinking what to type next in LibreOffice Writer, for example).

Something that jumped out at me in Nicholas' account is that I too most commonly experience freezes in the evenings. I'd say that around 90% of them have taken place 22:00-01:00. I typically work at my computer 09:00-01:00, with the last hour or so spent watching YouTube or reading through forums looking for a fix to this problem. The remaining 10% have happened 0.5-1.5 hrs after booting. The only exception that comes to mind is one freeze that happened at 19:00, but that was one of the two I had with APST already disabled.

To update everyone on the appcenter daemons I tore out so hastily: no freezes since Tuesday, so I can't rule out removing those daemons (or possibly just killing the associated process) as a solution - which is good news, I think.

The fact that we're experiencing these crashes in the evenings suggests (to my untrained eye) a snow-balling process, like a RAM leak or something.

One last thing: I've never had a freeze while gaming (again, I rarely game, though) and I've never come back to a frozen system after having left it on with the screen off. I've been experimenting with avoiding suspending the computer entirely, but I haven't noted any correlation. I could suspend 5-6 times in a day or not even once and the freezes seemed to be as frequent.

NicholasMamo commented 3 years ago

Couple of things to add to @Adam-Kadmon's post: we have the same GPU and CPU, so it might be a problem with either or not at all (they are both pretty common). What I disagree with is the snowballing process, (but my eyes are probably even less trained than Adam's). The reason why I disagree is that even yesterday, I had switched off my laptop, and the crash happened a few hours later. Interesting that you mentioned watching YouTube though, most of my crashes happen around the time I'm watching or streaming videos.

I confirm everything else though: never freezes while gaming, and I've never come back to a frozen system. Avoiding suspending the laptop changes nothing for me.

Adam-Kadmon commented 3 years ago

Bad news. I just had another freeze today, 35 min after booting. This was the first in 4 days.

I had Stacer running in the background and Firefox open (on Reddit, no videos on screen). Nothing else.

The following things failed to recover the system from its locked-up state:

Did a hard power-off, waited 10 min, booted into Puppy Linux (Fossapup 9.5), ran fsck on ext4 partition.

Fsck returned no errors but suggested that some extent trees could be narrower; I agreed to optimise all (three) of these.

Powered down, waited 10 min and booted into Pop!_OS. Immediately applied this possible fix (from Reddit, from a post referred to in a post referred to by @NicholasMamo, above):

Commented out: options nvidia-drm modeset=1

In: /usr/lib/modprobe.d/nvidia-graphics-drivers.conf

Then ran: sudo update-initramfs -u
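The three steps above can be sketched as a small script. This is only an illustration under the assumption that the file contains the exact line `options nvidia-drm modeset=1`; the helper name `disable_nvidia_modeset` is hypothetical, and a driver package update may restore the file.

```shell
#!/bin/sh
# Hypothetical helper: comment out "options nvidia-drm modeset=1" in the
# given file, keeping a .bak copy of the original next to it.
# Lines that are already commented out are left untouched.
disable_nvidia_modeset() {
    sed -i.bak -E \
        's/^([[:space:]]*options[[:space:]]+nvidia-drm[[:space:]]+modeset=1.*)$/# \1/' \
        "$1"
}

# Usage on a live system, as root (not run here):
#   disable_nvidia_modeset /usr/lib/modprobe.d/nvidia-graphics-drivers.conf
#   update-initramfs -u
#   reboot
```

Keeping the `.bak` copy makes the change easy to revert if it turns out not to affect the freezes.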

Approaches that have not worked thus far:

I have everything up to date, including Nvidia drivers (including, in turn, Nvidia flatpak packages).

A hypothesis to test by observation, @NicholasMamo if you could keep an eye on this too:

The last three freezes have taken place after booting after updates that required an initramfs update. Maybe this is related? I'm going to take note of when I perform such updates from now on - maybe there's correlation.

NicholasMamo commented 3 years ago

I don't see any recent initramfs updates in /var/log/apt/history.log, unless I'm misunderstanding what I should keep an eye on. I haven't had any freezes yet, but I only get them once every few days. While you experiment with the modeset, I'll switch to Integrated Graphics. I'm currently on Hybrid Graphics. None of the crashes were during gaming (which is the only time I am using the dedicated graphics card). However, if I get crashes with Integrated Graphics too, then the problem is unlikely to be related to the graphics card.
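A possible reason nothing shows up: history.log lists package operations, while the output of package scripts usually lands in /var/log/apt/term.log. The sketch below assumes term.log uses `Log started:` session headers and that initramfs rebuilds print `update-initramfs: Generating ...`; the helper name `initramfs_sessions` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: for every initramfs rebuild found in an apt terminal
# log, print the start time of the apt session it belongs to.
# Reads the files given as arguments, or stdin when there are none.
initramfs_sessions() {
    awk '/^Log started:/ { started = $0 }
         /update-initramfs: Generating/ { print started " -> " $0 }' "$@"
}

# Usage on a live system (may need sudo to read the log; not run here):
#   initramfs_sessions /var/log/apt/term.log
#   zcat /var/log/apt/term.log.*.gz | initramfs_sessions    # rotated logs
```

The session start times can then be lined up against the dates of the freezes to see whether the hypothesis holds.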

@Adam-Kadmon Do you see anything suspicious in journalctl? In my case, I noticed that I was getting logs while the laptop was frozen completely.

Adam-Kadmon commented 3 years ago

No, nothing suspicious (to my eye, which isn't saying much):

Oct 10 15:24:50 pop-os gnome-shell[1710]: ../clutter/clutter/clutter-actor.c:10558: The clutter_actor_set_allocation() function can only be called from within the implementation of the ClutterActor::allocate() virtual function.
Oct 10 15:24:52 pop-os gnome-shell[1710]: Could not create transient scope for PID 3808: GDBus.Error:org.freedesktop.DBus.Error.UnixProcessIdUnknown: Process with ID 3808 does not exist.
Oct 10 15:25:24 pop-os gnome-shell[1710]: Could not create transient scope for PID 3848: GDBus.Error:org.freedesktop.DBus.Error.UnixProcessIdUnknown: Process with ID 3848 does not exist.
Oct 10 15:30:42 pop-os systemd[1]: Starting Cleanup of Temporary Directories...
Oct 10 15:30:42 pop-os systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Oct 10 15:30:42 pop-os systemd[1]: Finished Cleanup of Temporary Directories.
Oct 10 15:42:42 pop-os systemd[1]: Starting Message of the Day...
Oct 10 15:42:42 pop-os systemd[1]: motd-news.service: Succeeded.
Oct 10 15:42:42 pop-os systemd[1]: Finished Message of the Day.

And also no logs once the laptop was frozen (which happened at 15:45:xx)

I get these errors at boot prior to the freeze, though:

ACPI BIOS Error (bug): Could not resolve symbol [_SB.PCI0.I2C2.TPD0], AE_NOT_FOUND (20190816/dswload2-162)
Oct 10 15:15:06 pop-os kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20190816/psobject-220)
Oct 10 15:15:06 pop-os kernel: ACPI: Skipping parse of AML opcode: Scope (0x0010)
Oct 10 15:15:06 pop-os kernel: ACPI BIOS Error (bug): Could not resolve symbol [_SB.PCI0.I2C3.TPL1], AE_NOT_FOUND (20190816/dswload2-162)
Oct 10 15:15:06 pop-os kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20190816/psobject-220)

But I've seen them in logs when I haven't had the system freeze, too.

Also, I've been in Nvidia graphics mode since reformatting and reinstalling Pop!_OS. I'd been in hybrid graphics mode before that (when the freezes started).

Adam-Kadmon commented 3 years ago

It happened again today. Is anyone looking into this ticket? I know the team is probably busy with Pop!_OS 20.10 but this issue mightn't be resolved by the OS upgrade.

Here are some journalctl excerpts from today's boot:

Oct 12 10:34:48 pop-os kernel: [Firmware Bug]: TSC ADJUST: CPU0: -2378803894 force to 0

Oct 12 10:34:48 pop-os kernel: No NUMA configuration found
Oct 12 10:34:48 pop-os kernel: Faking a node at [mem 0x0000000000000000-0x000000045e7fffff]
Oct 12 10:34:48 pop-os kernel: NODE_DATA(0) allocated [mem 0x45e7d3000-0x45e7fdfff]

Oct 12 10:34:48 pop-os kernel: ACPI BIOS Error (bug): Could not resolve symbol [_SB.PCI0.I2C2.TPD0], AE_NOT_FOUND (20190816/dswload2-162)
Oct 12 10:34:48 pop-os kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20190816/psobject-220)
Oct 12 10:34:48 pop-os kernel: ACPI: Skipping parse of AML opcode: Scope (0x0010)
Oct 12 10:34:48 pop-os kernel: ACPI BIOS Error (bug): Could not resolve symbol [_SB.PCI0.I2C3.TPL1], AE_NOT_FOUND (20190816/dswload2-162)
Oct 12 10:34:48 pop-os kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20190816/psobject-220)
Oct 12 10:34:48 pop-os kernel: ACPI: Skipping parse of AML opcode: Scope (0x0010)

Oct 12 10:34:48 pop-os kernel: nvme nvme0: missing or invalid SUBNQN field.
Oct 12 10:34:48 pop-os kernel: nvme nvme1: missing or invalid SUBNQN field.

Oct 12 10:34:48 pop-os kernel: system76_acpi: loading out-of-tree module taints kernel.
Oct 12 10:34:48 pop-os kernel: system76_acpi: module verification failed: signature and/or required key missing - tainting kernel
Oct 12 10:34:48 pop-os kernel: system76: Model does not utilize this driver

Here are the last logs before the freeze happened at 11:42:xx:

Oct 12 11:34:26 pop-os gnome-shell[1697]: ../clutter/clutter/clutter-actor.c:10558: The clutter_actor_set_allocation() function can only be called from within the implementation of the ClutterActor::allocate() virtual function.
Oct 12 11:35:46 pop-os gnome-shell[1697]: Could not create transient scope for PID 7968: GDBus.Error:org.freedesktop.DBus.Error.UnixProcessIdUnknown: Process with ID 7968 does not exist.

Applications running at time of crash:

Things that failed to recover the system from the freeze:

Fixes to this problem that have been confirmed not to work:

Recent hypothesis disproven:

Please take a look at this, devs! This is a serious problem that has already led to total file system corruption once.

Adam-Kadmon commented 3 years ago

One more thing: the last two times I experienced a freeze and at least twice before that, I noticed my monitor not turning off like it should (it's set to turn off after two minutes). I hadn't paid much attention to it since my mouse can sometimes 'wander' after I've got up from my desk as it sits on a slight incline. But now that I think of it, I'm 80% certain that this is a predictor for a freeze.

NicholasMamo commented 3 years ago

Quick update: I haven't had any problems since October 8, but I usually get freezes around once every 4 days. Since then, I have:

According to Wikipedia, "The magic SysRq key [including OOM kill and REISUB, then] cannot work under certain conditions, such as a kernel panic[2] or a hardware failure preventing the kernel from running properly." Would a kernel panic appear in the logs?

What I don't get is this:

Adam-Kadmon commented 3 years ago

It happened again today - is anyone monitoring this thread or working on this problem? It's an honest question, I'm not being passive-aggressive.

Excerpts from boot log:

Oct 13 09:16:35 pop-os kernel: [Firmware Bug]: TSC ADJUST: CPU0: -2135954629 force to 0

Oct 13 09:16:35 pop-os kernel: Faking a node at [mem 0x0000000000000000-0x000000045e7fffff]
Oct 13 09:16:35 pop-os kernel: NODE_DATA(0) allocated [mem 0x45e7d5000-0x45e7fffff]

Oct 13 09:16:35 pop-os kernel: Zeroed struct page in unavailable ranges: 33844 pages

Oct 13 09:16:35 pop-os kernel: [Firmware Bug]: TSC ADJUST differs within socket(s), fixing all errors

Oct 13 09:16:35 pop-os kernel: ACPI BIOS Error (bug): Could not resolve symbol [_SB.PCI0.I2C2.TPD0], AE_NOT_FOUND (20190816/dswload2-162)
Oct 13 09:16:35 pop-os kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20190816/psobject-220)
Oct 13 09:16:35 pop-os kernel: ACPI: Skipping parse of AML opcode: Scope (0x0010)
Oct 13 09:16:35 pop-os kernel: ACPI BIOS Error (bug): Could not resolve symbol [_SB.PCI0.I2C3.TPL1], AE_NOT_FOUND (20190816/dswload2-162)
Oct 13 09:16:35 pop-os kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20190816/psobject-220)
Oct 13 09:16:35 pop-os kernel: ACPI: Skipping parse of AML opcode: Scope (0x0010)

Oct 13 09:16:35 pop-os kernel: nvme nvme0: missing or invalid SUBNQN field.
Oct 13 09:16:35 pop-os kernel: nvme nvme0: 8/0/0 default/read/poll queues
Oct 13 09:16:35 pop-os kernel: nvme0n1: p1 p2
Oct 13 09:16:35 pop-os kernel: nvme nvme1: missing or invalid SUBNQN field.
Oct 13 09:16:35 pop-os kernel: nvme nvme1: 8/0/0 default/read/poll queues
Oct 13 09:16:35 pop-os kernel: nvme1n1: p1 p2 p3 p4 p5

I can't recover logs from just before the freeze, maybe I'm doing it wrong, I don't know. Not knowing what I'm looking for, I don't suppose it makes much difference. There hasn't been any logging at the time of any of the previous freezes.

None of the previously mentioned attempts at unfreezing the system worked.

Fixes to this problem that have been confirmed not to work:

This is beyond frustrating. Those of us on this ticket are the ones who for the most part haven't just given up. Most people with this problem have simply hopped to another distro and are out there spreading the word re: Pop!_OS and its tendency to freeze up the entire system.

EDIT: I had only Qutebrowser open at the time of the freeze

Adam-Kadmon commented 3 years ago

Here are my logs from before the last freeze. The freeze happened at 14:25:xx.

Oct 13 14:17:01 pop-os CRON[13898]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 13 14:17:01 pop-os CRON[13899]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Oct 13 14:17:01 pop-os CRON[13898]: pam_unix(cron:session): session closed for user root
Oct 13 14:18:09 pop-os /usr/lib/gdm3/gdm-x-session[1547]: (II) event6 - Logitech USB Optical Mouse: SYN_DROPPED event - some input events have been lost.
Oct 13 14:18:12 pop-os dbus-daemon[708]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service' requested by ':1.87' (uid=1000 pid=1692 comm="/usr/bin/gnome-shell " label="unconfined")
Oct 13 14:18:12 pop-os systemd[1]: Starting Fingerprint Authentication Daemon...
Oct 13 14:18:12 pop-os dbus-daemon[708]: [system] Successfully activated service 'net.reactivated.Fprint'
Oct 13 14:18:12 pop-os systemd[1]: Started Fingerprint Authentication Daemon.
Oct 13 14:18:12 pop-os gdm-password][13930]: pam_unix(gdm-password:auth): Couldn't open /etc/securetty: No such file or directory
Oct 13 14:18:15 pop-os gdm-password][13930]: pam_unix(gdm-password:auth): Couldn't open /etc/securetty: No such file or directory
Oct 13 14:18:15 pop-os gdm-password][13930]: gkr-pam: unlocked login keyring
Oct 13 14:18:15 pop-os NetworkManager[712]: [1602591495.7387] agent-manager: agent[32ef6a2be8438531,:1.87/org.gnome.Shell.NetworkAgent/1000]: agent registered
Oct 13 14:18:15 pop-os dbus-daemon[1449]: [session uid=1000 pid=1449] Activating service name='org.gnome.Nautilus' requested by ':1.46' (uid=1000 pid=1692 comm="/usr/bin/gnome-shell " label="unconfined")
Oct 13 14:18:15 pop-os dbus-daemon[1449]: [session uid=1000 pid=1449] Activating service name='org.freedesktop.FileManager1' requested by ':1.46' (uid=1000 pid=1692 comm="/usr/bin/gnome-shell " label="unconfined")
Oct 13 14:18:15 pop-os system76-power[726]: [INFO] DBUS Received GetSwitchable method
Oct 13 14:18:15 pop-os system76-power[726]: [INFO] DBUS Received GetGraphics method
Oct 13 14:18:15 pop-os system76-power[726]: [INFO] DBUS Received GetProfile method
Oct 13 14:18:15 pop-os gnome-shell[1692]: gnome-shell-extension-system76-power: power profile was set: 'Balanced'
Oct 13 14:18:15 pop-os gnome-shell[1692]: loading user theme: /usr/share/themes/CustomTopPanel/gnome-shell/gnome-shell.css
Oct 13 14:18:15 pop-os dbus-daemon[1449]: [session uid=1000 pid=1449] Successfully activated service 'org.gnome.Nautilus'
Oct 13 14:18:15 pop-os org.freedesktop.FileManager1[13938]: Failed to register: Unable to acquire bus name 'org.gnome.Nautilus'
Oct 13 14:18:15 pop-os dbus-daemon[1449]: [session uid=1000 pid=1449] Activated service 'org.freedesktop.FileManager1' failed: Process org.freedesktop.FileManager1 exited with status 1
Oct 13 14:18:15 pop-os gnome-shell[1692]: Error connecting to Nautilus
Oct 13 14:18:43 pop-os systemd[1]: fprintd.service: Succeeded.
Oct 13 14:19:55 pop-os gnome-shell[1692]: Could not create transient scope for PID 14043: GDBus.Error:org.freedesktop.DBus.Error.UnixProcessIdUnknown: Process with ID 14043 does not exist.
Oct 13 14:20:15 pop-os gnome-shell[1692]: Could not create transient scope for PID 14121: GDBus.Error:org.freedesktop.DBus.Error.UnixProcessIdUnknown: Process with ID 14121 does not exist.

Adam-Kadmon commented 3 years ago

DEVS, this clearly isn't a priority for you, but any advice you could offer would be greatly appreciated. Pop!_OS CAUSES IRRECOVERABLE, DAILY SYSTEM FREEZES. I just finished recovering from yet another freeze.

Here are my logs from before today's freeze. I'm tired of trying to parse these logs with absolutely no specialist knowledge. If a member of the Pop!_OS team could take a look, then I'd be grateful. I'm also tired of sugar-coating this serious problem that Pop! has and is apparently doing absolutely nothing to address. There is no way I can recommend Pop!_OS in good faith. It causes frequent, random, totally irrecoverable freezes that eventually lead to data loss. Productivity takes a nose-dive as your days get broken up into recovery cycles after these freezes. There is zero support from the developers on this issue.

Oct 14 10:48:50 pop-os systemd[1]: Starting Cleanup of Temporary Directories...
Oct 14 10:48:50 pop-os systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Oct 14 10:48:50 pop-os systemd[1]: Finished Cleanup of Temporary Directories.
Oct 14 10:48:51 pop-os gnome-shell[2106]: 10:48:51 INFO: Request to www.googletagservices.com blocked by host blocker.
Oct 14 10:48:51 pop-os gnome-shell[2106]: 10:48:51 INFO: Request to c.amazon-adsystem.com blocked by host blocker.
Oct 14 10:48:51 pop-os gnome-shell[2106]: 10:48:51 INFO: Request to www.googletagservices.com blocked by host blocker.
Oct 14 10:48:51 pop-os gnome-shell[2106]: 10:48:51 INFO: Request to c.amazon-adsystem.com blocked by host blocker.
Oct 14 10:48:51 pop-os gnome-shell[2106]: 10:48:51 INFO: Request to www.googletagmanager.com blocked by host blocker.
Oct 14 10:48:51 pop-os gnome-shell[2106]: 10:48:51 INFO: Request to www.googletagmanager.com blocked by host blocker.
Oct 14 10:49:20 pop-os gnome-shell[2106]: 10:49:20 INFO: Request to www.googletagservices.com blocked by host blocker.
Oct 14 10:49:20 pop-os gnome-shell[2106]: 10:49:20 INFO: Request to c.amazon-adsystem.com blocked by host blocker.
Oct 14 10:49:21 pop-os gnome-shell[2106]: 10:49:21 INFO: Request to www.googletagmanager.com blocked by host blocker.
Oct 14 10:50:52 pop-os gnome-shell[2187]: libpng warning: iCCP: known incorrect sRGB profile
Oct 14 10:50:52 pop-os gnome-shell[2187]: libpng warning: iCCP: known incorrect sRGB profile
Oct 14 10:50:52 pop-os gnome-shell[2187]: libpng warning: iCCP: known incorrect sRGB profile
Oct 14 10:50:52 pop-os gnome-shell[2187]: libpng warning: iCCP: known incorrect sRGB profile
Oct 14 10:53:26 pop-os systemd[1425]: Started Application launched by gsd-media-keys.
Oct 14 10:53:26 pop-os dbus-daemon[1445]: [session uid=1000 pid=1445] Activating via systemd: service name='org.gnome.Terminal' unit='gnome-terminal-server.service' requested by ':1.122' (uid=1000 pid=2519 comm="/usr/bin/gnome-terminal.real " label="unconfined")
Oct 14 10:53:26 pop-os systemd[1425]: Starting GNOME Terminal Server...
Oct 14 10:53:27 pop-os dbus-daemon[1445]: [session uid=1000 pid=1445] Successfully activated service 'org.gnome.Terminal'
Oct 14 10:53:27 pop-os systemd[1425]: Started GNOME Terminal Server.
Oct 14 10:53:27 pop-os systemd[1425]: Started VTE child process 2530 launched by gnome-terminal-server process 2524.
Oct 14 10:53:27 pop-os systemd[1425]: gnome-launched-x-terminal-emulator-2516.scope: Succeeded.
Oct 14 10:53:33 pop-os sudo[2537]: pam_unix(sudo:auth): Couldn't open /etc/securetty: No such file or directory
Oct 14 10:53:36 pop-os sudo[2537]: pam_unix(sudo:auth): Couldn't open /etc/securetty: No such file or directory
Oct 14 10:53:36 pop-os sudo[2537]: ach : TTY=pts/0 ; PWD=/home/ach ; USER=root ; COMMAND=/usr/bin/apt update
Oct 14 10:53:36 pop-os sudo[2537]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 14 10:53:36 pop-os systemd-resolved[618]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Oct 14 10:53:36 pop-os systemd-resolved[618]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Oct 14 10:53:39 pop-os dbus-daemon[698]: [system] Activating via systemd: service name='org.freedesktop.PackageKit' unit='packagekit.service' requested by ':1.112' (uid=0 pid=3197 comm="/usr/bin/gdbus call --system --dest org.freedeskto" label="unconfined")
Oct 14 10:53:39 pop-os systemd[1]: Starting PackageKit Daemon...
Oct 14 10:53:39 pop-os PackageKit[3200]: daemon start
Oct 14 10:53:39 pop-os dbus-daemon[698]: [system] Successfully activated service 'org.freedesktop.PackageKit'
Oct 14 10:53:39 pop-os systemd[1]: Started PackageKit Daemon.
Oct 14 10:53:45 pop-os sudo[2537]: pam_unix(sudo:session): session closed for user root
Oct 14 10:53:54 pop-os polkitd(authority=local)[711]: Registered Authentication Agent for unix-process:3306:121324 (system bus name :1.114 [flatpak update], object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_GB.UTF-8)
Oct 14 10:53:54 pop-os polkitd(authority=local)[711]: Unregistered Authentication Agent for unix-process:3306:121324 (system bus name :1.114, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_GB.UTF-8)
Oct 14 10:53:56 pop-os systemd[1425]: vte-spawn-86422368-f2f6-4fb8-8f10-ee10f7a4e502.scope: Succeeded.
Oct 14 10:53:56 pop-os systemd[1425]: gnome-terminal-server.service: Succeeded.

I was running only Qutebrowser at the time of the freeze.

The following have been proven not to work:

This problem affects multiple users as evidenced by the first-hand reports and linked Reddit posts given above.

leviport commented 3 years ago

@Adam-Kadmon I'm sorry to hear you're dealing with this frustrating bug, and I assure you we have been following along. I just haven't seen anything so far that I could definitively point to as the issue. To me, it sounds very hardware-specific, and I don't have the same machine here to test with. The closest machine we have would be our Gazelle, which comes with a 1650, 1650Ti, or 1660Ti, and the Gazelle does not have any freezing problems. Unless I can reproduce the issue here, there isn't a whole lot I can do to help.

I'm assuming your machine has switchable graphics like the Gazelle does. Does the freeze happen in all graphics modes, or just one of them?


System76 customers can reach out to support for technical assistance. For non-System76 hardware, you can seek community support on Reddit or Mattermost.

Adam-Kadmon commented 3 years ago

@leviport thank you for posting. I apologise for the increasingly whingy tone of my posts. I'm sorry to say the frustration of all this was getting the better of me. I'm not even a System76 hardware owner and I do realise the Pop!_OS devs owe me nothing, to say the least.

I haven't even been able to reproduce the issue on my own affected laptop, so I can definitely hear what you're saying. To answer your question, I use an external monitor and so I've only been able to test Nvidia graphics mode. I recently tried Hybrid mode but it slowed my laptop to an absolute crawl on reboot (mouse cursor movements were, weirdly enough, unaffected).

I know I should try ditching the monitor and switching to Integrated mode but I spend 15+ hours a day working at my computer and that would be too uncomfortable given my desk situation. That probably sounds precious on my part, but my desk setup is pretty makeshift ATM.

Earlier today I upgraded to kernel 5.8.14. I experienced a pretty bad crash not long after, but it was a crash rather than a freeze, and the magic SysRq keys allowed me to crash-land the laptop somewhat gracefully after everything else had failed. When it freezes this isn't possible, so I'm thinking it was something unrelated.

I'll keep trying what I can and report back here on the regular, minus the unhelpful and frankly embarrassing attitude.

NicholasMamo commented 3 years ago

Earlier today I upgraded to kernel 5.8.14. I experienced a pretty bad crash not long after, but it was a crash rather than a freeze, and the magic SysRq keys allowed me to crash-land the laptop somewhat gracefully after everything else had failed. When it freezes this isn't possible, so I'm thinking it was something unrelated.

Do the logs show anything at the time of the crash, or is the behavior the same as before (no errors in journalctl)?

Adam-Kadmon commented 3 years ago

@NicholasMamo, honestly, I didn't even note what time the crash occurred. I know that's really slack of me, but it happened when I tried to launch a game for my wife to play while I borrowed her laptop to keep working.

The game was Ori and the Blind Forest, running through Proton 5.09 on Steam. It got stuck on a black screen with the title music playing on a loop. I could switch between tty's but they all had variations of the same theme playing: nvme0 is read-only, failed to write to nvme0, nvme0 not ready - restart aborted, and so on. I'd left APST enabled with the new kernel. I've since disabled it again just in case.

I've also since launched the same game multiple times without incident, including just now. I regret not looking through the logs or noting the time, but I was furious and in a huge rush to meet a deadline at the time. After trying everything I could in all available tty's, I ran through REISUB on two of them. Doing so interrupted the flow of error messages to give me confirmation that each step had been completed. Neither tty responded to the B in REISUB, though, so I used O to shut down instead. Running fsck showed no errors and Pop booted like normal afterwards.

I should have captured logs, but I really don't think it's related (it seems like the kind of thing switching kernels in between launches of a game could cause). If it was in fact related to the freezes, then it'll happen again and I'll react better then. Or throw my laptop out the window. I'm leaving both options open for now.

NicholasMamo commented 3 years ago

Unfortunately, after almost an entire week, the laptop froze again, 15 minutes after logging in. All I had open was Chrome with 3 tabs. To make matters worse, I cannot boot into Pop!_OS (when I select it in the boot menu, it just freezes again). fsck from live boot showed some errors, but only on the Windows reserved partition. Otherwise, it did not show any errors, but very little output, which I thought was weird.

I'll write in more detail later, but it seems the problems are unrelated to the GPU (I got freezes under integrated, hybrid and Nvidia), and unrelated to the C-states. Maybe it's related to the kernel, but who knows? In the meantime I've decided to try switching to Ubuntu (and updating to kernel 5.8), seeing as these freezes are messing up my laptop so much, even if they are not frequent, for now. Fingers crossed I can install Ubuntu without any hitches.

EDIT: I've since managed to log in to Pop!_OS. journalctl again shows no errors, and neither REISUB nor OOM kill worked at the time of the crash. I am uploading ALL the logs that I've had since I bought this laptop, hoping they help someone, possibly the devs. The last crash was on October 15 at around 11:51 AM: no journalctl logs for around 3 minutes.
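
For anyone who wants to look for that kind of silence automatically, here is a rough sketch. It assumes syslog-style lines like the ones pasted in this thread ("Oct 13 14:17:01 host ..."), an English locale for month names, and that you supply the year yourself, since the timestamps don't carry one. The function name `find_log_gaps` and the two-minute default are made up for illustration. A gap only means nothing was logged; it doesn't prove a freeze.

```python
from datetime import datetime, timedelta


def find_log_gaps(lines, year=2020, min_gap=timedelta(minutes=2)):
    """Return (before, after) timestamp pairs separated by at least min_gap."""
    stamps = []
    for line in lines:
        # The first 15 characters hold the syslog timestamp, e.g. "Oct 13 14:17:01".
        try:
            stamps.append(
                datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
            )
        except ValueError:
            continue  # skip continuation lines or anything in another format
    # Compare each timestamp with the one that follows it.
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a >= min_gap]
```

You could feed it the lines of an exported journal or of /var/log/syslog and check whether one of the gaps lines up with the time you remember the freeze happening.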

15102020.zip

NicholasMamo commented 3 years ago

I've successfully installed Ubuntu and updated to kernel 5.9. That should tell us if the problem is in Pop, or in the kernel or Ubuntu. This comment is mainly meant to recap the problems and what we've tried so far.

The problem is random freezes when the laptop isn't under intensive use (although I don't use it intensely often). journalctl does not show any errors at all, and the magic SysRq keys (REISUB, OOM kill) do not work at all: only a hard reset does it. Do that enough times, and the filesystem/boot loader starts to mess up.
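
One thing worth ruling out before blaming the freeze itself: the kernel.sysrq setting can restrict which magic SysRq functions are allowed, so some REISUB letters may be ignored even on a healthy system. Below is a minimal sketch that decodes the value according to the bitmask documented for that sysctl. The function name is my own, and 176 in the usage note is only an example value, not something read from any machine in this thread.

```python
# Bit values as documented for the kernel.sysrq sysctl.
SYSRQ_BITS = {
    2: "console log level control",
    4: "keyboard control (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync (S)",
    32: "remount read-only (U)",
    64: "process signalling (E, I, OOM kill)",
    128: "reboot/poweroff (B, O)",
    256: "nicing of real-time tasks",
}


def allowed_sysrq_functions(value):
    """Describe which SysRq function groups a kernel.sysrq value allows."""
    if value == 0:
        return []  # SysRq disabled completely
    if value == 1:
        return list(SYSRQ_BITS.values())  # every function enabled
    # Any other value is a bitmask of the allowed groups.
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]
```

For example, `allowed_sysrq_functions(176)` lists only sync, remount read-only and reboot/poweroff. The current value on a running system is in /proc/sys/kernel/sysrq. If the keys are fully enabled and still do nothing, that points back to the kernel itself being stuck.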

The following are the things @Adam-Kadmon and I tried and which did not work:

leviport commented 3 years ago

To make matters worse, I cannot boot into Pop!_OS (when I select it in the boot menu, it just freezes again). fsck from live boot showed some errors, but only on the Windows reserved partition.

It might not be a bad idea to run some hardware diagnostics either. Freezes from simply selecting something in the BIOS boot menu sound a little suspicious to me.

NicholasMamo commented 3 years ago

It might not be a bad idea to run some hardware diagnostics either. Freezes from simply selecting something in the BIOS boot menu sound a little suspicious to me.

I think it was a problem with the filesystem. After failing to boot to Pop!_OS, I booted into Windows, which performed its own version of fsck, and then I used Pop!_OS from a Live USB. After that, I could log into my installed Pop!_OS without any problems.

EDIT: @leviport I have noticed I am getting these errors (both on Pop!_OS and Ubuntu):

kernel:  i915 0000:00:02.0: [drm] *ERROR* Atomic update failure on pipe A (start=254128 end=254129) time 291 us, min 1063, max 1079, scanline start 1033, end 1078

They happen semi-regularly without causing any issues, but could they be symptomatic of something worse, possibly the same thing causing the freezes?

Oymate commented 3 years ago

@NicholasMamo any update?

NicholasMamo commented 3 years ago

@NicholasMamo any update?

No freezes just yet, but it's only been 3 days, and sometimes freezes take longer (last time it took 6 days). Changes since October 14:

If you still have issues, I suggest you upgrade your kernel first. I also suggest you type in: grep -i atomic /var/log/kern.log | less +G and look for the same error I posted before. I suspect that may have had something to do with the freezes, but I haven't found anyone with the same error yet.
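
If you'd rather count the errors than page through them, something like this rough sketch does the same search as the grep. It assumes the message wording matches the i915 line I posted earlier ("Atomic update failure on pipe A"); the function name is made up, and you'd pass it the lines of /var/log/kern.log or of journalctl output.

```python
import re
from collections import Counter

# Matches the i915 message quoted earlier in this thread and captures the pipe name.
ATOMIC_RE = re.compile(r"Atomic update failure on pipe (\w+)")


def count_atomic_failures(lines):
    """Count 'Atomic update failure' messages per display pipe."""
    counts = Counter()
    for line in lines:
        match = ATOMIC_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

A handful of hits on its own doesn't prove anything, since I see them without freezes too, but comparing counts between a boot that froze and one that didn't might show a pattern.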

Adam-Kadmon commented 3 years ago

Another thread on Reddit, more accounts of similar freezes in the comments. Leaving it here in case it comes in handy at some point down the line.

https://www.reddit.com/r/pop_os/comments/jchvld/complete_freeze/?utm_source=share&utm_medium=web2x&context=3

bartlebee13 commented 3 years ago

I'm on a new System76 Oryx Pro that I bought last week. It's got an NVMe drive and NVIDIA RTX 2060. My laptop froze 2x today during the workday with just vscode, chrome, postman, and a couple terminal windows open. This is unbelievably frustrating.

But at least we know it's definitely affecting System76 hardware, so that's good.

drorm commented 3 years ago

System76 Darter Pro, with no GPU. Purchased in June 2020. Uptime was great until a couple of weeks ago, as you can see from the output of the last command. Obviously the "still running" is bogus except for the first one, probably because the system froze rather than being shut down properly. So I would guess a system update shortly before triggered it. Looking at the date this ticket was opened, though, it could have started earlier, and I may only have run into it when my system crashed on October 8, possibly for other reasons, and rebooted with the updated kernel.


last reboot 
reboot   system boot  5.4.0-7626-gener Wed Oct 21 17:44   still running
reboot   system boot  5.4.0-7626-gener Tue Oct 20 23:44   still running
reboot   system boot  5.4.0-7626-gener Tue Oct 20 12:19   still running
reboot   system boot  5.4.0-7626-gener Mon Oct 19 23:24   still running
reboot   system boot  5.4.0-7626-gener Sun Oct 18 17:01   still running
reboot   system boot  5.4.0-7626-gener Sat Oct 17 10:29   still running
reboot   system boot  5.4.0-7626-gener Thu Oct 15 19:48   still running
reboot   system boot  5.4.0-7626-gener Thu Oct 15 19:45 - 19:47  (00:02)
reboot   system boot  5.4.0-7626-gener Thu Oct 15 08:20 - 19:47  (11:27)
reboot   system boot  5.4.0-7626-gener Fri Oct  9 10:17 - 19:47 (6+09:30)
reboot   system boot  5.4.0-7626-gener Thu Oct  8 09:22 - 19:47 (7+10:25)
reboot   system boot  5.4.0-7626-gener Thu Sep 10 15:59 - 19:47 (35+03:48)
reboot   system boot  5.4.0-7626-gener Thu Aug 20 09:57 - 19:47 (56+09:50)
reboot   system boot  5.4.0-7626-gener Sun Jun 21 13:40 - 19:47 (116+06:06)
reboot   system boot  5.4.0-7626-gener Sun Jun 14 10:06 - 13:40 (7+03:34)
reboot   system boot  5.4.0-7626-gener Sat Jun 13 01:20 - 10:05 (1+08:45)

wtmp begins Sat Jun 13 01:20:03 2020

NicholasMamo commented 3 years ago

I've found another Reddit thread, which is on System76 hardware again: https://www.reddit.com/r/System76/comments/jfsmps/oryx_pro_oryp6_and_freezesfreezing_on_system76/

I'm still on Ubuntu and it's now been a week without freezes (touch wood). However, seeing as most of the issues are on new hardware (and I didn't just install Ubuntu, but upgraded to kernel 5.9), I think it might be some hardware incompatibility that is solved by a newer kernel.

Debugging the problem

Just because you are getting freezes, it does not mean you have the same problems as everyone else. The first thing you should do is share your specs and check the logs.

If you know when the freeze happened, look for the boot when it happened using journalctl --boot=-N (replace N with how many boots back it was; journalctl --list-boots shows them all) and scroll down to the time of the freeze. If journalctl shows an error at the time of the freeze, it's possible that the problem isn't the same as the others in this thread. The good news, in that case, is that it may be easier to fix.
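
To avoid scrolling, you can narrow the output to the minutes before the freeze. This is a minimal sketch that only builds the command; the helper name, the ten-minute window and the default of "previous boot" are my own assumptions, while --boot, --since, --until and --no-pager are standard journalctl options.

```python
from datetime import datetime, timedelta


def journal_window_cmd(freeze_time, boot_offset=-1, minutes_before=10):
    """Build a journalctl command covering the minutes before a freeze."""
    since = freeze_time - timedelta(minutes=minutes_before)
    fmt = "%Y-%m-%d %H:%M:%S"  # timestamp format accepted by --since/--until
    return [
        "journalctl",
        "--no-pager",
        f"--boot={boot_offset}",  # -1 is the previous boot, -2 the one before
        f"--since={since.strftime(fmt)}",
        f"--until={freeze_time.strftime(fmt)}",
    ]
```

For example, `subprocess.run(journal_window_cmd(datetime(2020, 10, 13, 14, 25)))` would print the previous boot's entries from 14:15 to 14:25, provided the freeze happened during that boot.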

If there are no visible errors, then it might be the same problem we are experiencing. Another common behavior that we've observed is that when a freeze happens, REISUB doesn't work (if you don't know what that is, you can read more about it here).

Possible solutions

As I said above, the fact that the problems are common on new hardware means that it might be a kernel problem. In that case, updating the kernel to a recent one (the most recent is 5.9) might help. I have had success so far although it's too soon to cry victory. You can find tutorials online.

ronaaron commented 3 years ago

Yep. My problem has been occurring on an oryxp6, which I got about a month ago. Using "integrated" video mode seems fine, but "hybrid" dies about once a day. There is nothing incriminating in any of the logs as far as I can tell (I've been using Linux for 20+ years, so I know my way around the system in general).

uname -r gives: 5.4.0-7642-generic

And indeed, REISUB doesn't work. I also cannot ssh into the machine once it hangs, though the screen still displays whatever was going on when it froze.

NicholasMamo commented 3 years ago

Yep. My problem has been occurring on an oryxp6, which I got about a month ago. Using "integrated" video mode seems fine, but "hybrid" dies about once a day. There is nothing incriminating in any of the logs as far as I can tell (I've been using Linux for 20+ years, so I know my way around the system in general).

Good for you, I got problems on integrated too :P I suggest that you update the kernel (or wait until someone more qualified chimes in). What are its specs? (I'm mostly interested in the GPU and CPU.)

ronaaron commented 3 years ago

CPU: Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz cores=8 enabledcores=8
GPU: NVIDIA GeForce RTX 2060; also has Intel UHD graphics controller (the integrated one).

From what I'm seeing discussed, it sounds like some NVIDIA driver issue. I don't know if a kernel beyond 5.4 is available in the Pop!_OS repos, and I'm not too excited about upgrading a kernel that isn't specifically supported...

NicholasMamo commented 3 years ago

You will get an updated Pop!_OS kernel with the release of 20.10, I imagine, so might as well wait a bit. For what it's worth, though, I am using nvidia-driver-450 on my Ubuntu (but I have a 1660Ti). Maybe try to switch to a different driver if you believe your problem is a GPU issue?

ronaaron commented 3 years ago

Ah, probably true. Hopefully a System76 expert will chime in here...

ozdreamern commented 3 years ago

I also have this problem on my System76 Lemur Pro (lemp9), which has only Intel integrated graphics. It freezes roughly once a week, while working in Firefox (usually a Google doc of some type). The display remains on but the laptop is utterly frozen, nonresponsive to any keyboard activity (including magic SysRq strings). Nothing illuminating is logged at the time of the crash or just before.

KaNuckles commented 3 years ago

Same problem here on my xps 9500

I also updated my kernel to 5.9.1. Still waiting to see if it freezes again...

bartlebee13 commented 3 years ago

Update: System76 Oryx Pro - I just left my laptop running with a couple browser tabs, vscode, terminal, and dbeaver open and stepped away to eat. I came back after maybe 20 minutes and found my laptop frozen again. 3rd time it's happened in 24 hours on a machine that's less than a week old. 🤯😂

I'm fairly new to Linux - are there any commands I can run or utilities I can check to see the damage that these freezes and forced shutdowns are doing to my file system/hard drive?