pop-os / pop

A project for managing all Pop!_OS sources
https://system76.com/pop

Screen blinks after closing a 3D app #1231

Open: Yuri6037 opened this issue 4 years ago

Yuri6037 commented 4 years ago

Distribution (run cat /etc/os-release):

NAME="Pop!_OS"
VERSION="20.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
LOGO=distributor-logo-pop-os

Related Application and/or Package Version (run apt policy $PACKAGE NAME):

nvidia-driver-440:
  Installed: 440.100-1pop2~1596832961~20.04~c5e0bee
  Candidate: 440.100-1pop2~1596832961~20.04~c5e0bee
  Version table:
 *** 440.100-1pop2~1596832961~20.04~c5e0bee 1001
       1001 http://ppa.launchpad.net/system76/pop/ubuntu focal/main amd64 Packages
        100 /var/lib/dpkg/status
     440.100-0ubuntu0.20.04.1 500
        500 http://us.archive.ubuntu.com/ubuntu focal-security/restricted amd64 Packages
        500 http://us.archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages
     440.82+really.440.64-0ubuntu6 500
        500 http://us.archive.ubuntu.com/ubuntu focal/restricted amd64 Packages
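The version table above is what decides which build APT installs: each version row carries a pin priority, and `***` marks the installed version. Here the System76 PPA build has priority 1001 while both Ubuntu archive builds have 500, so the Pop!_OS build is the candidate. A minimal Python sketch of reading that table, assuming the `apt policy` layout shown above (the class and function names are hypothetical, and ties between equal priorities, which APT breaks by version number, are not handled):

```python
import re
from dataclasses import dataclass, field

# Version row, e.g. " *** 440.100-1pop2~... 1001" or "     440.100-0ubuntu0.20.04.1 500".
VERSION_ROW = re.compile(r"^\s*(\*\*\*)?\s*(\S+)\s+(\d+)$")
# Source row under a version, e.g. "        500 http://... Packages".
SOURCE_ROW = re.compile(r"^\s+(\d+)\s+(\S.*)$")


@dataclass
class VersionEntry:
    version: str
    priority: int
    installed: bool  # True when the row is marked with "***"
    sources: list = field(default_factory=list)  # (priority, origin) pairs


def parse_apt_policy(text):
    """Return (installed, candidate, entries) parsed from `apt policy PKG` output."""
    installed = candidate = None
    entries = []
    in_table = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Installed:"):
            installed = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("Candidate:"):
            candidate = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("Version table:"):
            in_table = True
        elif in_table:
            row = VERSION_ROW.match(line)
            if row:
                entries.append(
                    VersionEntry(row.group(2), int(row.group(3)), bool(row.group(1)))
                )
                continue
            src = SOURCE_ROW.match(line)
            if src and entries:
                # Attach the origin line to the most recent version row.
                entries[-1].sources.append((int(src.group(1)), src.group(2)))
    return installed, candidate, entries


def highest_priority(entries):
    """Pick the entry with the highest pin priority (first one wins on ties)."""
    return max(entries, key=lambda e: e.priority)
```

Run on the output above, `highest_priority` returns the 1001 entry, which matches the `Candidate:` line; this is why the PPA driver is installed instead of the Ubuntu archive build.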

Issue/Bug Description: After closing a 3D application that uses NVIDIA hardware acceleration, the screen goes black for about a second and then comes back. It does not matter whether the application is windowed or not. Problem has been tested with:

Steps to reproduce (if you know):

Although this is not an immediate problem, it can lead to a TERRIBLE issue later on. The following has happened twice since installing nvidia-driver-440: 1 to 2 hours after closing a game (and seeing the screen blink), the computer may lock up permanently (no TTY, no keyboard input, no mouse and frozen video). It seems like some kind of driver bug that ends up hanging the hardware. Once the computer is locked, the only way to get it working again is to press the motherboard reset switch or turn off the PSU.

NOTE: The second time this happened I was editing a text file. Strangely, when I started the computer again the document was still there, and the text I had deleted with Backspace after the lock-up had actually been registered. This could mean that it is not the entire computer that locks up but only the GPU, which would explain why the picture was frozen. Maybe the driver crashed and took the GPU down with it, or maybe X11 does not try to restart the driver when it crashes.

On Windows, NVIDIA driver crashes are recurrent on all of my NVIDIA computers: the GTX 1060, and also this computer when it was running Windows. But when that happens, Windows restarts the driver and shows a message box telling me the video driver crashed and was restarted.

Expected behavior: No screen blink, and no permanent lock-up 1 to 2 hours later.

Other Notes: Computer specs:

UPDATE: This just happened again, and this time no game had been run beforehand. I tried to ping the locked computer from another Windows machine and it replied, meaning the underlying system was still responding. So the lock-up must come entirely from poor NVIDIA code that crashes randomly.

ids1024 commented 4 years ago

I don't seem to see this in windowed mode with Nvidia graphics. There is a black screen for a fraction of a second after closing a fullscreen OpenGL program.

Do you get this with all OpenGL software (for instance, something simple like glxgears)?

"after 1 to 2 hours of use after game close and screen blink, you have a chance that the computer will lock forever (no TTY, no input, no mouse and frozen video), seems like some kind of bug in the driver causing a hardware bug."

While the brief blank screen sounds like a minor issue, this is worse.

In any case, it's presumably an issue in Nvidia's proprietary driver.

Yuri6037 commented 4 years ago

Indeed, glxgears does not cause a screen blink. The bug is strange: it definitely happens in Minecraft (at least on my hardware) and also with my custom 3D engine. I'm not sure exactly what causes it; maybe SDL2, GLFW or LWJGL 2.

As for the crash, it's fairly random. I have made some progress:

This makes me think there might be a temperature problem, especially considering where the computer is: a well-insulated attic with poor heat transfer, so hot air stays in much longer than in other rooms. I know CPUs can crash at high temperatures (fortunately I replaced the thermal paste recently). I also know that I never replaced the thermal paste on the graphics card, so maybe it is running too hot and gives up...

Yuri6037 commented 4 years ago

The crash happened again today. However, this time the fans were not running. One thing is certain: this crash has something to do with 3D apps. It always occurs after a 3D application has exited, an hour or two later, sometimes even three, and it can even happen several times after a single 3D app run.

Yuri6037 commented 4 years ago

Hello, just adding an update: 2 days ago I received an update to NVIDIA 450.66 from the Pop!_Shop. I just installed it and will see if it fixes the problem. I also haven't seen any crash for about a week now that the weather is getting colder. Could the ambient temperature play a role after all?

Yuri6037 commented 4 years ago

Hello, I just wanted to add that it just happened again.

It is definitely an NVIDIA or Xorg fault, as sound still plays fine in the background even though the image is completely frozen.

Is there any way you could implement an automatic Xorg server restart when it crashes? Windows already has a similar mechanism for when the graphics driver crashes.

Yuri6037 commented 3 years ago

Hello, another crash! This time running Pop!_OS 20.10:

NAME="Pop!_OS"
VERSION="20.10"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 20.10"
VERSION_ID="20.10"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=groovy
UBUNTU_CODENAME=groovy
LOGO=distributor-logo-pop-os

If this continues, I may have to consider switching this computer back to Windows, which I know was far more reliable when it comes to GPU/driver crashes. It still crashed, but much less often than what I observe here. I get a new crash roughly every month, and that is already too much, because in the end it might cause permanent damage and/or data loss on some of the HDDs...

EDIT: In case anyone wants to try debugging this, I found a way to dump the dmesg of a boot that locked up right after logging in: https://pastebin.com/FdmWdtRA
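One way to capture the kernel log of a boot that locked up is `journalctl -k -b -1`, the command used in a later comment in this thread: `-k` restricts output to kernel messages and `-b -1` selects the boot before the current one. A minimal Python sketch that saves that log to a file after rebooting, assuming the journal is stored persistently (otherwise there is no previous boot to read) and using hypothetical helper names:

```python
import subprocess
from pathlib import Path


def previous_boot_kernel_log_cmd(boot_offset=-1):
    """Build the journalctl command for the kernel log of an earlier boot.

    -k keeps only kernel messages, -b selects the boot (-1 = the boot before
    the current one), --no-pager prints everything to stdout.
    """
    return ["journalctl", "-k", "-b", str(boot_offset), "--no-pager"]


def save_previous_boot_kernel_log(path, boot_offset=-1):
    """Run journalctl and write its output to path.

    Raises subprocess.CalledProcessError if journalctl fails (for example
    when no persistent journal exists for the requested boot).
    Returns the number of characters written.
    """
    result = subprocess.run(
        previous_boot_kernel_log_cmd(boot_offset),
        capture_output=True,
        text=True,
        check=True,
    )
    Path(path).write_text(result.stdout)
    return len(result.stdout)
```

For example, calling `save_previous_boot_kernel_log("locked-boot.txt")` right after pressing the reset switch writes the locked-up boot's kernel messages to a file that can be attached to the issue.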

Yuri6037 commented 3 years ago

Hi, another crash occurred today, but something is strange about this one: the whole system froze, not just the NVIDIA side; the network went down as well. The motherboard reset switch worked as always. However, some weird messages showed up in journalctl -k -b -1:

Jan 13 15:09:05 yuri-ws kernel: scsi 9:0:0:0: CD-ROM            Linux    File-CD Gadget   0414 PQ: 0 ANSI: 2
Jan 13 15:09:05 yuri-ws kernel: sr 9:0:0:0: Power-on or device reset occurred
Jan 13 15:09:05 yuri-ws kernel: sr 9:0:0:0: [sr1] scsi-1 drive
Jan 13 15:09:05 yuri-ws kernel: sr 9:0:0:0: Attached scsi CD-ROM sr1
Jan 13 15:09:05 yuri-ws kernel: sr 9:0:0:0: Attached scsi generic sg8 type 5
Jan 13 15:32:42 yuri-ws kernel: ISO 9660 Extensions: Microsoft Joliet Level 1
Jan 13 15:32:42 yuri-ws kernel: ISOFS: changing to secondary root
Jan 13 16:34:58 yuri-ws kernel: usb 5-1: reset high-speed USB device number 2 using xhci_hcd

The last line, about USB, was logged right before the complete system freeze. Could xhci_hcd be what is causing the system lock-ups now?
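To see how the entries above relate in time, a minimal Python sketch can parse `journalctl` short-format lines and list the longest silences between consecutive messages. It assumes the `Mon DD HH:MM:SS host identifier: message` layout shown above and a caller-supplied year (the short format omits it); the function names are hypothetical. The last logged line only marks when the kernel stopped writing to the journal; on its own it does not show that xhci_hcd caused the freeze.

```python
import re
from datetime import datetime

# "Jan 13 15:09:05 yuri-ws kernel: message" (journalctl short output).
LINE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) ([^:\s]+): (.*)$")


def parse_journal(text, year):
    """Return (timestamp, identifier, message) tuples for each log line.

    The short format has no year, so the caller must supply one.
    Lines that do not match (headers, blank lines) are skipped.
    """
    entries = []
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        entries.append((stamp, m.group(3), m.group(4)))
    return entries


def largest_gaps(entries, n=3):
    """Return the n longest (gap, earlier_entry, later_entry) silences, longest first."""
    gaps = [
        (later[0] - earlier[0], earlier, later)
        for earlier, later in zip(entries, entries[1:])
    ]
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:n]
```

On the eight lines quoted above, `parse_journal` returns 8 entries, and the longest gap is the 1 h 02 min 16 s silence that ends with the `usb 5-1: reset` line, so the system was quiet for about an hour before that last message.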