pop-os / pop

A project for managing all Pop!_OS sources
https://system76.com/pop

Black flashes on screen after NVIDIA driver update #3088

Closed JanPokorny closed 10 months ago

JanPokorny commented 1 year ago

Distribution (run cat /etc/os-release):

NAME="Pop!_OS"
VERSION="22.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 22.04 LTS"
VERSION_ID="22.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=jammy
UBUNTU_CODENAME=jammy
LOGO=distributor-logo-pop-os

Related Application and/or Package Version (run apt policy $PACKAGE_NAME):

The issue started today. Yesterday, the following packages updated on my system, according to /var/log/dpkg.log:

2023-07-25 10:44:05 install libnvidia-extra-535:amd64 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:05 install libnvidia-common-535:all <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:05 install libnvidia-gl-535:i386 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:05 install libnvidia-gl-535:amd64 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:14 install nvidia-kernel-source-535:amd64 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:15 install nvidia-kernel-common-535:amd64 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:15 install nvidia-firmware-535-535.86.05:amd64 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:15 install nvidia-dkms-535:amd64 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:15 install libnvidia-decode-535:amd64 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:16 install libnvidia-compute-535:i386 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:17 install libnvidia-decode-535:i386 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:17 install libnvidia-compute-535:amd64 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:18 install nvidia-compute-utils-535:amd64 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:18 install libnvidia-encode-535:i386 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:18 install libnvidia-encode-535:amd64 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:18 install nvidia-utils-535:amd64 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:18 install libnvidia-cfg1-535:amd64 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:18 install xserver-xorg-video-nvidia-535:amd64 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:19 install libnvidia-fbc1-535:i386 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:20 install libnvidia-fbc1-535:amd64 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
2023-07-25 10:44:22 install nvidia-driver-535:amd64 <none> 535.86.05-1pop0~1689705886~22.04~5d5580b
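
For anyone wanting to pull the same kind of list from their own log, here is a minimal sketch. The helper name nvidia_dpkg_events is hypothetical, and it assumes the dpkg.log line layout shown above (date, time, action, package, old version, new version).

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): print install/upgrade
# events for NVIDIA packages from a dpkg log.
# Assumes the log layout: date time action package:arch old-version new-version
nvidia_dpkg_events() {
    log="${1:-/var/log/dpkg.log}"
    # Field 3 is the dpkg action, field 4 the package, field 6 the new version.
    awk '($3 == "install" || $3 == "upgrade") && $4 ~ /nvidia/ {
        print $1, $2, $3, $4, $6
    }' "$log"
}
```

Calling nvidia_dpkg_events with no argument reads /var/log/dpkg.log; pass a path to inspect a rotated log instead.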

Issue/Bug Description:

Black flashes covering approximately the top half of the screen, very short (~1 frame), happening every minute or so, irregularly. They do not seem to be correlated with any action; the flashes happen even on the login screen.

Steps to reproduce (if you know):

Seems to happen after updating to the driver versions listed above. Also note that I have a 32:9 screen (5120x1440 resolution), which might be relevant.

leviport commented 1 year ago

Please describe your hardware, including the refresh rate of that ultrawide display.

JanPokorny commented 1 year ago

In the meantime, I attempted a fix with apt-get purge '*nvidia*'; apt-get install nvidia-driver-535-server; reboot, since that package is a slightly older version (535.54.03 vs 535.86.05), but it did not help. That is why the output below shows version 535.54.03.

My GPU is NVIDIA RTX 2070. nvidia-smi:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2070        Off | 00000000:26:00.0  On |                  N/A |
| 18%   42C    P5              34W / 175W |    525MiB /  8192MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

Monitor is Samsung CRG9, running at 5120x1440@120Hz. xrandr output:

Screen 0: minimum 8 x 8, current 5120 x 1440, maximum 32767 x 32767
DP-0 disconnected (normal left inverted right x axis y axis)
DP-1 disconnected (normal left inverted right x axis y axis)
HDMI-0 disconnected (normal left inverted right x axis y axis)
DP-2 disconnected (normal left inverted right x axis y axis)
DP-3 disconnected (normal left inverted right x axis y axis)
DP-4 connected primary 5120x1440+0+0 (normal left inverted right x axis y axis) 1mm x 1mm
   3840x1080    119.97 +  99.96    59.97
   5120x1440    119.97*  100.00    59.98
   2560x1440     59.95
   2560x1080    119.88   100.00    60.00    59.94
   1920x1080    119.88   100.00    60.00    59.94
   1680x1050     59.95
   1600x900      60.00
   1440x900      59.89
   1280x1024     75.02    60.02
   1280x800      59.81
   1280x720      60.00
   1152x864      75.00
   1024x768      75.03    70.07    60.00
   800x600       75.00    72.19    60.32    56.25
   640x480       75.00    72.81    59.94
DP-5 disconnected (normal left inverted right x axis y axis)
USB-C-0 disconnected (normal left inverted right x axis y axis)
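
In xrandr output like the above, the active mode is the one whose refresh rate carries a * (a + marks the preferred mode). A minimal sketch for extracting it follows; the helper name active_mode is hypothetical, and it assumes the mode-line layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: read xrandr output on stdin and print each active
# mode as "<resolution> <refresh rate>".
# Assumes mode lines look like "   5120x1440    119.97*  100.00    59.98".
active_mode() {
    awk '{
        for (i = 2; i <= NF; i++) {
            if ($i ~ /\*/) {
                rate = $i
                gsub(/[*+]/, "", rate)   # strip the current/preferred markers
                print $1, rate
            }
        }
    }'
}
# Usage: xrandr | active_mode
```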

Here's lshw -short:

H/W path                Device           Class          Description
===================================================================
                                         system         MS-7B86 (To be filled by O.E.M.)
/0                                       bus            B450 GAMING PLUS MAX (MS-7B86)
/0/0                                     memory         64KiB BIOS
/0/10                                    memory         16GiB System Memory
/0/10/0                                  memory         3200 MHz (0,3 ns) [empty]
/0/10/1                                  memory         8GiB DIMM DDR4 Synchronous Unbuffered (Unregistered) 3200 MHz (0,3 ns)
/0/10/2                                  memory         3200 MHz (0,3 ns) [empty]
/0/10/3                                  memory         8GiB DIMM DDR4 Synchronous Unbuffered (Unregistered) 3200 MHz (0,3 ns)
/0/13                                    memory         384KiB L1 cache
/0/14                                    memory         3MiB L2 cache
/0/15                                    memory         32MiB L3 cache
/0/16                                    processor      AMD Ryzen 5 3600 6-Core Processor
/0/100                                   bridge         Starship/Matisse Root Complex
/0/100/0.2                               generic        Starship/Matisse IOMMU
/0/100/1.1                               bridge         Starship/Matisse GPP Bridge
/0/100/1.1/0            /dev/nvme0       storage        KINGSTON SA2000M81000G
/0/100/1.1/0/0          hwmon0           disk           NVMe disk
/0/100/1.1/0/2          /dev/ng0n1       disk           NVMe disk
/0/100/1.1/0/1          /dev/nvme0n1     disk           1TB NVMe disk
/0/100/1.1/0/1/1        /dev/nvme0n1p1   volume         99MiB Windows FAT volume
/0/100/1.1/0/1/2        /dev/nvme0n1p2   volume         15MiB reserved partition
/0/100/1.1/0/1/3        /dev/nvme0n1p3   volume         930GiB Windows NTFS volume
/0/100/1.1/0/1/4        /dev/nvme0n1p4   volume         632MiB Windows NTFS volume
/0/100/1.3                               bridge         Starship/Matisse GPP Bridge
/0/100/1.3/0                             bus            400 Series Chipset USB 3.1 XHCI Controller
/0/100/1.3/0/0          usb2             bus            xHCI Host Controller
/0/100/1.3/0/0/3        input2           input          Yubico YubiKey OTP+FIDO+CCID
/0/100/1.3/0/0/8                         generic        802.11ac WLAN Adapter
/0/100/1.3/0/0/9        card3            multimedia     SteelSeries Arctis Nova 7
/0/100/1.3/0/1          usb6             bus            xHCI Host Controller
/0/100/1.3/0.1          scsi1            storage        400 Series Chipset SATA Controller
/0/100/1.3/0.1/0        /dev/sda         disk           512GB Apacer AS350 512
/0/100/1.3/0.1/0/1      /dev/sda1        volume         1021MiB Windows FAT volume
/0/100/1.3/0.1/0/2      /dev/sda2        volume         4095MiB Windows FAT volume
/0/100/1.3/0.1/0/3      /dev/sda3        volume         467GiB EFI partition
/0/100/1.3/0.1/0/4      /dev/sda4        volume         4095MiB Linux swap volume
/0/100/1.3/0.1/1        /dev/sdb         disk           3TB WDC WD30EFRX-68E
/0/100/1.3/0.1/1/1      /dev/sdb1        volume         2794GiB Windows NTFS volume
/0/100/1.3/0.1/0.0.0    /dev/sdc         disk           3TB WDC WD30EFRX-68E
/0/100/1.3/0.1/0.0.0/1  /dev/sdc1        volume         2794GiB Windows NTFS volume
/0/100/1.3/0.2                           bridge         400 Series Chipset PCIe Bridge
/0/100/1.3/0.2/0                         bridge         400 Series Chipset PCIe Port
/0/100/1.3/0.2/1                         bridge         400 Series Chipset PCIe Port
/0/100/1.3/0.2/1/0      enp34s0          network        RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
/0/100/1.3/0.2/4                         bridge         400 Series Chipset PCIe Port
/0/100/3.1                               bridge         Starship/Matisse GPP Bridge
/0/100/3.1/0                             display        TU106 [GeForce RTX 2070]
/0/100/3.1/0.1          card0            multimedia     TU106 High Definition Audio Controller
/0/100/3.1/0.1/0        input16          input          HDA NVidia HDMI/DP,pcm=3
/0/100/3.1/0.1/1        input17          input          HDA NVidia HDMI/DP,pcm=7
/0/100/3.1/0.1/2        input18          input          HDA NVidia HDMI/DP,pcm=8
/0/100/3.1/0.1/3        input19          input          HDA NVidia HDMI/DP,pcm=9
/0/100/3.1/0.2                           bus            TU106 USB 3.1 Host Controller
/0/100/3.1/0.2/0        usb1             bus            xHCI Host Controller
/0/100/3.1/0.2/1        usb4             bus            xHCI Host Controller
/0/100/3.1/0.3                           bus            TU106 USB Type-C UCSI Controller
/0/100/7.1                               bridge         Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
/0/100/7.1/0                             generic        Starship/Matisse PCIe Dummy Function
/0/100/8.1                               bridge         Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
/0/100/8.1/0                             generic        Starship/Matisse Reserved SPP
/0/100/8.1/0.1                           generic        Starship/Matisse Cryptographic Coprocessor PSPCPP
/0/100/8.1/0.3                           bus            Matisse USB 3.0 Host Controller
/0/100/8.1/0.3/0        usb3             bus            xHCI Host Controller
/0/100/8.1/0.3/0/1                       bus            4-Port USB 2.1 Hub
/0/100/8.1/0.3/0/1/1                     input          USB Receiver
/0/100/8.1/0.3/0/1/1/0  input15          input          Logitech ERGO K860
/0/100/8.1/0.3/0/1/3    card2            multimedia     HD720P Webcam: HD720P Webcam
/0/100/8.1/0.3/0/1/4    input8           input          USB Optical Mouse
/0/100/8.1/0.3/1        usb5             bus            xHCI Host Controller
/0/100/8.1/0.3/1/1                       bus            4-Port USB 3.1 Hub
/0/100/8.1/0.4          card1            multimedia     Starship/Matisse HD Audio Controller
/0/100/8.1/0.4/0        input21          input          HD-Audio Generic Front Mic
/0/100/8.1/0.4/1        input22          input          HD-Audio Generic Rear Mic
/0/100/8.1/0.4/2        input23          input          HD-Audio Generic Line
/0/100/8.1/0.4/3        input24          input          HD-Audio Generic Line Out Front
/0/100/8.1/0.4/4        input25          input          HD-Audio Generic Line Out Surround
/0/100/8.1/0.4/5        input26          input          HD-Audio Generic Line Out CLFE
/0/100/8.1/0.4/6        input27          input          HD-Audio Generic Line Out Side
/0/100/8.1/0.4/7        input28          input          HD-Audio Generic Front Headphone
/0/100/8.2                               bridge         Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
/0/100/8.2/0                             storage        FCH SATA Controller [AHCI mode]
/0/100/8.3                               bridge         Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
/0/100/8.3/0                             storage        FCH SATA Controller [AHCI mode]
/0/100/14                                bus            FCH SMBus Controller
/0/100/14.3                              bridge         FCH LPC Bridge
/0/100/14.3/0                            system         PnP device PNP0c01
/0/100/14.3/1                            system         PnP device PNP0c02
/0/100/14.3/2                            system         PnP device PNP0b00
/0/100/14.3/3                            system         PnP device PNP0c02
/0/100/14.3/4                            printer        PnP device PNP0400
/0/100/14.3/5                            communication  PnP device PNP0501
/0/100/14.3/6                            system         PnP device PNP0c02
/0/101                                   bridge         Starship/Matisse PCIe Dummy Host Bridge
/0/102                                   bridge         Starship/Matisse PCIe Dummy Host Bridge
/0/103                                   bridge         Starship/Matisse PCIe Dummy Host Bridge
/0/104                                   bridge         Starship/Matisse PCIe Dummy Host Bridge
/0/105                                   bridge         Starship/Matisse PCIe Dummy Host Bridge
/0/106                                   bridge         Starship/Matisse PCIe Dummy Host Bridge
/0/107                                   bridge         Starship/Matisse PCIe Dummy Host Bridge
/0/108                                   bridge         Matisse/Vermeer Data Fabric: Device 18h; Function 0
/0/109                                   bridge         Matisse/Vermeer Data Fabric: Device 18h; Function 1
/0/10a                                   bridge         Matisse/Vermeer Data Fabric: Device 18h; Function 2
/0/10b                                   bridge         Matisse/Vermeer Data Fabric: Device 18h; Function 3
/0/10c                                   bridge         Matisse/Vermeer Data Fabric: Device 18h; Function 4
/0/10d                                   bridge         Matisse/Vermeer Data Fabric: Device 18h; Function 5
/0/10e                                   bridge         Matisse/Vermeer Data Fabric: Device 18h; Function 6
/0/10f                                   bridge         Matisse/Vermeer Data Fabric: Device 18h; Function 7
/1                      /dev/fb0         display        EFI VGA
/2                      input0           input          Power Button
/3                      input1           input          Power Button
/4                      wlx984827d15f9f  network        Wireless interface

JanPokorny commented 1 year ago

Another symptom that I just noticed, and that is probably related: when screen recording (I use simplescreenrecorder), the resulting video has artifacts of the underlying background. My WM puts a light blue rectangle under the active window, under which I have a desktop background of a forest -- you can see both "glitching through" the active window here. This also started happening today.

https://github.com/pop-os/pop/assets/4580066/89e70e65-af41-48a4-b3cb-00a81c040e79

Video still-frame of one of the glitches:

[Image: still frame from the video showing one of the glitches]

samuelznewton commented 1 year ago

I am also having this issue (which, for me as well, began after running some updates). I believe it may be related to an issue with Nvidia drivers since 530, described in greater detail here: https://forums.developer.nvidia.com/t/flickering-at-the-top-of-the-screen/256447

Some items of note from that forum post:

With all that said, I wonder whether Pop might be able to allow users to downgrade to Nvidia driver 525 until Nvidia releases a new driver without the flicker issue. Downgrading to 470 should work, as well, but that's quite a downgrade.

JanPokorny commented 1 year ago

@samuelznewton Thank you for linking the post! This indeed seems like the core issue.

You can downgrade to older drivers with this single command (at some point your screen might go black, so it's better to run the whole thing at once):

sudo apt remove -y 'nvidia-driver-*'; sudo apt install -y nvidia-driver-525-server; sudo reboot

I just did it and see no flickers so far, so this might be a viable workaround until it's fixed upstream.

...but my screen recording still has the same glitches! ~@samuelznewton could you please check if recording a rectangle with SimpleScreenRecorder has the same glitches as the video I posted above, before and after the driver downgrade? Perhaps it's a different issue, but it started happening to me at the exact same time.~ I tried "GPU Screen Recorder" from the PopShop and it works well, so maybe this is not worth investigating as I'm happy as long as something works 😅

louist103 commented 1 year ago

I also have this exact issue, but it happens on both my Nvidia card and Intel iGPU. For reference, I have: 12th Gen Intel® Core™ i7-1280P × 20, Mesa Intel® Graphics (ADL GT2), NVIDIA T550

JanPokorny commented 1 year ago

Just to report after a month -- the downgrade to driver version 525 resolved this and caused no other issues. So, for anyone facing the same issue, the workaround is:

sudo apt remove -y 'nvidia-driver-*'; sudo apt install -y nvidia-driver-525-server; sudo reboot
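
Because the one-liner chains its steps with ;, it reboots even if the install step fails. A slightly more defensive sketch chains them with && instead. The wrapper name downgrade_nvidia and the RUN variable are hypothetical conveniences, not part of apt or Pop!_OS; the package names are the ones used in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper around the workaround above. Each step runs only if
# the previous one succeeded, so the machine is not rebooted after a failed
# install. Set RUN=echo to print the commands instead of executing them.
downgrade_nvidia() {
    series="${1:-525-server}"   # e.g. 525-server or 470
    run="${RUN:-sudo}"
    $run apt remove -y 'nvidia-driver-*' &&
        $run apt install -y "nvidia-driver-$series" &&
        $run reboot
}
```

Running it with RUN=echo first shows exactly what would be executed before committing to the driver swap.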

JanPokorny commented 12 months ago

Driver 525 breaks on kernel 6.5.4, so the workaround now also requires holding the kernel at version 6.4.6. (https://github.com/pop-os/pop/issues/3147)

samuelznewton commented 12 months ago

I did check, and driver 470 works for me. It really is a significant downgrade (for games, anyway), but it's probably better than trying to keep Pop from updating the kernel. While a different solution would be nice, I don't expect System76 to provide support specifically for hardware that I doubt they're currently shipping.

JanPokorny commented 11 months ago

@samuelznewton Also switched to driver 470. Seems fine for non-gaming use. System76 did ship PCs with RTX 2000 cards (https://tech-docs.system76.com/models/thelio-major-b1-b2-r1-r2/README.html), btw (but I'm not on a System76 device).

samuelznewton commented 10 months ago

Many reports on that Nvidia thread I linked in my first comment suggest that this issue is likely fixed as of driver 545.23.06. This driver is not yet available in the Pop Shop (but might be soon, if this Reddit post is correct and I'm understanding it right? https://www.reddit.com/r/pop_os/comments/17sdivu/the_5452902_nvidia_driver_got_merged_but_i_cant/ ).

leviport commented 10 months ago

NVIDIA 545.29.02 will be released a little later today.

JanPokorny commented 10 months ago

No flashes for me since updating to NVIDIA 545.29.02, seems like this really is fixed!

zndrr commented 10 months ago

Ironically my issues popped up after 545.29.02, which was up from the working 545.23.xx. Have an LG C1 attached as only screen, and couldn't do 100Hz+ at all anymore (loss of signal) - and tearing/black screening during fullscreen video at 60Hz.

Rolled way back to 470 after some hassles since the repos were forcing 545 on any of the 5xx releases, and NVIDIA-sourced drivers wouldn't build. Though I can see that perhaps nvidia-driver-535-server etc might take.