pop-os / pop

A project for managing all Pop!_OS sources
https://system76.com/pop

Extreme flickering on RTX4060 Lenovo Legion 5 slim laptop #3162

Open IncrediblePony opened 1 year ago

IncrediblePony commented 1 year ago

Distribution (run cat /etc/os-release and hostnamectl):

NAME="Pop!_OS"
VERSION="22.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 22.04 LTS"
VERSION_ID="22.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=jammy
UBUNTU_CODENAME=jammy
LOGO=distributor-logo-pop-os
Operating System: Pop!_OS 22.04 LTS               
          Kernel: Linux 6.6.0-060600rc5-generic
    Architecture: x86-64
 Hardware Vendor: Lenovo
  Hardware Model: Legion Slim 5 16APH8

video of problem

Related Application and/or Package Version (run apt policy $PACKAGE NAME): don't know

Issue/Bug Description: Extreme flickering on the monitors once the PC has been on for a while, as shown in the linked video above. I have tried just about everything, from switching to discrete graphics in the BIOS, to installing the newest RC kernel, to installing different NVIDIA drivers. Nothing seems to mitigate the problem.

The monitors are connected through a Thunderbolt port on the machine. The external monitors are daisy-chained with DP cables.

Reason for attempting an RC kernel: link

Steps to reproduce (if you know): N/A

Expected behavior: Not having a possible epileptic seizure from flickering monitors.

Other Notes: When upgrading apt packages, the terminal sometimes tells me that some amdgpu firmware packages are missing. I will add the names next time I see them. I have linux-firmware and amdgpu-install installed on my machine. Neither of these fixes the error.

the-rich-piana commented 10 months ago

Also getting this issue on PopOS when I plug in an external monitor. I have the Legion Slim 7i 16APH8 also with a 4060. Happens with BOTH Thunderbolt and HDMI connections. Also random but do your speakers work?

Edit: I should mention that I also get those errors saying amdgpu firmware is missing. But I read somewhere that it is not an issue.

IncrediblePony commented 10 months ago

> Also random but do your speakers work?

I get some SERIOUSLY weird behaviour on the speakers as well. They sometimes stutter wildly or just plainly skip sounds. Sometimes they cut out entirely.

systemctl --user restart wireplumber pipewire pipewire-pulse

This command seems to fix it

the-rich-piana commented 10 months ago

Mine don't work at all, but that's very strange. Can I ask what kernel version you are on, and your Nvidia driver? Updating the kernel and driver caused me way more issues than it fixed. I'm surprised you actually get audio out of your speakers though.

I'm on PopOS: 6.5.6-76060506-generic Nvidia: 545.29.02

Also, downgrading my nvidia driver and kernel somewhat helped with my flickering issues/weird screen bugs. So far no flickering. I also found that disabling any thunderbolt security helped, and I can't remember how, but I was able to "remember" my monitor in this settings page: image

IncrediblePony commented 10 months ago

I went and uninstalled Pop!_OS and installed Ubuntu 23.10 instead. Works like a charm.

My colleague had these issues too, but for some magic reason his kernel doesn't seem to bug out as much. The audio issue still persists.

the-rich-piana commented 10 months ago

Ok I think I'm going to ditch Pop too, this is kind of ridiculous. There's way too many problems with this OS. We tried.

leviport commented 10 months ago

> There's way too many problems with this OS.

the-rich-piana commented 10 months ago

Ya that's true. I mean it is a relatively new laptop (I think 2023 7i and 5i came out last January?).

the-rich-piana commented 10 months ago

Out of curiosity, did you do a clean install of Ubuntu? It's going to be a pain in the butt moving all my bash scripts and files over to Ubuntu again. I currently use systemd dual boot with Windows 11.

IncrediblePony commented 9 months ago

> Out of curiosity, did you do a clean install of Ubuntu? It's going to be a pain in the butt moving all my bash scripts and files over to Ubuntu again. I currently use systemd dual boot with Windows 11.

I copied the bash files from my home folder and installed a clean Ubuntu. That way I at the very least had all of my aliases and PATH variables.
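The copy step described above could be sketched roughly like this. The function name `backup_bash_files` and the list of files (`.bashrc`, `.bash_aliases`, `.bash_profile`, `.profile`) are assumptions for illustration; the poster did not say which files were copied.

```shell
#!/bin/sh
# Hypothetical helper: copy common bash startup files from a home directory
# into a backup directory. The file list is an assumption; adjust as needed.
backup_bash_files() {
    src_home=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in .bashrc .bash_aliases .bash_profile .profile; do
        # Skip files that do not exist on this machine.
        if [ -f "$src_home/$f" ]; then
            cp -p "$src_home/$f" "$dest/" && echo "copied $f"
        fi
    done
}
```

It would be called as something like `backup_bash_files "$HOME" /path/to/backup` before the reinstall, and the files copied back afterwards.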

the-rich-piana commented 9 months ago

https://forums.lenovo.com/t5/Ubuntu/Ubuntu-and-legion-pro-7-16IRX8H-audio-issues/m-p/5210709?page=25#6227585

This thread has a solution for the Legion 7 Pro (Intel). So I am going to follow up with that and see if there is an equivalent solution for AMD. Not worth my time reinstalling Ubuntu lol.

the-rich-piana commented 9 months ago

Also, for screen flickering, I found that going to display settings -> disable fractional scaling -> enable fractional scaling -> set to 125% or whatever you want -> apply will resolve the flickering. You then just repeat these steps when it happens again every few hours.

IncrediblePony commented 8 months ago

> Also, for screen flickering, I found that going to display settings -> disable fractional scaling -> enable fractional scaling -> set to 125% or whatever you want -> apply will resolve the flickering. You then just repeat these steps when it happens again every few hours.

The fact that you have to do this workaround is just plain sad.

leviport commented 8 months ago

Wait, this involved fractional scaling all along? Might be good to try Wayland then. Gnome 42 on X with fractional scaling is notoriously flaky. I believe Ubuntu 23.10 uses Wayland by default.
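To see which display server the current login is using before and after switching, a minimal check could look like this. It assumes the login manager sets the standard `XDG_SESSION_TYPE` environment variable; the helper name `describe_session` is made up for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: turn a session type string into a readable label.
# XDG_SESSION_TYPE is normally set by the login manager on systemd-based desktops.
describe_session() {
    case "$1" in
        wayland) echo "Wayland session" ;;
        x11)     echo "X11 session" ;;
        *)       echo "unknown session type: ${1:-unset}" ;;
    esac
}

# Report the type of the session this shell was started from.
describe_session "${XDG_SESSION_TYPE:-}"
```

If this reports an X11 session while fractional scaling is enabled, that matches the combination described above.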

dxps commented 6 months ago

Just changed the laptop to a Lenovo Yoga Pro 7 14APH8 - Type 82Y8, installed Pop!_OS 22.04 LTS (up to date at the time of this writing), and after 2-3 hours both screens (laptop's display and external monitor) started flickering as shown in the original message video.

Even after disconnecting the external monitor, the issue continued to happen (on the built-in display).

In syslog (/var/log/syslog file), the entries that were written while this happened are like this:

May 13 11:21:32 dxps kernel: [12544.467079] amd_iommu_report_page_fault: 101357 callbacks suppressed
May 13 11:21:32 dxps kernel: [12544.467089] amdgpu 0000:63:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffff707000 flags=0x0000]
May 13 11:21:32 dxps kernel: [12544.467116] amdgpu 0000:63:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffff708000 flags=0x0000]
May 13 11:21:32 dxps kernel: [12544.467127] amdgpu 0000:63:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffff714000 flags=0x0000]
May 13 11:21:32 dxps kernel: [12544.467136] amdgpu 0000:63:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffff726000 flags=0x0000]
May 13 11:21:32 dxps kernel: [12544.467144] amdgpu 0000:63:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffff738000 flags=0x0000]
May 13 11:21:32 dxps kernel: [12544.467153] amdgpu 0000:63:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffff740000 flags=0x0000]
May 13 11:21:32 dxps kernel: [12544.467162] amdgpu 0000:63:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffff741000 flags=0x0000]
May 13 11:21:32 dxps kernel: [12544.467170] amdgpu 0000:63:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffff742000 flags=0x0000]
May 13 11:21:32 dxps kernel: [12544.467179] amdgpu 0000:63:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffff743000 flags=0x0000]
May 13 11:21:32 dxps kernel: [12544.467188] amdgpu 0000:63:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffff744000 flags=0x0000]
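For anyone comparing logs, a rough way to count these events per PCI device is sketched below. The helper name `count_page_faults` is hypothetical, and the parsing assumes the line layout shown in the excerpt above (driver name `amdgpu` followed by the PCI address).

```shell
#!/bin/sh
# Hypothetical helper: count AMD-Vi IO_PAGE_FAULT events per PCI device in a log file.
count_page_faults() {
    awk '/IO_PAGE_FAULT/ {
             # The PCI address follows the driver name, e.g. "amdgpu 0000:63:00.0:".
             for (i = 1; i < NF; i++)
                 if ($i == "amdgpu") {
                     dev = $(i + 1)
                     sub(/:$/, "", dev)   # drop the trailing colon
                     n[dev]++
                 }
         }
         END { for (d in n) print d, n[d] }' "$1"
}
```

Running `count_page_faults /var/log/syslog` (read permission on the log may be required) prints one line per device with the number of logged faults. Lines that only say "callbacks suppressed" are not counted, so the real number of faults can be far higher.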

I had to log in again, for now. I'll keep looking for potential solutions. I'm a long-time Pop!_OS user and I'm very happy with it, so I want to stay with it.

the-rich-piana commented 6 months ago

I think I have found a more permanent solution. I can also confirm that I get this (completely unhelpful) error in dmesg: amdgpu 0000:63:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffff707000 flags=0x0000]

After some digging I found a post which mentioned this Framework laptop fix: https://knowledgebase.frame.work/en_us/allocate-additional-ram-to-igpu-framework-laptop-13-amd-ryzen-7040-series-BkpPUPQa

For Legion laptops: Go into the BIOS, select Configuration, and go to the AMD UMA Frame Buffer Size setting. I think the default is 512; I maxed it out to 4G or whatever. So far, I can unplug my monitor, plug it in, close the lid, watch videos fullscreen, etc. with no flickering or weird crashes.

Let me know if this helps anyone else.
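One way to confirm from Linux that the BIOS change took effect is to read the VRAM size the amdgpu driver reports through sysfs. This is only a sketch: the helper name `report_vram_total` is made up, and it assumes the cards show up as `card0` to `card9` under `/sys/class/drm` with amdgpu's `mem_info_vram_total` attribute (a value in bytes).

```shell
#!/bin/sh
# Hypothetical helper: print the VRAM size (in MiB) that amdgpu reports per card.
# sysfs_root is a parameter so the function can be pointed at a test directory;
# on a real system it defaults to /sys/class/drm.
report_vram_total() {
    sysfs_root=${1:-/sys/class/drm}
    # card[0-9] matches the cards themselves, not connector entries like card0-eDP-1.
    for f in "$sysfs_root"/card[0-9]/device/mem_info_vram_total; do
        [ -r "$f" ] || continue          # no amdgpu card, or the glob did not match
        bytes=$(cat "$f")
        card=${f#"$sysfs_root"/}
        card=${card%%/*}
        echo "$card: $((bytes / 1048576)) MiB"
    done
}

report_vram_total
```

With the frame buffer set to 4G, the integrated GPU's card would be expected to report roughly 4096 MiB. Cards driven by other drivers (such as the NVIDIA one) do not expose this attribute and are skipped.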

the-rich-piana commented 6 months ago

@dxps the above fix is working for me on Pop!

dxps commented 6 months ago

@the-rich-piana Thanks, Giuliano! :pray:

Unfortunately, I went into the Lenovo BIOS and found that Configuration > UMA Frame Buffer Size is already set to its maximum value of 4G, so I couldn't do anything about it.

I also installed the driver from https://www.amd.com/en/support/linux-drivers (Ubuntu x86 64-Bit section). I reinstalled Radeon™ Software for Linux® version 23.40.2 for Ubuntu 22.04.4 HWE, just to make sure I hadn't chosen the first one (for 20.04.6) by mistake.

As per installation instructions, to use amdgpu-install, I had to update /etc/os-release to have ID=ubuntu (instead of ID=pop), as the tool gave up saying that pop is not supported.

Unfortunately, that fails with:

Building initial module for 6.8.0-76060800daily20240311-generic
ERROR (dkms apport): kernel package linux-headers-6.8.0-76060800daily20240311-generic is not supported
Error! Bad return status for module build on kernel: 6.8.0-76060800daily20240311-generic (x86_64)
Consult /var/lib/dkms/amdgpu/6.3.6-1718217.22.04/build/make.log for more information.
dpkg: error processing package amdgpu-dkms (--configure):
 installed amdgpu-dkms package post-installation script subprocess returned error exit status 10

See the details below:

`sudo amdgpu-install --usecase=graphics,opencl` output

```shell
…/drivers ❯ sudo amdgpu-install --usecase=graphics,opencl
Hit:1 https://repo.teamsforlinux.de/debian stable InRelease
Get:2 https://packages.microsoft.com/repos/code stable InRelease [3,590 B]
Hit:3 https://ppa.launchpadcontent.net/linuxuprising/guake/ubuntu jammy InRelease
Hit:4 https://repo.radeon.com/amdgpu/6.0.2/ubuntu jammy InRelease
Hit:5 https://repo.radeon.com/rocm/apt/6.0.2 jammy InRelease
Hit:6 http://apt.pop-os.org/proprietary jammy InRelease
Hit:7 http://apt.pop-os.org/release jammy InRelease
Hit:8 http://apt.pop-os.org/ubuntu jammy InRelease
Hit:9 http://apt.pop-os.org/ubuntu jammy-security InRelease
Hit:10 http://apt.pop-os.org/ubuntu jammy-updates InRelease
Hit:11 http://apt.pop-os.org/ubuntu jammy-backports InRelease
Fetched 3,590 B in 2s (1,952 B/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
amdgpu-dkms is already the newest version (1:6.3.6.60002-1718217.22.04).
amdgpu-lib is already the newest version (1:6.0.60002-1718217.22.04).
amdgpu-lib32 is already the newest version (1:6.0.60002-1718217.22.04).
linux-headers-6.8.0-76060800daily20240311-generic is already the newest version (6.8.0-76060800daily20240311.202403110203~1714077665~22.04~4c8e9a0).
The following additional packages will be installed:
  comgr hsa-rocr libelf-dev libncurses-dev libtinfo-dev openmp-extras-runtime rocm-core rocm-language-runtime rocm-ocl-icd rocm-opencl
Suggested packages:
  ncurses-doc
The following NEW packages will be installed:
  comgr hsa-rocr libelf-dev libncurses-dev libtinfo-dev openmp-extras-runtime rocm-core rocm-language-runtime rocm-ocl-icd rocm-opencl rocm-opencl-runtime
0 upgraded, 11 newly installed, 0 to remove and 0 not upgraded.
1 not fully installed or removed.
Need to get 194 MB of archives.
After this operation, 443 MB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://apt.pop-os.org/ubuntu jammy-security/main amd64 libncurses-dev amd64 6.3-2ubuntu0.1 [381 kB]
Get:2 http://apt.pop-os.org/ubuntu jammy-security/main amd64 libtinfo-dev amd64 6.3-2ubuntu0.1 [780 B]
Get:3 https://repo.radeon.com/rocm/apt/6.0.2 jammy/main amd64 rocm-core amd64 6.0.2.60002-115~22.04 [9,034 B]
Get:4 http://apt.pop-os.org/ubuntu jammy/main amd64 libelf-dev amd64 0.186-1build1 [64.4 kB]
Get:5 https://repo.radeon.com/rocm/apt/6.0.2 jammy/main amd64 comgr amd64 2.6.0.60002-115~22.04 [51.7 MB]
Get:6 https://repo.radeon.com/rocm/apt/6.0.2 jammy/main amd64 hsa-rocr amd64 1.12.0.60002-115~22.04 [823 kB]
Get:7 https://repo.radeon.com/rocm/apt/6.0.2 jammy/main amd64 openmp-extras-runtime amd64 17.60.0.60002-115~22.04 [140 MB]
Get:8 https://repo.radeon.com/rocm/apt/6.0.2 jammy/main amd64 rocm-language-runtime amd64 6.0.2.60002-115~22.04 [834 B]
Get:9 https://repo.radeon.com/rocm/apt/6.0.2 jammy/main amd64 rocm-ocl-icd amd64 2.0.0.60002-115~22.04 [16.3 kB]
Get:10 https://repo.radeon.com/rocm/apt/6.0.2 jammy/main amd64 rocm-opencl amd64 2.0.0.60002-115~22.04 [595 kB]
Get:11 https://repo.radeon.com/rocm/apt/6.0.2 jammy/main amd64 rocm-opencl-runtime amd64 6.0.2.60002-115~22.04 [2,016 B]
Fetched 194 MB in 20s (9,686 kB/s)
Selecting previously unselected package libncurses-dev:amd64.
(Reading database ... 222648 files and directories currently installed.)
Preparing to unpack .../00-libncurses-dev_6.3-2ubuntu0.1_amd64.deb ...
Unpacking libncurses-dev:amd64 (6.3-2ubuntu0.1) ...
Selecting previously unselected package libtinfo-dev:amd64.
Preparing to unpack .../01-libtinfo-dev_6.3-2ubuntu0.1_amd64.deb ...
Unpacking libtinfo-dev:amd64 (6.3-2ubuntu0.1) ...
Selecting previously unselected package rocm-core.
Preparing to unpack .../02-rocm-core_6.0.2.60002-115~22.04_amd64.deb ...
Unpacking rocm-core (6.0.2.60002-115~22.04) ...
Selecting previously unselected package comgr.
Preparing to unpack .../03-comgr_2.6.0.60002-115~22.04_amd64.deb ...
Unpacking comgr (2.6.0.60002-115~22.04) ...
Selecting previously unselected package hsa-rocr.
Preparing to unpack .../04-hsa-rocr_1.12.0.60002-115~22.04_amd64.deb ...
Unpacking hsa-rocr (1.12.0.60002-115~22.04) ...
Selecting previously unselected package libelf-dev:amd64.
Preparing to unpack .../05-libelf-dev_0.186-1build1_amd64.deb ...
Unpacking libelf-dev:amd64 (0.186-1build1) ...
Selecting previously unselected package openmp-extras-runtime.
Preparing to unpack .../06-openmp-extras-runtime_17.60.0.60002-115~22.04_amd64.deb ...
Unpacking openmp-extras-runtime (17.60.0.60002-115~22.04) ...
Selecting previously unselected package rocm-language-runtime.
Preparing to unpack .../07-rocm-language-runtime_6.0.2.60002-115~22.04_amd64.deb ...
Unpacking rocm-language-runtime (6.0.2.60002-115~22.04) ...
Selecting previously unselected package rocm-ocl-icd.
Preparing to unpack .../08-rocm-ocl-icd_2.0.0.60002-115~22.04_amd64.deb ...
Unpacking rocm-ocl-icd (2.0.0.60002-115~22.04) ...
Selecting previously unselected package rocm-opencl.
Preparing to unpack .../09-rocm-opencl_2.0.0.60002-115~22.04_amd64.deb ...
Unpacking rocm-opencl (2.0.0.60002-115~22.04) ...
Selecting previously unselected package rocm-opencl-runtime.
Preparing to unpack .../10-rocm-opencl-runtime_6.0.2.60002-115~22.04_amd64.deb ...
Unpacking rocm-opencl-runtime (6.0.2.60002-115~22.04) ...
Setting up libncurses-dev:amd64 (6.3-2ubuntu0.1) ...
Setting up amdgpu-dkms (1:6.3.6.60002-1718217.22.04) ...
Removing old amdgpu-6.3.6-1718217.22.04 DKMS files...
Deleting module amdgpu-6.3.6-1718217.22.04 completely from the DKMS tree.
Loading new amdgpu-6.3.6-1718217.22.04 DKMS files...
Building for 6.8.0-76060800daily20240311-generic
Building for architecture x86_64
Building initial module for 6.8.0-76060800daily20240311-generic
ERROR (dkms apport): kernel package linux-headers-6.8.0-76060800daily20240311-generic is not supported
Error! Bad return status for module build on kernel: 6.8.0-76060800daily20240311-generic (x86_64)
Consult /var/lib/dkms/amdgpu/6.3.6-1718217.22.04/build/make.log for more information.
dpkg: error processing package amdgpu-dkms (--configure):
 installed amdgpu-dkms package post-installation script subprocess returned error exit status 10
Setting up rocm-core (6.0.2.60002-115~22.04) ...
update-alternatives: using /opt/rocm-6.0.2 to provide /opt/rocm (rocm) in auto mode
Setting up rocm-ocl-icd (2.0.0.60002-115~22.04) ...
Setting up libelf-dev:amd64 (0.186-1build1) ...
Setting up libtinfo-dev:amd64 (6.3-2ubuntu0.1) ...
Setting up comgr (2.6.0.60002-115~22.04) ...
Setting up hsa-rocr (1.12.0.60002-115~22.04) ...
Setting up rocm-opencl (2.0.0.60002-115~22.04) ...
Setting up openmp-extras-runtime (17.60.0.60002-115~22.04) ...
Setting up rocm-language-runtime (6.0.2.60002-115~22.04) ...
Setting up rocm-opencl-runtime (6.0.2.60002-115~22.04) ...
update-alternatives: using /opt/rocm-6.0.2/bin/clinfo to provide /usr/bin/clinfo (clinfo) in auto mode
Processing triggers for man-db (2.10.2-1) ...
Processing triggers for libc-bin (2.35-0ubuntu3.7) ...
Errors were encountered while processing:
 amdgpu-dkms
E: Sub-process /usr/bin/dpkg returned an error code (1)
…/drivers took 1m27s❯
```

dxps commented 5 months ago

If it helps, here are some further details:

  1. The issue happens during screen sharing (received or initiated) in MS Teams when using the teams-for-linux desktop client, not when using Teams in the browser (Firefox).
  2. Entries captured while this issue happened: image

the-rich-piana commented 5 months ago

@dxps Does the issue only happen in MS Teams? For what it's worth, I have been screen sharing with zoom since I bought this thing and never had any flickering caused by it.

dxps commented 5 months ago

@the-rich-piana Yes, it happens only in MS Teams, and only in the desktop client mentioned above, when I or another attendee of a call shares the screen. It works for some time (I guess a couple of seconds, up to a minute) and then the flickering starts. For the last two days I was using both the browser-based version and the desktop client of MS Teams, but attended calls that might include screen sharing only in the browser-based version, and thus had no issues.

Currently, I'm not using Zoom at work, so I cannot tell if that issue happens with it.

the-rich-piana commented 5 months ago

@dxps Is the flickering happening when you plug in a monitor or only with Teams? Could be a purely MS Teams related problem.

dxps commented 5 months ago

@the-rich-piana It started happening with the external monitor plugged in. But once the flickering had started, it continued even after disconnecting the external monitor.

Meanwhile, I did:

  1. Ran kernelstub -a amdgpu.sg_display=0 and rebooted the OS, just to be sure.
  2. Lowered the VRAM usage by (unfortunately) disabling one (lovely) Gnome extension (details here).
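After the reboot, it can be worth confirming that the parameter from step 1 actually reached the running kernel. A minimal sketch, assuming the usual `/proc/cmdline` file; the helper name `has_kernel_param` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel parameter is present on the command line.
# The second argument lets the function read a different file than /proc/cmdline.
has_kernel_param() {
    param=$1
    cmdline_file=${2:-/proc/cmdline}
    # Split the command line into one parameter per line and match it exactly.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF "$param"
}

if has_kernel_param amdgpu.sg_display=0; then
    echo "amdgpu.sg_display=0 is active"
else
    echo "amdgpu.sg_display=0 is not on the running kernel command line"
fi
```

The match is exact, so `amdgpu.sg_display=1` or a differently spelled option would not be reported as active.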

Yes, it started while using Ms Teams, both Web version and desktop one.

I'll see if today it will appear again (while continuing to run both versions of Ms Teams and watching the VRAM usage using amdgpu_top).

IncrediblePony commented 5 months ago

> @dxps Is the flickering happening when you plug in a monitor or only with Teams? Could be a purely MS Teams related problem.

It is not. I have experienced it with Android Studio, Chrome, Firefox, Slack and many more.

I ran Ubuntu 23.10 for about 6 months with not too many issues.

Last week I installed a clean Debian 12 and went to town. Some issues still occur with flickering if I'm too "aggressive" when starting the PC. If the external (daisy chained) monitors are plugged on boot, right after login there is about a 30% chance I get the flickering issue out of the box. If I have the monitors disconnected and wait about 10 seconds after login, and then plug it in I have about 2% chance of experiencing the flickering.

As far as my travels have taken me across the internet the issue seems to stem from two factors:

  1. My Lenovo Legion Slim 5 16APH8 is a piece of sh*t and the hardware components have been picked by Lenovo the same way I make fast-food choices when hammered.
  2. There is a discrepancy between the AMD Ryzen™ 5 7640HS w/ Radeon™ 760M Graphics × 12 and the NVIDIA GeForce RTX™ 4060 Laptop GPU / AMD Radeon™ Graphics setup that neither AMD nor NVIDIA seems to want to do anything about.

I have tried giving my machine's BIOS options all the juice they can take, and I have tried reducing the juice by 90%. Neither option seems to mitigate the issue.

At this point it VERY much looks like a software/driver issue. I'm not even sure that my Debian machine is using the RTX 4060 GPU at all, even though I have installed every driver known to man and tried both forcing discrete graphics and just using dynamic graphics. I am at a loss for now.
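A first step towards answering the "is the RTX 4060 used at all" question is to list which kernel driver is bound to each GPU. The sketch below filters the output of `lspci -k`; the helper name `gpu_drivers` is made up, and it assumes the usual `lspci -k` layout where a "Kernel driver in use:" line follows each device line. It only shows driver binding, not which GPU renders a given application.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -k` output on stdin and print each GPU
# together with the kernel driver bound to it.
gpu_drivers() {
    awk '/VGA compatible controller|3D controller|Display controller/ { dev = $0; next }
         /^[0-9a-f]/ { dev = "" }    # any other device line ends the current GPU
         /Kernel driver in use:/ && dev != "" {
             sub(/.*Kernel driver in use: /, "")
             print dev " -> " $0
             dev = ""
         }'
}

# Only run against real hardware if lspci is installed.
if command -v lspci >/dev/null 2>&1; then
    lspci -k | gpu_drivers
fi
```

On a hybrid laptop like this one, the expectation would be one line ending in `amdgpu` and one ending in `nvidia` (or `nouveau`). A GPU with no line at all has no driver bound.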

Hopefully this issue stays open and enough people find it that someone somewhere can do something.

IncrediblePony commented 5 months ago

https://youtube.com/shorts/ceflbk80oPc?si=-TuIhVwFaqhvU5DN

This is the latest behaviour on Debian 12 for me

the-rich-piana commented 5 months ago

@dxps what the hell. I'm not sure we have the same issue to be honest. That looks like some serious GPU artifacting. Imo that could be hardware related. I never experienced that, for me it was just a white blank screen.

dxps commented 5 months ago

I solved the problem by disabling a Gnome extension that was quite VRAM hungry (details here). Also, maybe relevant, I'm on an older 6.5.6 kernel, instead of 6.8.0, due to another issue (details here).

IncrediblePony commented 5 months ago

> I solved the problem by disabling a Gnome extension that was quite VRAM hungry (details here). Also, maybe relevant, I'm on an older 6.5.6 kernel, instead of 6.8.0, due to another issue (details here).

I will attempt to monitor VRAM usage a bit more closely to perhaps pinpoint the issue.
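Besides amdgpu_top, VRAM usage can be sampled from the amdgpu sysfs attributes `mem_info_vram_used` and `mem_info_vram_total` (both in bytes). A rough sketch follows; the helper name `vram_usage` is hypothetical, and the default path assumes the AMD GPU is `card0`, which may not be true on a hybrid laptop.

```shell
#!/bin/sh
# Hypothetical helper: print used and total VRAM for one amdgpu card.
# dev_dir defaults to card0; pass another device directory if the AMD GPU is card1.
vram_usage() {
    dev_dir=${1:-/sys/class/drm/card0/device}
    [ -r "$dev_dir/mem_info_vram_used" ] || return 1
    used=$(cat "$dev_dir/mem_info_vram_used")
    total=$(cat "$dev_dir/mem_info_vram_total")
    [ "$total" -gt 0 ] || return 1
    echo "$((used / 1048576)) MiB of $((total / 1048576)) MiB ($((used * 100 / total))%)"
}
```

Calling it in a loop, for example `while vram_usage; do sleep 5; done`, gives a simple log that can be lined up with the moment the flickering starts.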