pop-os / pop

A project for managing all Pop!_OS sources
https://system76.com/pop

Dell Precision 7530 and Pop OS 20.04 - CPU stuck at 800MHz after latest update #1689

Open tuhlmann opened 3 years ago

tuhlmann commented 3 years ago

Distribution (run cat /etc/os-release):

VERSION="20.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
LOGO=distributor-logo-pop-os

Related Application and/or Package Version (run apt policy $PACKAGE NAME):

Issue/Bug Description: I have a Dell Precision 7530 with an i9-8950HK CPU. I'm using the NVIDIA version of Pop OS 20.04 but have the Intel GPU active. After installing the latest updates today, I noticed that the CPU frequency is stuck at 800MHz.

cpufreq-info tells me that the range is '800 MHz - 4.80 GHz' and that 800MHz is currently chosen. The driver it reports is 'intel_pstate'. Everything is dead slow right now.
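
For anyone comparing notes, the same values that cpufreq-info reports can be read straight from sysfs. This is a minimal sketch, assuming the standard cpufreq files are present on your kernel; the function name show_cpufreq and its optional root argument are made up for illustration:

```shell
#!/bin/sh
# Print driver, governor and frequencies (in MHz) for every CPU under a sysfs root.
# The root defaults to the real sysfs path; passing another directory is only
# meant for testing against a fake tree.
show_cpufreq() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        cpu=$(basename "$(dirname "$dir")")
        driver=$(cat "$dir/scaling_driver")
        governor=$(cat "$dir/scaling_governor")
        # sysfs reports frequencies in kHz, so convert to MHz
        cur=$(( $(cat "$dir/scaling_cur_freq") / 1000 ))
        min=$(( $(cat "$dir/cpuinfo_min_freq") / 1000 ))
        max=$(( $(cat "$dir/cpuinfo_max_freq") / 1000 ))
        echo "$cpu driver=$driver governor=$governor cur=${cur}MHz range=${min}-${max}MHz"
    done
}
```

Calling show_cpufreq with no arguments reads the live system. On an affected machine I would expect every line to show cur=800MHz.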

I had this same problem when I installed Pop OS 20.10 or Ubuntu 20.10 on this very machine about two months ago. I noticed that the problem was gone with the then-beta of Ubuntu 21.04, so I decided to stick with the LTS version. I documented the issue and received some answers in this thread: https://askubuntu.com/questions/1307773/in-ubuntu-20-10-cpu-clock-fixed-at-800mhz

I did add intel_pstate=active to the systemd kernel params, as one Ask Ubuntu answer suggests, then ran update-initramfs -u -k all and rebooted, but that didn't help.
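
To rule out that the parameter never reached the kernel, it can help to check the running command line. A minimal sketch follows; has_kernel_param is a hypothetical helper, and its optional second argument (a file to read instead of /proc/cmdline) exists only for testing:

```shell
#!/bin/sh
# Succeed if the given parameter appears as a whole word on the kernel command line.
# Usage: has_kernel_param PARAM [CMDLINE_FILE]
has_kernel_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Split the command line on whitespace and compare word by word
    for word in $(cat "$cmdline_file"); do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}
```

For example, has_kernel_param intel_pstate=active && echo "parameter is set". On kernels that expose it, cat /sys/devices/system/cpu/intel_pstate/status should additionally show whether the driver is running in active, passive or off mode.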

Steps to reproduce (if you know):

Expected behavior:

Other Notes:

testerfr810 commented 3 years ago

Do you have the right driver installed? Check with these commands:

lspci
lspci -v
lspci -nk
lshw
grep -E '^(cpu family|model|stepping|microcode)' /proc/cpuinfo | sort -u

For you, the normal result for an i9-8950HK is:

cpu family  : 6
microcode   : 0x9e
model       : 158
model name  : Intel(R) Core(TM) i9-8950HK CPU @ 4.80GHz
stepping    : 10

If yours doesn't match, you need to update.

see here too https://support.system76.com/articles/system-firmware/

tuhlmann commented 3 years ago

@testerfr810 Thanks for your response! None of the commands you posted give any errors. Do you want me to post the results here (I didn't want to spam the ticket)? The cpuinfo is nearly identical:

tuhlmann@cassandra:~$ grep -E '^(cpu family|model|stepping|microcode)' /proc/cpuinfo | sort -u
cpu family  : 6
microcode   : 0xde
model       : 158
model name  : Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
stepping    : 10

The Firmware tab in Gnome settings does not give me an update, which is expected since it's not a System76 machine. But there is a new BIOS update from Dell, which I'll install now and then report back.

Update: Updating the BIOS did not change anything.

Please note that the system has been running just fine for the past few months. It was only these recent updates that broke it.

testerfr810 commented 3 years ago

Try:

sudo apt --reinstall install intel-microcode

sudo apt --reinstall install intel-media-va-driver

and see whether anything changes:

grep -E '^(cpu family|model|stepping|microcode)' /proc/cpuinfo | sort -u

Or use this to check just the microcode (normally it is 0x9e for your CPU):

journalctl -b -k | grep "microcode updated early to"

This is what I do:

https://github.com/testerfr810/CPU_mod/blob/main/CPU%20manage

testerfr810 commented 3 years ago

Or try downgrading and waiting for another release:

https://support.system76.com/articles/upgrade-pop/

tuhlmann commented 3 years ago

None of the above commands changed anything:

After reinstalling microcode and the driver, /proc/cpuinfo reported the same for my cpu. journalctl -b -k | grep "microcode updated early to" reports nothing.

Changing the governor or setting a fixed frequency per CPU as per your script doesn't work either. The governor is always "powersave" (but I saw this behavior even when it was working), and the frequency always sticks to 800MHz.

Downgrading isn't currently an option for me as I didn't upgrade. I was only installing updates within Pop OS 20.04.

In the askubuntu post above (https://askubuntu.com/questions/1307773/in-ubuntu-20-10-cpu-clock-fixed-at-800mhz) I outline that I saw this problem with every Ubuntu based 20.10 installation I tried and I saw it gone with Ubuntu 21.04.

If I don't find a working fix for my current installation, I'll have to either go to Ubuntu 20.04 (which, unlike Pop OS, doesn't get the newer kernels) or go to Ubuntu 21.04, if it is still true that this version works (I only tested the beta a few months ago).

fkap69 commented 3 years ago

I have the exact same problem on the exact same machine with the same i9-8950HK. Today I did my usual Pop!_OS update from Pop!_Shop on my 20.04 and rebooted. The CPU is stuck at 0.8GHz, as indicated by the "CPU Power Manager" Gnome extension and a general machine slowdown. I also installed an "Intel-microcode processor update" that was offered after rebooting. Nothing changed. Everything @tuhlmann described applies to me as well.

This is my work machine, so this is a huge problem for me. This is a very serious regression. What is the point of an LTS release if it can suddenly break like that after an update? Please advise whether I can fix it without installing a different distribution from scratch.

testerfr810 commented 3 years ago

There is an intel-microcode upgrade today. Try:

sudo apt upgrade intel-microcode

fkap69 commented 3 years ago

Already did that, so it says: intel-microcode is already the newest version (3.20210216.0ubuntu0.20.04.1).

fkap69 commented 3 years ago

I also tried the previous 5.8 kernel, but it didn't make any difference.

fkap69 commented 3 years ago

Another attempt: I passed intel_pstate=active as a boot option, but that did nothing either.

testerfr810 commented 3 years ago

sudo apt-get install cpufrequtils

sudo nano /etc/default/cpufrequtils

and set GOVERNOR="performance" in that file.

sudo systemctl restart cpufrequtils

Run "cpufreq-info" to check the governor, then watch the frequencies:

sudo watch -n 1 "cat /proc/cpuinfo | grep MHz | sort -n | uniq -cw 13"
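
The governor can also be switched without cpufrequtils by writing to sysfs directly. This is a minimal sketch, assuming the standard scaling_governor files; set_governor and its optional root argument are illustrative names, and on a real system it has to run as root. Note that with intel_pstate in active mode only the performance and powersave governors are offered:

```shell
#!/bin/sh
# Write the requested governor to every CPU's scaling_governor file.
# Usage: set_governor GOVERNOR [SYSFS_ROOT]
set_governor() {
    governor="$1"
    root="${2:-/sys/devices/system/cpu}"
    for file in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip CPUs whose governor file is missing or not writable
        [ -w "$file" ] || continue
        echo "$governor" > "$file"
    done
}
```

For example, sudo sh -c '. ./governor.sh; set_governor performance', where governor.sh is a hypothetical file holding the function. Unlike the cpufrequtils default file, this change does not survive a reboot.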

tuhlmann commented 3 years ago

@testerfr810 I tried these changes.

cpufreq-info reports governor "performance" with a frequency of 800MHz for all threads.

sudo watch -n 1 "cat /proc/cpuinfo | grep MHz | sort -n | uniq -cw 13" reports:

Every 1.0s: cat /proc/cpuinfo | grep MHz | sort -n | uniq -cw 13  cassandra: Tue May 25 09:05:51 2021

     10 cpu MHz         : 2900.000
      2 cpu MHz         : 800.032

This confuses me, because the performance I see is not that of 2.9GHz. Also, I tried to compile my project, and in that case instead of 10 threads showing 2.9GHz, this goes down to 5 or so, always varying, the other threads showing 800MHz. It seems that the machine uses 800MHz whenever there's load on the thread, where in the past it instead would go into a boost state.
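
The watch pipeline above can be wrapped in a small function so the counts are easier to log over time, for example during a compile. This is a sketch that reads /proc/cpuinfo-style text on stdin; summarize_mhz is a hypothetical name, and it truncates each reading to whole MHz before counting:

```shell
#!/bin/sh
# Count how many logical CPUs report each frequency (truncated to whole MHz).
# Reads /proc/cpuinfo-style text from stdin and prints "MHz count" lines,
# sorted by frequency.
summarize_mhz() {
    awk -F': *' '/^cpu MHz/ { count[int($2)]++ }
        END { for (f in count) printf "%d %d\n", f, count[f] }' | sort -n
}
```

Run it as summarize_mhz < /proc/cpuinfo. If the throttling theory holds, the count next to 800 should grow whenever load is applied.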

I also installed all pending updates as of today and rebooted, but no changes.

tuhlmann commented 3 years ago

@fkap69 @testerfr810 I received a SO mail regarding this question I posted earlier on Ubuntu 20.10: https://askubuntu.com/a/1340838

It mentions a Reddit thread that discusses why thermald was moved out of the repos into the AUR. Some folks there also mention that they had these CPU problems because thermald would start to throttle at 50°, and many noted that it is a relic from the past and that thermal management no longer needs to be handled by a userspace service.

So I disabled it and, what can I say, my machine works again!

The fans kick in as programmed by the BIOS, which seems to work just as before. I have no long running experience, so I can't speak to that, and I won't take responsibility if your CPU gets fried :)

That said, the temperatures I see here on my Dell Precision 7530 and its i9 CPU are just as they were before.
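
To compare temperatures before and after disabling thermald, the kernel's thermal zones can be read directly. This is a minimal sketch, assuming the usual /sys/class/thermal layout with temperatures in millidegrees Celsius; show_temps and its optional root argument are illustrative names:

```shell
#!/bin/sh
# Print the type and temperature (whole degrees Celsius) of every thermal zone.
# Usage: show_temps [THERMAL_ROOT]
show_temps() {
    root="${1:-/sys/class/thermal}"
    for zone in "$root"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        zone_type=$(cat "$zone/type")
        # The kernel reports millidegrees Celsius
        milli=$(cat "$zone/temp")
        echo "$zone_type $(( milli / 1000 ))C"
    done
}
```

Which zone types show up depends on the machine, so treat the zone names as something to check on your own hardware.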

I would really like someone with deeper system knowledge than mine to chime in and explain why thermald is still used and what the consequences of not using it are.

To disable thermald, do:

sudo systemctl stop thermald
sudo systemctl disable thermald
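
To confirm the change took effect, or to check the state on another machine, something like the following may help. It is only a sketch: thermald_state is a hypothetical helper, and the SYSTEMCTL variable is an assumption added so the function can be pointed at a stand-in command for testing.

```shell
#!/bin/sh
# Report whether thermald is running and whether it is enabled at boot.
# SYSTEMCTL may name a replacement command (testing only); default is systemctl.
thermald_state() {
    ctl="${SYSTEMCTL:-systemctl}"
    # is-active and is-enabled exit non-zero for inactive/disabled units,
    # so ignore the exit status and keep only the printed state
    active=$("$ctl" is-active thermald 2>/dev/null) || true
    enabled=$("$ctl" is-enabled thermald 2>/dev/null) || true
    echo "active=${active:-unknown} enabled=${enabled:-unknown}"
}

# To undo the change later:
#   sudo systemctl enable --now thermald
```

After the two commands above, thermald_state should report the service as inactive and disabled.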

fkap69 commented 3 years ago

@testerfr810 thanks for your continued efforts to support us. @tuhlmann thanks for sharing, and glad you were able to resolve it! I wish I had known this earlier :( (see below).

Sharing my own adventure from the last week. I couldn't work with the existing situation, so I scheduled an OS reinstallation after taking a backup etc. That was not straightforward. I tested both Ubuntu 20.04 and 21.04 in full installations and the problem was not resolved. The difficult part was that it worked initially and then the problem appeared again after a reboot. I had already checked the behavior from a live USB before installing, but that was not enough. For 21.04, after installation, I installed all available updates and it was still working OK. On the next reboot, the same issue.

What finally worked, and what I have been using for some days now without any issue, is Debian testing.

tuhlmann commented 3 years ago

Sorry for the struggles you went through! It wasn't as urgent for me, as I now mainly work on a desktop machine and I also have a dual boot installation with Windows 10 and WSL2 that works fine.

I assume Debian works because they are more conservative with any changes they apply to their distribution, though I could be wrong, my Debian experiences are from ages past.

Anyhow, should you decide to go back to Ubuntu / Pop OS, then this is the path that's working for me.

fkap69 commented 3 years ago

That was my work computer so I needed to fix it. In the end though I am very happy so far with Debian. It flies and I am so used to apt anyway, so no real problem.

Also, Debian testing, which I put on the machine, is neither Debian stable (too conservative and old) nor Debian unstable (bleeding edge). It seems to be a good middle ground, as our IT person also suggested. Imagine that I currently have kernel 5.10 and Gnome 3.38. And before I installed the nvidia driver, it was using Wayland automatically! Really happy with the transition so far, it just wasn't a planned one...

testerfr810 commented 3 years ago

You can try downloading the latest kernel from kernel.org and customizing it. I think a module is missing.