intel / thermal_daemon

Thermal daemon for IA
GNU General Public License v2.0

Lenovo Legion / Ubuntu 22.04 system wide stuttering / throttling when under load #410

Closed stubkan closed 1 year ago

stubkan commented 1 year ago

I believe there may be a bug in Ubuntu 22.04's version of thermald (2.4.9) that overzealously forces the CPU to its minimum frequency. I'd like a way to work around or fix that without reinstalling and setting up another distro.

Ever since I upgraded from Ubuntu 20 to 22.04, my system, which used to run everything smoothly, has had stuttering and lag in all games and applications, even on very low settings.

Here is my old post attempting to figure out the cause:

https://github.com/ValveSoftware/steam-for-linux/issues/9717

Here is the post that caught the issue. It describes multiple people with systems similar to mine saying that disabling thermald entirely fixed their issue:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1973434

I decided to test that theory: I disabled thermald and started Space Engineers, which had been stuttering regularly every few minutes, and at last drove around with no slowdown. I've attached the logs for this run. They show the CPU going up to almost 100 degrees, so I think I do need thermald working to prevent a meltdown.

log from s-tui; running with thermald OFF ; s-tui_log_2023-07-29_16_17_21.csv

summary ;

I re-enabled thermald and ran Space Engineers with the CPU monitor s-tui recording, to capture when the stuttering kicks in. s-tui_log_2023-07-29_16_48_51.csv

summary ;

So it appears an overzealous thermald is killing game and application performance, forcing CPU temps down to 55 degrees and trying to keep them there.

Is there a workaround? I don't think it's safe for me to disable this service, but I can't have it do this. I can't find much info on how to fine-tune thermald or which parameters are safe. I tried using cpupower-gui to set the CPU governors to high performance, but thermald appears to override that.


Interestingly, trying to get thermald debug logs with 'sudo thermald --no-daemon --loglevel=debug' appears to bypass the problem, because it does not cause any stuttering even under high load.

thermald logs: drove around in Space Engineers for a while with no stuttering. thermal_log.txt

Starting up No Man's Sky, it decided to process Vulkan shaders. That drove the CPU to max usage, which the s-tui log shows here: s_tui_nms_log.csv

thermald running in debug mode doesn't think that's worth throttling: thermal_log_nms.txt

Average temp of 95 degrees, frequency of 4100-4200 MHz, and it doesn't throttle when running in debug mode. That kind of defeats the purpose of troubleshooting, since it behaves completely differently.
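For anyone comparing s-tui CSV logs like the ones attached, here is a minimal sketch that reduces a log to min/mean/max for temperature and frequency. The column names `Temp` and `Freq` and the sample rows are made up for illustration; they are not taken from the attached files, so check the header row of your own log and pass the real column names.

```python
import csv
import io
from statistics import mean


def summarize(csv_text, temp_col, freq_col):
    """Return {column: (min, mean, max)} for the two named columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    out = {}
    for name in (temp_col, freq_col):
        # Skip rows where the column is missing or empty.
        values = [float(r[name]) for r in rows if r.get(name) not in (None, "")]
        out[name] = (min(values), mean(values), max(values))
    return out


# Hypothetical sample data, NOT copied from the logs in this issue.
sample = """Time,Temp,Freq
16:48:51,95.0,4200
16:48:53,72.0,2000
16:48:55,55.0,800
"""

stats = summarize(sample, "Temp", "Freq")
for column, (lo, avg, hi) in stats.items():
    print(f"{column}: min={lo} mean={avg:.1f} max={hi}")
```

Running the same summary over a log taken with thermald on and one taken with it off makes the difference in sustained frequency easy to compare side by side.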


My system

neofetch
```
stubkan@Legion
OS: Linux Mint 21 x86_64
Host: 81Y6 Lenovo Legion 5 15IMH05H
Kernel: 5.17.0-1035-oem
Shell: bash 5.1.16
DE: Cinnamon
WM: Mutter
WM Theme: Mint-Y
CPU: Intel i5-10300H (8) @ 4.500GHz
GPU: Intel CometLake-H GT2 [UHD Graphics]
GPU: NVIDIA GeForce RTX 2060 Mobile
Memory: 4074MiB / 64181MiB
```
systemctl status thermald.service
```
stubkan@Legion:~$ sudo systemctl status thermald.service
● thermald.service - Thermal Daemon Service
     Loaded: loaded (/lib/systemd/system/thermald.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2023-07-29 19:04:23 BST; 2min 3s ago
   Main PID: 22924 (thermald)
      Tasks: 4 (limit: 76769)
     Memory: 1.3M
        CPU: 28ms
     CGroup: /system.slice/thermald.service
             └─22924 /usr/sbin/thermald --systemd --dbus-enable --adaptive

Jul 29 19:04:23 Legion systemd[1]: Started Thermal Daemon Service.
Jul 29 19:04:23 Legion thermald[22924]: 22 CPUID levels; family:model:stepping 0x6:a5:2 (6:165:2)
Jul 29 19:04:23 Legion thermald[22924]: 22 CPUID levels; family:model:stepping 0x6:a5:2 (6:165:2)
Jul 29 19:04:23 Legion thermald[22924]: sensor id 15 : No temp sysfs for reading raw temp
Jul 29 19:04:23 Legion thermald[22924]: sensor id 15 : No temp sysfs for reading raw temp
Jul 29 19:04:23 Legion thermald[22924]: sensor id 15 : No temp sysfs for reading raw temp
Jul 29 19:04:24 Legion thermald[22924]: Polling mode is enabled: 4
Jul 29 19:04:28 Legion thermald[22924]: Manufacturer didn't provide adequate support to run in
Jul 29 19:04:28 Legion thermald[22924]: optimized configuration on Linux with open source
Jul 29 19:04:28 Legion thermald[22924]: You may want to disable thermald on this system if you see issue
```

Didn't provide support? I may want to disable this? Everywhere I google tells me not to disable it?

Using the linked dptfxtract tool generates a config file that looks wrong to me:

thermal-conf.xml.auto
```
auto_zone_0 B0D4 0 Passive B0D4 1 2147483647 B0D4 1000 Passive B0D4 1
```

Temperatures of 0 and 1000? They should be more like 55000 to 95000.
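To spell out that reasoning: thermald's XML configuration gives temperatures in millidegrees Celsius, so 1000 means 1 °C and 95000 means 95 °C. Below is a rough sketch of a sanity check over the values from the generated file. The 40 to 110 °C "plausible" window is my own assumption for a laptop CPU, not something thermald defines, and `check_trip` is a hypothetical helper name.

```python
MILLIDEGREES_PER_DEGREE = 1000


def check_trip(value_millideg, low_c=40.0, high_c=110.0):
    """Convert a millidegree value to Celsius and flag implausible trip points.

    low_c and high_c are assumed bounds for illustration only.
    """
    celsius = value_millideg / MILLIDEGREES_PER_DEGREE
    return celsius, low_c <= celsius <= high_c


# 0, 2147483647 and 1000 appear in the generated file;
# 55000 and 95000 are the values I would have expected.
for raw in (0, 1000, 2147483647, 55000, 95000):
    celsius, plausible = check_trip(raw)
    label = "plausible" if plausible else "suspicious"
    print(f"{raw:>10} -> {celsius:.1f} C ({label})")
```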

stubkan commented 1 year ago

I saw https://github.com/intel/thermal_daemon/issues/364 and noticed my version was 2.4.9.

I cloned this repo, then compiled and installed thermald to get a version past 2.5.0. Now my system is so much faster; nothing slows down or gets throttled anymore.

The Ubuntu distros need to update their thermald versions for sure.
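A small sketch for checking whether a given thermald version string is older than that threshold. The 2.5.0 cutoff is only what I observed above, not a confirmed fix version, and `parse_version` / `is_older` are hypothetical helper names. It assumes plain dotted numeric versions such as "2.4.9".

```python
def parse_version(text, width=3):
    """Turn '2.4.9' into (2, 4, 9), padding short versions like '2.5' with zeros."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def is_older(installed, minimum="2.5.0"):
    """True if the installed version sorts before the assumed minimum."""
    return parse_version(installed) < parse_version(minimum)


for version in ("2.4.9", "2.5", "2.5.4"):
    status = "older than 2.5.0" if is_older(version) else "2.5.0 or newer"
    print(f"thermald {version}: {status}")
```

Comparing tuples of integers avoids the trap of comparing version strings directly, where "2.10.0" would sort before "2.9.0".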

spandruvada commented 1 year ago

Please attach logs with the latest thermald so I can see why the system is getting throttled.

stubkan commented 1 year ago

I have manually compiled and installed the latest thermald, and that made the problem go away. I've also marked the package as held so apt-get does not overwrite it with 2.4.9 anymore.

Ubuntu needs to move its distro version of thermald away from 2.4.9 for sure.

spandruvada commented 1 year ago

Since this is not an issue with the GitHub version, please close this if there is nothing more to be done here.