VitaliiSerdiuk closed this issue 2 years ago.
What version of thermald are you using? (`thermald --version`)
Have you compiled and installed version 2.4.8?
I have a Latitude 7490; since the last release, I'm no longer stuck at 400 MHz.
It only affects the 7x20 series. Also, "stuck at 400 MHz" is not accurate: with thermald it gets stuck at 1800 MHz after a short while. The issue is misleading; you have to dig through #293 for all the information. The original issue was closed without explanation, so as I understand it, this issue is only here as a reminder that the problem has not been solved.
@binboum yes, I compiled version 2.4.8, and my 5420 is still stuck at 1500 MHz under load.
One odd thing, not sure whether it means anything: I tried reloading these kernel modules, as I saw suggested in a throttled ticket:
```shell
rmmod intel_rapl_msr
rmmod processor_thermal_device_pci_legacy
rmmod processor_thermal_device
rmmod processor_thermal_rapl
rmmod intel_rapl_common
rmmod intel_powerclamp

modprobe intel_powerclamp
modprobe intel_rapl_common
modprobe processor_thermal_rapl
modprobe processor_thermal_device
modprobe processor_thermal_device_pci_legacy
modprobe intel_rapl_msr
```
Behaviour after that is almost the same (still fixed at 1800 MHz under heavy load, with temperatures around 50°C), but two things changed. First, the start of the load is better: for 1-2 seconds I can see frequencies above 4 GHz and a CPU temperature around 90°C (before that I had never seen more than 60°C). Second, and stranger, I can hear a clicking noise coming from the laptop :-) It happens occasionally: a few clicks in a row, then several tens of seconds or minutes of silence...
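The reload sequence above unloads the modules in one order and loads them back in exactly the reverse order. A minimal sketch that derives the load order from the unload list, so the two lists cannot drift apart. `reload_modules`, `RAPL_MODULES` and the `RUN` dry-run hook are hypothetical names, not part of thermald or throttled; real runs need root.

```shell
# Modules from the reload sequence above, in unload order.
RAPL_MODULES="intel_rapl_msr processor_thermal_device_pci_legacy processor_thermal_device processor_thermal_rapl intel_rapl_common intel_powerclamp"

# Hypothetical helper: reload_modules "MOD1 MOD2 ..."
# Unloads each module in the given order, then loads them back in reverse.
# RUN is a hypothetical dry-run hook: with RUN=echo the commands are only
# printed; when RUN is unset they are executed.
reload_modules() {
    local mods="$1" reversed="" m
    for m in $mods; do
        ${RUN:-} rmmod "$m"
        reversed="$m $reversed"   # prepend, so the list ends up reversed
    done
    for m in $reversed; do
        ${RUN:-} modprobe "$m"
    done
}
```

`RUN=echo reload_modules "$RAPL_MODULES"` prints the twelve commands for review; running `reload_modules "$RAPL_MODULES"` as root executes them.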
Dell is collecting information: https://www.dell.com/community/Latitude/Latitude-5420-7420-7520-CPU-Throttling-Issue-on-Linux/m-p/8129749/highlight/true#M39458
It also seems to affect the Latitude 5421 with the i5-11500H processor. Under full load it dips to 800 MHz for a while, then bounces back to the expected 2900 MHz, in a continuous cycle.
On my system, a workaround is to change the power profile in the BIOS from "Optimized" to "Ultra performance" (the fans get louder, but you get the full processor speed). It usually also works well with the "Cool" and "Quiet" profiles. The main problem seems to be the default "Optimized" profile.
Note: I am using Pop-OS 21.10, kernel 5.15.15, and a self-compiled thermald 2.4.8.
I think I found a workaround: when I charge via USB PD, I don't have the problem.
With normal charging, I can confirm the problems mentioned.
What exactly do you mean by charging via USB versus "normal" charging? My 7320 can only charge over USB-C.
Which model do you have? My 7320 has no "normal" power plug.
I suspect he means charging from the USB-C wall charger versus power delivery from something like a dock.
This is something I have already checked but did not solve the problem for me.
Dell dock or other brand?
I haven't tested this myself yet, but I have a Dell TB dock arriving soon, so I'll report back.
Both with the original AC adapter and with my 90 W Dell monitor connected. I also checked the BIOS; both are recognized correctly.
It looks like Dell doesn't care about Linux/open source at all...
With kernel 5.16.10-1.el8.elrepo.x86_64 it still happens here as well. One other thing just came to mind: the fan sensors don't work. On all my previous Dell laptops, lm_sensors could read the fan speed; on this one, the fan speed is not available. I wonder whether the missing fan speed data is causing thermald to make incorrect assumptions.
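lm_sensors takes fan readings from hwmon drivers, which expose them as `fanN_input` attributes (in RPM) under `/sys/class/hwmon/hwmon*/`. A minimal sketch for checking whether the kernel exposes any fan tachometer at all, independently of lm_sensors. `list_fans` is a hypothetical name, and the optional argument only exists so it can be pointed at a test directory instead of the live sysfs.

```shell
# Hypothetical helper: list_fans [HWMON_CLASS_DIR]
# Prints "<hwmon name> <attribute> <rpm>" for every fan*_input found and
# returns 1 if no fan tachometer is exposed at all.
list_fans() {
    local root="${1:-/sys/class/hwmon}" f dir found=1
    for f in "$root"/hwmon*/fan*_input; do
        [ -e "$f" ] || continue        # glob matched nothing
        dir=$(dirname "$f")
        printf '%s %s %s\n' "$(cat "$dir/name" 2>/dev/null)" "$(basename "$f")" "$(cat "$f")"
        found=0
    done
    return "$found"
}
```

An empty result with exit status 1 means no hwmon driver on the system is reporting a fan speed.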
Please run https://github.com/intel/thermal_daemon/blob/master/test/thermal-debug-dump-fedora.sh or https://github.com/intel/thermal_daemon/blob/master/test/thermal-debug-dump-ubuntu.sh. I can check the thermal tables first.
Here are the requested dumps. [23111824.tar.gz](https://github.com/intel/thermal_daemon/files/8126088/23111824.tar.gz)
@JoshuaPK which model are these for? I could provide dumps from a Latitude 7320 running openSuse if needed.
Here is the file for a 5421 with an i5-11500H: 23173225.tar.gz. The clock drops to 800 MHz even with the laptop sitting on a Zalman laptop cooling base with a fan.
@sebastianha my apologies. I have a Latitude 5420 with an i5-1145G7. I have seen a number of scenarios, but the most frequent are throttling down to 1.5GHz and throttling down to 400MHz. In my case the throttling goes away when the load decreases. So, for example, if I try to compress an ISO file with 7za it will throttle, then if I stop 7za it will jump back up to 4GHz. This is thermald 2.4.8 that I compiled from source, running on Rocky Linux 8.5.
The guaranteed frequency is 1500 MHz on this system. With 100% load, the system was able to run about 1000 MHz above the guaranteed frequency 80% of the time. What is the expectation? The system can't sustain turbo forever.
As a test, you can manually adjust the power limits and see whether that prevents the system from reaching peak turbo while keeping it above 1500 MHz without thermal throttling. Try:
Reboot, then:

```shell
echo 28000000 > /sys/devices/virtual/powercap/intel-rapl-mmio/intel-rapl-mmio:0/constraint_0_power_limit_uw
echo 28000000 > /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw
```

You can run turbostat and check the frequencies:

```shell
turbostat --show Core,CPU,Busy%,Bzy_MHz,TSC_MHz -o turbostat.out
```
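The powercap values are in microwatts, so `28000000` is a 28 W limit. On the intel-rapl package zone, `constraint_0` is normally the long-term (PL1) limit and `constraint_1` the short-term (PL2) one; `constraint_N_name` reports which is which. A minimal sketch that does the watts-to-microwatts conversion and writes the same limit to both the MSR (`intel-rapl:0`) and MMIO (`intel-rapl-mmio:0`) package zones, as the two commands above do. `set_pkg_power_limit` is a hypothetical name; the optional third argument only exists for testing against a copy of the directory tree. It needs root on a real system, and the value does not survive a reboot.

```shell
# Hypothetical helper: set_pkg_power_limit CONSTRAINT WATTS [POWERCAP_ROOT]
# Converts whole WATTS to microwatts and writes the value to
# constraint_<CONSTRAINT>_power_limit_uw of both package RAPL zones,
# skipping any interface that is not present.
set_pkg_power_limit() {
    local idx="$1" watts="$2" root="${3:-/sys/devices/virtual/powercap}"
    local uw=$((watts * 1000000)) zone f name
    for zone in intel-rapl/intel-rapl:0 intel-rapl-mmio/intel-rapl-mmio:0; do
        f="$root/$zone/constraint_${idx}_power_limit_uw"
        [ -e "$f" ] || continue
        # constraint_N_name says which limit this is (e.g. long_term).
        name=$(cat "$root/$zone/constraint_${idx}_name" 2>/dev/null)
        echo "$uw" > "$f" && printf '%s (%s) = %s uW\n' "$zone" "$name" "$uw"
    done
}
```

`set_pkg_power_limit 0 28` reproduces the two writes above.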
@pjssilva Somehow there is no turbostat output in the tar file. Can you check whether turbostat is installed on your system?
@spandruvada Could you explain 'The guaranteed frequency is 1500 MHz on this system' in more detail? Is it 1500 MHz on Linux and 2600 MHz on Windows? The base frequency of the Intel Core i5-1145G7 is 2600 MHz. I don't understand how it works; that is a big difference between operating systems.
@pjssilva Can you build thermald yourself with the attached patch? 0001-Test-patch-to-fail-to-run-adaptive.zip
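Unzip it and apply the patch with `git am`, then build using make (the procedure is in README.txt):

```shell
unzip 0001-Test-patch-to-fail-to-run-adaptive.zip
git am 0001-Test-patch-to-fail-to-run-adaptive.patch
```

Then stop the service and run thermald in the foreground with debug logging:

```shell
systemctl stop thermald
thermald --loglevel=debug --no-daemon --adaptive
```

Run some workload at the same time, and attach the log.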
@VitaliiSerdiuk The guaranteed frequency doesn't change between Linux and Windows. You have a power budget: either you spend more of it initially and get more short-term performance, or you use it moderately so you don't drop to 1500 MHz. Linux can't match every condition in the thermal table the way Windows does, so Windows may be doing better, but I can't say without actually running a similar workload. You are running a web/video workload, which Windows can manage better thanks to its use of hardware acceleration. Its power usage profile will therefore be different, so there you may not be dropping to 1500 MHz.
@spandruvada I have installed turbostat and rerun the test with thermal-debug-dump-ubuntu.sh (I run Pop-OS; Ubuntu is the closest): 23223022.tar.gz. I will now try to apply the patch you posted in the other message above and report the results.
@spandruvada Here is the thermald log after applying the patch from 0001-Test-patch-to-fail-to-run-adaptive.zip above. Before collecting the log, I left the system running stress-ng for about 10 minutes. During the test, the system ran at 3600-3700 MHz for a while, then dropped quickly to 800 MHz for some time, and the cycle repeated. The expected sustained clock for this processor is at least 2400 MHz, so 800 MHz is clearly a problem.
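One way to quantify a cycle like this, or a figure such as "about 1000 MHz above guaranteed for 80% of the time", is to count how many turbostat samples stay at or above a target frequency. A minimal sketch, assuming a log written by `turbostat --show Core,CPU,Busy%,Bzy_MHz,TSC_MHz -o turbostat.out`, where each interval starts with a header row beginning with `Core` and a system-wide summary row whose Core and CPU columns are `-`. The function name `busy_mhz_share` is hypothetical, and the Bzy_MHz column is located by header name rather than position.

```shell
# Hypothetical helper: busy_mhz_share TURBOSTAT_FILE THRESHOLD_MHZ
# Prints "<summary samples at or above threshold> <total summary samples> <percent>".
busy_mhz_share() {
    awk -v thr="$2" '
        # Header rows repeat every interval; remember where Bzy_MHz is.
        $1 == "Core" { for (i = 1; i <= NF; i++) if ($i == "Bzy_MHz") col = i; next }
        # System-wide summary row: topology columns are shown as "-".
        col && $1 == "-" { total++; if ($col + 0 >= thr + 0) above++ }
        END {
            if (total == 0) { print "0 0 0.0"; exit }
            printf "%d %d %.1f\n", above, total, 100 * above / total
        }' "$1"
}
```

For example, `busy_mhz_share turbostat.out 2400` reports how much of the run stayed at or above the 2400 MHz expected here; per-CPU rows are ignored.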
This may seem like a silly question, but are some platforms safe to run without thermald? I'm using 7za to create an archive of a ~60 GB directory. With thermald running, the system is throttled down to 1.5 GHz and stays there, with the core temperature steady at around 135°F. If I kill thermald, the system fluctuates between 2.4 and 3.1 GHz (and does not stay below 2.4 GHz for any length of time), and the core temperature fluctuates around 140-150°F. I also threw in 8 threads of stress-ng and a YouTube video, and the processor still fluctuated around 2.4 GHz but never dipped below 2 GHz for any length of time (temperatures stayed around 140°F). According to the spec, the maximum core temperature is 212°F. It appears that in this case the processor is doing a good job of regulating itself. What am I missing?
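For comparison with the Celsius readings elsewhere in this thread, °C = (°F − 32) × 5/9, so 135°F is about 57°C, 140-150°F is about 60-66°C, and 212°F is exactly 100°C. A small sketch of that conversion; `f_to_c` is a hypothetical helper name.

```shell
# Hypothetical helper: f_to_c DEGREES_F
# Prints the Celsius equivalent rounded to one decimal place.
f_to_c() {
    awk -v f="$1" 'BEGIN { printf "%.1f\n", (f - 32) * 5 / 9 }'
}
```

`for t in 135 140 150 212; do f_to_c "$t"; done` prints 57.2, 60.0, 65.6 and 100.0.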
@spandruvada I understand the point about the guaranteed frequency, but the problem is that on my system the fan is not spinning at 100%, yet the system is throttled.
As I understand it, throttling should only happen when all thermal controls are maxed out, i.e. fan at 100% and CPU temperature at 95°C.
Under full load my system runs the fan at ~25-50% and the temperature is at 50°C. There is definitely room for more power.
I will post the outputs of the scripts as soon as I have some spare time.
@pjssilva Still the same problem. I need to recheck what in the table is preventing this from loading. I may generate another patch.
@JoshuaPK The processor has built-in control. But thermald is about the other parts of the system and keeping skin temperature within spec. The manufacturer may already have ensured that out of the box, and the system may already have all the power tables configured correctly without thermald. So disable thermald, do a cold reboot, check whether you still get good performance, and decide.
@sebastianha I am not sure whether any fan control is available. Run `cat /sys/class/thermal/cooling_device*/type`. Do you see any names other than Processor, LCD, and intel_powerclamp, something like "Fan" or "TFN"?
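The check asked for above can be scripted: walk the thermal cooling devices and flag any whose type looks like a fan, using the two names mentioned ("Fan" or "TFN"). A minimal sketch; `find_fan_cooling_devices` is a hypothetical name, and the optional argument only exists so it can be run against a test directory.

```shell
# Hypothetical helper: find_fan_cooling_devices [THERMAL_CLASS_DIR]
# Prints "<cooling_deviceN> <type>" for each fan-like cooling device and
# returns 1 if there is none.
find_fan_cooling_devices() {
    local root="${1:-/sys/class/thermal}" d type found=1
    for d in "$root"/cooling_device*; do
        [ -r "$d/type" ] || continue
        type=$(cat "$d/type")
        case "$type" in
            *[Ff]an*|TFN*) printf '%s %s\n' "$(basename "$d")" "$type"; found=0 ;;
        esac
    done
    return "$found"
}
```

A non-zero exit status with no output corresponds to the "No" answer: only Processor, intel_powerclamp and similar devices are registered.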
No:
```
~> cat /sys/class/thermal/cooling_device*/type
Processor
Processor
Processor
Processor
Processor
Processor
Processor
Processor
intel_powerclamp
TCC Offset
```
So unfortunately we can't control the fans. Do you see the same behavior as in the plot I attached above?
No, I see something like this (manually plotted data):
For a second I get full speed and a high temperature, then it instantly drops to ~2GHz and settles down to 1800MHz after some time. Temperature is always ~55°C and the fan did not kick in at all.
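Instead of plotting by hand, the frequency/temperature trace can be logged periodically. A minimal sketch that takes one sample: the highest `scaling_cur_freq` across CPUs (reported in kHz, printed in MHz) and the temperature of the thermal zone of type `x86_pkg_temp` (reported in millidegrees Celsius). `sample_freq_temp` is a hypothetical name, and the optional argument only exists so it can be pointed at a test tree instead of `/sys`.

```shell
# Hypothetical helper: sample_freq_temp [SYSFS_ROOT]
# Prints "<highest scaling_cur_freq in MHz> <package temperature in degC>".
sample_freq_temp() {
    local root="${1:-/sys}" max=0 khz f z temp="NA"
    for f in "$root"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq; do
        [ -r "$f" ] || continue
        khz=$(cat "$f")                            # value is in kHz
        if [ "$khz" -gt "$max" ]; then max=$khz; fi
    done
    for z in "$root"/class/thermal/thermal_zone*; do
        if [ "$(cat "$z/type" 2>/dev/null)" = "x86_pkg_temp" ]; then
            temp=$(( $(cat "$z/temp") / 1000 ))    # millidegrees -> degC
            break
        fi
    done
    echo "$((max / 1000)) $temp"
}
```

A loop such as `while sleep 1; do echo "$(date +%s) $(sample_freq_temp)"; done >> trace.txt` produces a timestamped trace that any plotting tool can read.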
Update: I noticed that the fan kicks in immediately when the GPU or SSD is under load.
I also tested with the 0001-Patch: thermald-patch0001.txt
@pjssilva I made a silly mistake. Can you retry with this patch? 0001-Test-patch-to-fail-to-run-adaptive-ver-2.zip
0001-Test-patch-to-fail-to-run-adaptive-ver-2.log
100% load, 50-55°C, no fan, fixed at 1800 MHz the whole time.
@spandruvada It looks like you have something there! I ran the stress test for 15 minutes and the processor never dipped below 2.9 GHz (which is the base clock of the Core i5-11500H configured with the high TDP). Take a look at the log below. I will try some other tests, but it looks promising! thermald.log
Note: The dip at the end of the test is because I stopped stress-ng.
@sebastianha, you have a different issue. I'll first address the systems that get stuck at 800 MHz. Please run https://github.com/intel/thermal_daemon/blob/master/test/thermal-debug-dump-ubuntu.sh (or the Fedora one in the same directory) and attach the outputs.
@pjssilva What is the make and model of your system?
@spandruvada My system is a Dell Latitude 5421, with a Core i5-11500H, 32 GB of RAM, and an Nvidia MX 450 graphics card running in hybrid mode.
Please run https://github.com/intel/thermal_daemon/blob/master/test/thermal-debug-dump-fedora.sh or https://github.com/intel/thermal_daemon/blob/master/test/thermal-debug-dump-ubuntu.sh. I can check the thermal tables first.
Dell Latitude 7320 i7-1185G7 on Ubuntu 20.04.3 running Kernel 5.14.0-1024-oem
@PhilipGB I don't see any throttling done by thermald. Can you try this procedure as a root user
If that doesn't help, the next step is to also write, in addition to the above:

```shell
echo 54000000 > /sys/devices/virtual/powercap/intel-rapl-mmio/intel-rapl-mmio\:0/constraint_1_power_limit_uw
```
Same behaviour: it very briefly clocks to 4.3 GHz, then settles at 1.8 GHz.
I have observed that if I disable the thermald service and reboot, the clock speed floats around 2.3 GHz under load, with intermittent drops to 400 MHz.
But if the system boots with thermald enabled and the service is then stopped (or thermald has run and then been stopped), the system behaves as if thermald were still running, settling at 1.8 GHz.
Why are you using the cpufreq performance governor? Try powersave; in most cases it will not hurt performance.
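With the intel_pstate driver in active mode, `scaling_governor` offers only `performance` and `powersave`, and `powersave` is a dynamic policy rather than a pinned minimum frequency. A minimal sketch of switching every CPU through sysfs; `set_governor` is a hypothetical name and the optional second argument only exists for testing. Writing needs root; entries the caller cannot write are skipped, and the function prints the resulting distinct governor values.

```shell
# Hypothetical helper: set_governor GOVERNOR [CPU_SYSFS_DIR]
# Writes GOVERNOR to each CPU's scaling_governor, then prints the set of
# distinct values now in effect.
set_governor() {
    local gov="$1" root="${2:-/sys/devices/system/cpu}" f
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -w "$f" ] || continue    # not writable (e.g. not root): skip
        echo "$gov" > "$f"
    done
    cat "$root"/cpu[0-9]*/cpufreq/scaling_governor | sort -u
}
```

Where the cpupower tool is installed, `cpupower frequency-set -g powersave` should have the same effect.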
Also check `cat /sys/bus/pci/devices/0000\:00\:04.0/tcc_offset_degree_celsius`. If it is a high number, write something like "5" to it.
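The TCC offset is the number of degrees below TjMax at which the processor's thermal control circuit starts throttling, so a large offset makes the CPU throttle at a much lower temperature. A minimal sketch of the check above: read the current offset and write the suggested value of 5 only if the current one is higher. `lower_tcc_offset` is a hypothetical name; the default path is the one from the comment above, and the second argument only exists for testing. Writing needs root.

```shell
# Hypothetical helper: lower_tcc_offset [NEW_OFFSET] [FILE]
# Prints the current TCC offset and, if it is larger than NEW_OFFSET
# (default 5), writes NEW_OFFSET and prints the value read back.
lower_tcc_offset() {
    local new="${1:-5}"
    local f="${2:-/sys/bus/pci/devices/0000:00:04.0/tcc_offset_degree_celsius}"
    local cur
    cur=$(cat "$f") || return 1
    echo "current offset: $cur"
    if [ "$cur" -gt "$new" ]; then
        echo "$new" > "$f" && echo "new offset: $(cat "$f")"
    fi
}
```

Running `lower_tcc_offset` as root with no arguments applies the suggestion from the comment above; a value that is already 5 or lower is left unchanged.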
Continuation of https://github.com/intel/thermal_daemon/issues/293# as it wasn't properly fixed.
When using a Google Meet video conference plus another browser for searching, the CPU is throttled to 1500 MHz. Latitude 5420, BIOS 1.14.1, Ubuntu 20.04, kernel 5.15.13.