Closed majanes-intel closed 2 years ago
I neglected to mention: running an identical workload on Windows completes with no degradation in power. The system gets much warmer and the fans run at a clearly higher speed. Based on this observation, it seems clear that the system is not being limited by some physical thermal problem.
On Fri, 2021-03-05 at 12:34 -0800, Mark Janes Intel wrote:
Kernel: 5.11.3
Debian: Testing
thermald: 2.4.3 (Debian unstable)
Processor: i7-1185G7, 28 W TDP
After running power-intensive workloads for a short amount of time, the CPU and/or GPU will be throttled down drastically to ~10% of peak.
Running turbostat reveals that the peak package power is ~16W, far below the TDP limit.
Running lm-sensors shows that the peak temp is ~50C, far below the limit.
After reading #291 and #280, I enabled debug logs for thermald. thermald.log
@spandruvada let me know if more information is needed. I can also bring the system to you in JF1. Mesa team will be using this laptop model for perf analysis.
If this is the complete log, then, as you observed, none of the temperatures triggered any throttling.
Bring the system, we can take a look.
Thanks.
Probably some setting that Windows is aware of. We can't compare with Windows, as we don't support several of the conditions in the table on this system, so we are using best effort. This applies particularly to the power slider and probably the fan control.
So we need to find what else we can do with these limitations.
Thanks.
I see different behavior between version 2.4.3 from this repository and the version in Debian. The power limit is not getting set in the Debian version. So is Debian back-porting patches? If that is the case, they should use a different private version. Who can help here? @ColinIanKing
I'll sort that out first thing Tuesday.
@spandruvada thanks for the work to figure out why this system was turning the GPU down to 100MHz. Your test branch improves the situation substantially, although it looks like there is still a long way to go. Running longer benchmarks, I can see that the CoreTmp climbs all the way to 72 degrees, with the GFXAMHz stable at the 1350MHz maximum. The PkgWatt is around 25W, near the TDP limit. After that, power is cut to the system, limiting the GPU to 400MHz. The temp declines steadily, with the PkgWatt at 12W. For a short duration, the GFXAMHz oscillates between 400 and 1000, then stays at 400MHz. The temp declines to 40 degrees by the time the benchmark is done.
lm-sensors reports that the package max temp is 100 degrees Celsius. Is that accurate/realistic? If so, then it seems like thermald should wait longer before cutting power. If not, then it seems like thermald could settle on a much higher power level for the package. At 12W, the package temp declines below what is necessary and performance suffers.
I used Unigine Heaven for this data point. I took a look at the Thermal Analysis Tool on Windows, but I couldn't see how to get similar data from that platform. If you can give me some pointers, I should be able to at least understand what frequency/power levels Windows achieves, and what the stable max temp is.
When I booted to the Windows partition, I noticed that updates were running in the background, which can perturb performance measurements. I let the system complete a full software update, which updated the firmware on the device. With the firmware update, I now get a stable 1000MHz GPU clock, with the package temp stable at 50 degrees Celsius. While this is much closer to optimal, it still seems to me that the package could target a higher temperature.
So, we could support the power slider condition (either using a default value or pulling a value from p-p-d).
However, that would only help if we can resolve the \_SB_.PCI0.LPCB.ECDV.NGFF sensor. And, even if we do that, the OEM conditions coming through ACPI might still not have a sane value to proceed.
I've met the same issue. @benzea does thermald 2.4.4 resolve the issue?
I've built 2.4.4 for Fedora 34, installed it, and now have an almost constant 1700MHz instead of 400MHz. That's fine, but my CPU temp is still quite low (~54C), so I am sure that the CPU can reach a higher clock speed. Is that possible somehow?
If it helps, here is a log from journalctl -r for thermald 2.4.4 on Fedora 34, which is launched with the options --systemd --dbus-enable --adaptive:
мая 02 06:05:24 localhost.localdomain thermald[3010]: ppcc limits is less than def PL1 max power :28000000 check thermal-conf.xml.auto
мая 02 06:05:24 localhost.localdomain thermald[3010]: sensor id 10 : No temp sysfs for reading raw temp
мая 02 06:05:24 localhost.localdomain thermald[3010]: sensor id 10 : No temp sysfs for reading raw temp
мая 02 06:05:24 localhost.localdomain thermald[3010]: sensor id 10 : No temp sysfs for reading raw temp
мая 02 06:05:23 localhost.localdomain thermald[3010]: Polling mode is enabled: 4
мая 02 06:05:23 localhost.localdomain thermald[3010]: 27 CPUID levels; family:model:stepping 0x6:8c:1 (6:140:1)
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unable to find a sensor for \_SB_.PCI0.LPCB.ECDV.NGFF
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unable to find a sensor for \_SB_.PCI0.LPCB.ECDV.NGFF
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unable to find a sensor for \_SB_.PCI0.LPCB.ECDV.NGFF
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unable to find a sensor for \_SB_.PCI0.LPCB.ECDV.NGFF
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unable to find a sensor for \_SB_.PCI0.LPCB.ECDV.NGFF
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unable to find a sensor for \_SB_.PCI0.LPCB.ECDV.NGFF
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unable to find a sensor for \_SB_.PCI0.LPCB.ECDV.NGFF
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unable to find a sensor for \_SB_.PCI0.LPCB.ECDV.NGFF
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unable to find a sensor for \_SB_.PCI0.LPCB.ECDV.NGFF
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unable to find a sensor for \_SB_.PCI0.LPCB.ECDV.NGFF
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unable to find a sensor for \_SB_.PCI0.LPCB.ECDV.NGFF
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unsupported conditions are present
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unsupported condition 57 (UKNKNOWN)
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unsupported condition 57 (UKNKNOWN)
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unsupported condition 57 (UKNKNOWN)
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unsupported condition 57 (UKNKNOWN)
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unsupported condition 57 (UKNKNOWN)
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unsupported condition 57 (UKNKNOWN)
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unsupported condition 57 (UKNKNOWN)
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unsupported condition 57 (UKNKNOWN)
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unsupported condition 57 (UKNKNOWN)
мая 02 06:05:23 localhost.localdomain thermald[3010]: Unsupported condition 57 (UKNKNOWN)
мая 02 06:05:16 localhost.localdomain thermald[3010]: 27 CPUID levels; family:model:stepping 0x6:8c:1 (6:140:1)
мая 02 06:05:16 localhost.localdomain systemd[1]: Started Thermal Daemon Service.
Laptop: Dell Latitude 5420 with 11th Gen Intel(R) Core(TM) i7-1165G7 CPU.
Oh, a newer thermald for Fedora would help?
Sorry about that. I thought I had picked up the important patches downstream already (even if I had an older version). I can update the package so that others benefit from that.
Yeah, 2.4.4 helps somewhat on Fedora but does not completely resolve the issue. Without thermald 2.4.4 (with an older thermald version or without thermald at all) the CPU is still downclocked to 400MHz after ~30 secs. With thermald 2.4.4 the highest clock is 1700MHz. It would be awesome if you built thermald 2.4.4 for Fedora :)
That is still too low, since the usual clock for the CPU is 2800MHz. And I have no idea how it can be fixed :(
@benzea any news about Fedora updates?
On its way now.
I am not familiar with modern Linux CPU scheduling, but I think the real root of the issue is some bug in the intel_pstate implementation in the Linux kernel, because on Windows I can get a stable 2.8GHz CPU clock on the same hardware. On Linux (Fedora 34) I get only 400MHz without thermald and 1.7GHz with thermald.
Maybe someone from the thermald team can provide more information. I will try to test another Dell Latitude 5420. Also, in a few days I'll test a Dell Latitude 5410 (hope it'll work better).
By the way, is thermald necessary with modern Intel CPUs or not?
Please don't jump to such conclusions. The problem is that we need to do thermal management in userspace. To do so, we need to parse data from ACPI which we are not fully implementing because Intel is not publishing the specification. And, on top of that, there may also be vendor specific things.
i.e. probably "мая 02 06:05:23 localhost.localdomain thermald[3010]: Unsupported condition 57 (UKNKNOWN)" is the issue. If you figure out what that condition means, then one might implement it and it will likely help you.
By the way, is thermald necessary with modern Intel CPUs or not?
Yes.
@benzea Thanks! Can you please describe in a little more detail what the real difference in thermal management is between the intel_pstate subsystem and thermald? Or just provide a link where I can read about it. Thanks in advance!
If you figure out what that condition means, then one might implement it and it will likely help you.
Do you have any suggestions for how I can debug it? Maybe there is already an existing guide for it. I am ready to invest some time into it and assist you as much as I can.
Not really. You can enable debug logging for thermald and it'll dump more detailed information. It might be possible to guess what the condition is based on by looking at the values and the various limits that are being applied.
In the end, if we can just emulate a sane default value, we might not even need to know the exact meaning. For power-slider, for example, we just assume a "balanced" performance right now.
I pushed another change to fix the performance gap once you update BIOS on this system.
Absolutely the same issue with the Latitude 7520.
If the issue is the same on the 7520, does the latest thermald fix it?
The same as for @ZaMaZaN4iK
[root@dell tmp]# thermald --version
2.4.6
With the latest version the CPU is stuck at 1800MHz; without thermald it is stuck at 400MHz.
Since I now have a Dell Latitude 5410, I cannot test the latest thermald on the 5420. I'll try to test the latest thermald on the 5410. I hope @benzea ported the latest changes to the Fedora version.
Fedora 34 and 35 both have thermald 2.4.6 currently.
I've attached a debug log from the latest (2.4.6) version of thermald. Not sure if it's helpful.
In the log we can see the frequency dropping to 1800MHz (temp down to 55 from 73) after a few seconds of stress -c 8.
thermald --no-daemon --loglevel=debug --dbus-enable > /tmp/thermald.log
Is this log with the --adaptive option?
No, I've attached a new one with the adaptive option: thermald-adaptive.log
The concern is that there are no sensors:
RN]Unable to find a zone for TSKN
[1624968504][WARN]Unable to find a zone for NGFF
[1624968504][WARN]Unable to find a zone for TMEM
[1624968504][WARN]Unable to find a zone for TMEM
[1624968504][WARN]Unable to find a zone for TMEM
[1624968504][WARN]Unable to find a zone for TMEM
[1624968504][WARN]Unable to find a zone for TSSD
[1624968504][DEBUG]check trip zone:0:0
What is the kernel version? Check /sys/class/thermal/thermal_zone*/type to see whether these sensors exist.
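For reference, a minimal sketch of that check, assuming the standard sysfs thermal layout (a `type` and a `temp` file per zone, with `temp` in millidegrees Celsius). The function name `list_thermal_zones` and its optional directory argument are made up for illustration:

```shell
#!/bin/sh
# List every thermal zone with its type and current temperature.
# Arg 1 (optional): thermal class directory, defaults to /sys/class/thermal.
list_thermal_zones() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        # Skip when the glob matched nothing or the zone has no type file.
        [ -r "$zone/type" ] || continue
        type=$(cat "$zone/type")
        # temp is in millidegrees Celsius; some zones refuse to be read.
        if temp=$(cat "$zone/temp" 2>/dev/null) && [ -n "$temp" ]; then
            echo "${zone##*/} $type $((temp / 1000))C"
        else
            echo "${zone##*/} $type n/a"
        fi
    done
    return 0
}

list_thermal_zones
```

If the sensors named in the warnings (TMEM, NGFF, TSSD, and so on) are exposed by the kernel, they would presumably show up in the second column of this output.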
[root@dell ~]# cat /sys/class/thermal/thermal_zone*/type
INT3400 Thermal
TCPU
iwlwifi_1
x86_pkg_temp
[root@dell ~]# uname -a
Linux dell 5.12.13-arch1-2 #1 SMP PREEMPT Fri, 25 Jun 2021 22:56:51 +0000 x86_64 GNU/Linux
Maybe try updating to the latest BIOS. This doesn't show the sensors described in the thermal configuration.
Do you see the driver loaded? lsmod | grep -i int3
What is the output of ls /sys/bus/platform/devices/ ?
Thanks.
Thanks for your reply
May be try to update to the latest BIOS.
BIOS updated to the latest version(1.7.1)
Do you see driver loaded
dell ~ » lsmod | grep -i int3
int340x_thermal_zone 20480 1 processor_thermal_device
int3400_thermal 20480 0
acpi_thermal_rel 16384 1 int3400_thermal
ls /sys/bus/platform/devices/
dell ~ » ls /sys/bus/platform/devices/
ACPI0003:00 efivars.0 HID-SENSOR-2000e1.12.auto HID-SENSOR-2000e1.22.auto HID-SENSOR-2000e1.6.auto INT33A1:00 intel_rapl_msr.0 PNP0C0E:00 regulatory.0
ACPI000C:00 HID-SENSOR-200001.18.auto HID-SENSOR-2000e1.13.auto HID-SENSOR-2000e1.23.auto HID-SENSOR-2000e1.7.auto INT33D2:00 iTCO_wdt PNP0C14:00 rtc-efi.0
ACPI000E:00 HID-SENSOR-200001.1.auto HID-SENSOR-2000e1.14.auto HID-SENSOR-2000e1.24.auto HID-SENSOR-2000e1.8.auto INT33D3:00 microcode PNP0C14:01 rtsx_pci_sdmmc.0
alarmtimer.0.auto HID-SENSOR-200001.27.auto HID-SENSOR-2000e1.15.auto HID-SENSOR-2000e1.25.auto HID-SENSOR-INT-020b INT34C5:00 pcspkr PNP0C14:02 serial8250
coretemp.0 HID-SENSOR-200001.9.auto HID-SENSOR-2000e1.16.auto HID-SENSOR-2000e1.26.auto i2c_designware.0 INTC1040:00 PNP0103:00 PNP0C14:03 snd-soc-dummy
dcdbas HID-SENSOR-200041.10.auto HID-SENSOR-2000e1.17.auto HID-SENSOR-2000e1.2.auto i2c_designware.1 INTC1043:00 PNP0C09:00 PNP0C14:04 STM0125:00
dell-laptop HID-SENSOR-200073.28.auto HID-SENSOR-2000e1.19.auto HID-SENSOR-2000e1.3.auto i8042 INTC1043:01 PNP0C0A:00 PNP0C14:05 USBC000:00
dell-smbios.0 HID-SENSOR-200076.29.auto HID-SENSOR-2000e1.20.auto HID-SENSOR-2000e1.4.auto idma64.0 INTC1043:02 PNP0C0C:00 PNP0C14:06
efi-framebuffer.0 HID-SENSOR-2000e1.11.auto HID-SENSOR-2000e1.21.auto HID-SENSOR-2000e1.5.auto idma64.1 INTC1051:00 PNP0C0D:00 reg-dummy
I found that, for an unknown reason, the int3403_thermal module was blacklisted on my laptop (I think it is an old artifact). I've removed this entry from /etc/modprobe.d and rebooted.
Now my lsmod looks like this:
dell ~ » lsmod | grep -i int3
int3403_thermal 20480 0
int340x_thermal_zone 20480 2 int3403_thermal,processor_thermal_device
int3400_thermal 20480 0
acpi_thermal_rel 16384 1 int3400_thermal
But the problem persists: after a few seconds of CPU load, the frequency is locked at 1800MHz.
Here I attached new log from thermald: thermald-adaptive.log
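For anyone else hitting this, a rough sketch of how a stale blacklist entry like the one above could be located. It assumes the entry lives in a `*.conf` file under /etc/modprobe.d; `find_blacklist` is a hypothetical helper name, and the sketch does not search other modprobe.d locations (such as /usr/lib/modprobe.d) or treat `-` and `_` as equivalent the way modprobe does:

```shell
#!/bin/sh
# Print every "blacklist <module>" line found in a modprobe.d directory.
# Arg 1: module name. Arg 2 (optional): directory, default /etc/modprobe.d.
find_blacklist() {
    module="$1"
    dir="${2:-/etc/modprobe.d}"
    # The extra /dev/null argument makes grep always prefix the file name;
    # -s hides errors for unreadable files or an unmatched glob.
    grep -s -E "^[[:space:]]*blacklist[[:space:]]+${module}([[:space:]]|\$)" \
        "$dir"/*.conf /dev/null || true
}

find_blacklist int3403_thermal
```

No output means no matching entry was found in that directory. After removing an entry, `lsmod | grep -i int3` should list int3403_thermal once the module is loaded again.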
Now better. If you have Windows, compare with that.
Sorry, I don't have it.
Dell support assures me that this is not normal, and that the laptop should throttle only at 100C.
Here is a screenshot from ThermalMonitor (stress -c 8):
This is not about temperature, but power limits. Your temperature can still be lower but power limits may have been reached.
Does the log you attached before cover the full scenario, from startup to when you get throttled to 1800MHz?
The log is only 21s long. And, to me, it looks like RAPL is initially set to 22.1W (maybe from the BIOS?) and is then increased in 0.1W steps every few seconds. So, if you wait longer, then the speed may very well increase.
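One way to watch those limits directly is to read the powercap sysfs files, sketched below. This assumes the package domain is exposed as /sys/class/powercap/intel-rapl:0 with `constraint_N_name` and `constraint_N_power_limit_uw` files (values in microwatts); on some systems the relevant domain may instead be under an intel-rapl-mmio directory, and reading may require root. `show_rapl_limits` is a hypothetical name:

```shell
#!/bin/sh
# Print each RAPL constraint of one powercap domain in watts.
# Arg 1 (optional): domain directory, default /sys/class/powercap/intel-rapl:0.
show_rapl_limits() {
    base="${1:-/sys/class/powercap/intel-rapl:0}"
    for name_file in "$base"/constraint_*_name; do
        # Skip when the glob matched nothing.
        [ -r "$name_file" ] || continue
        limit_file="${name_file%_name}_power_limit_uw"
        name=$(cat "$name_file")
        uw=$(cat "$limit_file" 2>/dev/null) || continue
        [ -n "$uw" ] || continue
        # Microwatts to watts, truncated to one decimal place.
        echo "$name $((uw / 1000000)).$(( (uw % 1000000) / 100000 ))W"
    done
    return 0
}

show_rapl_limits
```

Running this every few seconds while thermald is active should make the 0.1W steps described above visible, if they are in fact being applied to this domain.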
This is not about temperature, but power limits. Your temperature can still be lower but power limits may have been reached.
Yes, but for an unknown reason. Without thermald the CPU is throttled to 400MHz, so I think that thermald can affect it. Absolutely the same as for @ZaMaZaN4iK, except that for me it is stuck at 1800MHz, not 1700:
Yeah, 2.4.4 helps somewhat on Fedora but does not completely resolve the issue. Without thermald 2.4.4 (with an older thermald version or without thermald at all) the CPU is still downclocked to 400MHz after ~30 secs. With thermald 2.4.4 the highest clock is 1700MHz.
Does the log you attached before cover the full scenario, from startup to when you get throttled to 1800MHz?
No,
1) I started thermald manually:
thermald --no-daemon --loglevel=debug --dbus-enable --adaptive > /tmp/thermald-adaptive.log
2) In another terminal session I started stress:
stress -c 8
3) Waited a few seconds and saw that the frequency dropped to 1800MHz
4) Stopped thermald and sent the log here
if you wait longer, then the speed may very well increase.
Under heavy CPU load the frequency stays locked at 1800MHz and does not increase any more.
What @benzea is saying is: first start thermald, wait a couple of minutes (the power level will reach its maximum), and then do the stress -c 8 test. Also, what is the output of cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency ?
Eventually, with stress -c 8, you will reach this frequency.
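To compare that base frequency with what the CPU is currently doing, something like the following sketch could be used. It assumes the cpufreq sysfs files `base_frequency` (only present with intel_pstate), `scaling_cur_freq`, and `cpuinfo_max_freq`, all in kHz; `show_cpu_freqs` is a made-up helper name:

```shell
#!/bin/sh
# Print base, current and maximum frequency of one CPU in MHz.
# Arg 1 (optional): cpufreq directory, default is cpu0's.
show_cpu_freqs() {
    dir="${1:-/sys/devices/system/cpu/cpu0/cpufreq}"
    for f in base_frequency scaling_cur_freq cpuinfo_max_freq; do
        # Files that are missing on this system are silently skipped.
        khz=$(cat "$dir/$f" 2>/dev/null) || continue
        [ -n "$khz" ] || continue
        echo "$f $((khz / 1000))MHz"
    done
    return 0
}

show_cpu_freqs
```

Running it while stress -c 8 is active shows whether scaling_cur_freq settles at base_frequency or above it.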
Also what is?
cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency
1800000
Interesting, but according to the link it should be 3GHz. Do you know why that is? I see that on another laptop with an 11th-gen i5 CPU it shows the correct frequency.
What @benzea is saying that First start thermald. Wait for couple of minutes (The power level will reach max). Then do the stress -c 8 test.
Sure, I can do it a bit later and will provide the result. But based on my experience with this laptop, it will be the same and the frequency will be locked at 1800MHz.
This value comes from the hardware, so this is what it is configured for.
Sure, but Windows shows the correct value and can sustain a higher frequency. So the problem is not with thermald? Do you have any other guess? As we can see, I am not the only one with this problem.
Thanks a lot for your support
I downloaded Windows 10 and tried to reproduce the problem.
Windows shows the base frequency as 1800MHz (I was wrong before; it looks like it depends on the laptop vendor's settings). I tested the CPU with a stress test and saw that it can sustain 3300MHz without any problems for a long time.
Now I have rebooted into Linux and see that the CPU is still stuck at 1800MHz (I googled that this corresponds to 15W). So I'm sure that the problem is with some Linux component, but I don't know which one.
What changes were made in the latest version of thermald? This version fixes the issue partially; could there be some related problem?
Run this and attach the tar file. https://github.com/intel/thermal_daemon/blob/master/test/thermal-debug-dump-fedora.sh
or for Ubuntu https://github.com/intel/thermal_daemon/blob/master/test/thermal-debug-dump-ubuntu.sh
Sure, please check: 01141200.tar.gz (https://github.com/intel/thermal_daemon/files/6746639/01141200.tar.gz)
I've changed bz2 to gz, because github doesn't allow bz2 uploads
Also I fixed a small typo in the script: https://github.com/intel/thermal_daemon/pull/308
If you use stress-ng --cpu -1, then stress-ng will automatically allocate 1 stressor per CPU.
if you use stress-ng --cpu -1 then stress-ng will automatically allocate 1 stressor per CPU
I just removed the extra hyphen; please check the script:
dell ~/throttle-debug » stress-ng ---cpu 16
stress-ng: unrecognized option '---cpu'
Try 'stress-ng --help' for more information.
@ColinIanKing I suggested your recommendation in https://github.com/intel/thermal_daemon/pull/308 pull request, thanks
I tried with 5.13.0 mainline kernel — nothing has changed
The problem is that the TMEM sensor reaches its limit of 42C in 4 seconds, so the system is throttled from max power. Even at the start the temperature is 39C, so there is not much margin. Not sure what can be done here.
Do you have any idea why it works properly in Windows? Is it possible to ignore this sensor?