intel / thermal_daemon

Thermal daemon for IA
GNU General Public License v2.0

thermald throttles power when int3403_thermal module is loaded #456

Closed ejgallego closed 2 weeks ago

ejgallego commented 1 month ago

Dear thermald developers,

First of all, thanks for your help and support in developing thermald.

As in some previously reported issues, thermald throttles my laptop's power to values significantly lower than the specified factory limit (45W sustained).

This behavior was not present with the laptop's factory configuration, but appeared once the Ubuntu version was updated.

Removing the int3403_thermal and int3400_thermal kernel modules makes the issue disappear. I found this hint in several other issues in this repo; however, the underlying problem still seems to be present.

Debug log attached.

13220123.tar.gz

spandruvada commented 1 month ago

[Image: temperature graph]

spandruvada commented 1 month ago

There is a sensor, STG1, which starts at 74C and then stays around 78C, as you can see in the graph above from your log. It always stays this high. The thermal table calls for restricting power to 18W at 78C.

[1723579304][INFO] source:_SB.PCI0.B0D4 target:_SB_.PCI0.LPCB.ECDV.STG1 priority:9 sample_period:10 temp:71 domain:9 control_knob:65536 psv.limit:25000
[1723579304][INFO] source:_SB.PCI0.B0D4 target:_SB_.PCI0.LPCB.ECDV.STG1 priority:9 sample_period:10 temp:76 domain:9 control_knob:65536 psv.limit:22000
[1723579304][INFO] source:_SB.PCI0.B0D4 target:_SB_.PCI0.LPCB.ECDV.STG1 priority:9 sample_period:10 temp:78 domain:9 control_knob:65536 psv.limit:18000
[1723579304][INFO] source:_SB.PCI0.B0D4 target:_SB_.PCI0.LPCB.ECDV.STG1 priority:9 sample_period:10 temp:80 domain:9 control_knob:65536 psv.limit:15000
[1723579304][INFO] source:_SB.PCI0.B0D4 target:_SB_.PCI0.LPCB.ECDV.STG1 priority:10 sample_period:10 temp:90 domain:9 control_knob:65536 psv.limit:MIN
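One way to read these entries is as a step table: once STG1 reaches a given temp, the matching psv.limit applies. Here is a minimal sketch of that lookup, assuming psv.limit is in milliwatts (which matches 18000 lining up with the 18W mentioned above) and that the highest threshold at or below the current temperature wins. The function name psv_limit_for_temp and the file name thermald.log are made up for illustration and are not part of thermald.

```shell
# psv_limit_for_temp TEMP_C [LOGFILE]
# Scan thermald "[INFO] source:... temp:N ... psv.limit:X" entries from
# LOGFILE (or stdin) and print the psv.limit of the highest threshold
# that is <= TEMP_C. Prints "none" if TEMP_C is below every threshold.
# Assumption: an entry becomes active once the sensor reaches its temp.
psv_limit_for_temp() {
    awk -v t="$1" '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^temp:/) {
                    # remember the threshold of the current entry
                    temp = substr($i, 6) + 0
                } else if ($i ~ /^psv\.limit:/) {
                    # keep the entry with the highest threshold <= t
                    if (temp <= t + 0 && (!found || temp > best)) {
                        found = 1; best = temp; limit = substr($i, 11)
                    }
                }
            }
        }
        END { print (found ? limit : "none") }
    ' "${2:--}"
}

# Example (hypothetical file name):
#   psv_limit_for_temp 78 thermald.log
```

Pairing each temp with the psv.limit that follows it, rather than parsing whole lines, keeps the lookup working even when several entries end up on one line.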

So here thermald is doing what the platform asks for. Instead of unloading the modules, try:

cd /sys/class/thermal/thermal_zone10/

You can confirm that this is the correct folder with:

cat type

You should see STG1.

There should be an attribute called emul_temp:

echo 70000 > emul_temp
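The zone number (thermal_zone10 here) can differ between machines and boots, so it may be safer to look the zone up by its type. Below is a minimal sketch of the steps above, assuming the kernel exposes emul_temp (it is an optional, write-only attribute) and that the value is in millidegrees Celsius, so 70000 means 70C. The function names find_zone and set_emul_temp are made up for illustration. The optional last argument exists only so the functions can be tried against a directory other than /sys/class/thermal.

```shell
# find_zone TYPE [BASE]
# Print the thermal zone directory whose "type" file contains TYPE.
find_zone() {
    base="${2:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/type" ] || continue
        if [ "$(cat "$zone/type")" = "$1" ]; then
            printf '%s\n' "$zone"
            return 0
        fi
    done
    return 1
}

# set_emul_temp TYPE MILLIDEGREES [BASE]
# Write an emulated temperature (millidegrees Celsius) to the zone of
# the given type. Needs root when run against the real sysfs tree.
set_emul_temp() {
    zone="$(find_zone "$1" "${3:-}")" || {
        echo "no thermal zone of type $1" >&2
        return 1
    }
    [ -e "$zone/emul_temp" ] || {
        echo "$zone has no emul_temp attribute" >&2
        return 1
    }
    printf '%s\n' "$2" > "$zone/emul_temp"
}

# Example, as root:
#   set_emul_temp STG1 70000
```

According to the kernel's thermal sysfs documentation, writing 0 to emul_temp should disable emulation again.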

ejgallego commented 4 weeks ago

Hi @spandruvada , thank you so much for the quick response.

> You should see STG1.
> echo 70000 > emul_temp

I tried this, but it didn't help much in the long term. I guess some other sensor made thermald trip?

Removing / loading the int3403_thermal module makes the following thermal zones appear:

SEN1
SEN2
SEN3
SEN4
SEN5
VIR1
VIR2
WRLS
STG1

so I guess some other sensor could be making the system trip?

Would a trace with emul_temp set to 70°C for STG1 be helpful?

Is the emul_temp setting permanent, or does it need to be refreshed?
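To narrow down which of the zones listed above is running hot, it can help to print every zone's type next to its current reading. This is a minimal sketch, assuming the standard sysfs layout where each thermal_zone directory has a type file and a temp file in millidegrees Celsius. The function name list_zones is made up for illustration, and the optional argument exists only so it can be pointed at a different directory.

```shell
# list_zones [BASE]
# Print "zone type temp_C" for every thermal zone under BASE.
list_zones() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/type" ] || continue
        ztype="$(cat "$zone/type")"
        # temp is in millidegrees Celsius; some zones fail to read
        milli="$(cat "$zone/temp" 2>/dev/null)" || milli=""
        case "$milli" in
            ''|*[!0-9-]*) temp="n/a" ;;
            *) temp="$((milli / 1000))" ;;
        esac
        printf '%s %s %s\n' "${zone##*/}" "$ztype" "$temp"
    done
}

# Example:
#   list_zones
```

Comparing this output with the temp thresholds in the thermald log should show which sensor, other than STG1, is crossing a limit.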

spandruvada commented 4 weeks ago

The setting is not permanent across reboots. You can send the trace with 70C; it is possible that some other sensor is tripping. Did you try cleaning the vents in your laptop?

spandruvada commented 2 weeks ago

This is caused by a sensor trip. If you need further debugging, you can reopen.

ejgallego commented 2 weeks ago

Hi @spandruvada ,

thanks for your help, here is a new log with the STG1 sensor set to 70 using emul_temp.

There is still something else tripping; I wonder if that's the coretemp sensor? Log attached.

I am not reopening as:

Thanks again for your help, and congrats on the great work with thermald.

29141505.tar.gz