intel / thermal_daemon

Thermal daemon for IA
GNU General Public License v2.0

thermal shutdown on Lenovo Ideapad 5 15ITL05 #328

Closed ghost closed 2 years ago

ghost commented 2 years ago

I get thermal shutdowns on my laptop. When I stress it with freac (music file transcoding), it gets to 100 °C rather quickly, doesn't do much to remediate that, and ends up shutting down soon after. I'm on Fedora 35, kernel 5.15.12-200.fc35.x86_64. Right now, to work around this, I've forced the CPU clock to 3.5 GHz using cpupower, and it's all nice at 80-87 °C.
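The thread does not give the exact cpupower invocation. A minimal sketch, assuming the workaround was a cap on the maximum frequency: `cpupower frequency-set -u` sets that cap, and the hypothetical helper `show_max_freq` reads it back from the cpufreq sysfs tree to confirm it took effect.

```shell
# Assumed form of the workaround (needs root); -u sets the maximum frequency.
#   sudo cpupower frequency-set -u 3.5GHz

# Print the current frequency cap of every cpufreq policy in MHz.
# The optional argument overrides the sysfs base directory.
show_max_freq() {
    base="${1:-/sys/devices/system/cpu/cpufreq}"
    for f in "$base"/policy*/scaling_max_freq; do
        [ -r "$f" ] || continue          # skip when the glob matched nothing
        khz=$(cat "$f")                  # the kernel reports kHz
        echo "$(basename "$(dirname "$f")"): $((khz / 1000)) MHz"
    done
}
```

Calling `show_max_freq` with no argument after applying the cap should list 3500 MHz for each policy if the cap is in place.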

Current sensor readings:

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +88.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +86.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +82.0°C  (high = +100.0°C, crit = +100.0°C)
Core 2:        +84.0°C  (high = +100.0°C, crit = +100.0°C)
Core 3:        +82.0°C  (high = +100.0°C, crit = +100.0°C)

nvme-pci-e100
Adapter: PCI adapter
Composite:    +41.9°C  (low  =  -0.1°C, high = +86.8°C)
                       (crit = +89.8°C)
Sensor 1:     +41.9°C  (low  = -273.1°C, high = +65261.8°C)

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +62.0°C

BAT0-acpi-0
Adapter: ACPI interface
in0:          17.00 V
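To watch how close the package gets to the 100 °C critical point during a stress run, the package temperature can be pulled out of this kind of output. This is only a sketch: `package_temp` and `package_is_hot` are hypothetical helper names, the 95 °C default limit is an arbitrary choice, and the parsing assumes the one-reading-per-line layout that `sensors` normally prints.

```shell
# Read `sensors` output on stdin and print the package temperature as a bare number.
package_temp() {
    awk '/^Package id 0:/ { print $4; exit }' | sed 's/^+//; s/[^0-9.].*$//'
}

# Succeed when the package temperature on stdin is at or above the limit
# (first argument, default 95).
package_is_hot() {
    t=$(package_temp)
    [ -n "$t" ] && awk -v t="$t" -v lim="${1:-95}" 'BEGIN { exit !(t >= lim) }'
}
```

For example, `sensors | package_is_hot 95 && echo "close to critical"` prints a warning once the package reaches the limit.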

spandruvada commented 2 years ago

What generation of CPU is this? Try the script https://github.com/intel/thermal_daemon/blob/master/test/thermal-debug-dump-fedora.sh and attach the logs.

ghost commented 2 years ago

11th gen, Intel Tigerlake, i7 1165g7.

Here's the file that the script made. It has logs inside. 06230641.tar.gz

spandruvada commented 2 years ago

1. Logs suggest that your adaptive option failed before. I think you have a file "/tmp/ignore_adaptive". Please delete it and rerun the script.
2. Also, this is a Lenovo system, which probably has FW thermal control; that causes thermald to exit under normal conditions. Do you have a file called "/sys/devices/platform/thinkpad_acpi/dytc_lapmode"?
3. I think you can prevent the shutdown by writing a TCC offset. You can "cd" to the /sys/class/thermal/cooling_device*/ directory whose type attribute is "TCC" and "echo 5 > cur_state".
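The TCC offset step in item 3 can be sketched as a small script. The helper names `find_tcc_device` and `set_tcc_offset` are hypothetical, and matching any type that starts with "TCC" is an assumption, since the exact type string may differ between kernels.

```shell
# Print the sysfs directory of the first cooling device whose type starts with "TCC".
# The optional argument overrides the thermal sysfs base directory.
find_tcc_device() {
    base="${1:-/sys/class/thermal}"
    for dev in "$base"/cooling_device*; do
        [ -r "$dev/type" ] || continue
        case "$(cat "$dev/type")" in
            TCC*) echo "$dev"; return 0 ;;
        esac
    done
    return 1
}

# Write an offset (first argument) to that device's cur_state.
# On a real system this needs root.
set_tcc_offset() {
    dev=$(find_tcc_device "${2:-/sys/class/thermal}") || {
        echo "no TCC cooling device found" >&2
        return 1
    }
    echo "$1" > "$dev/cur_state"
}
```

Run as root, `set_tcc_offset 5` corresponds to the `echo 5 > cur_state` suggested above.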

ghost commented 2 years ago

1. My laptop now seems to shut down even while running that script. I ran it 3 times and can't finish it because it gets too hot and shuts down. I did remove /tmp/ignore_adaptive every time I ran it, though. These are the files that were written before the shutdown. 07164959.zip
2. It's an ideapad, not a thinkpad. The thinkpad_acpi folder doesn't exist, and I couldn't find any file named "dytc".
3. I guess that's a good workaround idea.

ghost commented 2 years ago

was able to go through the script! here's the files 07172300.tar.gz

spandruvada commented 2 years ago

Something is messed up somewhere. I see

[1641594278][WARN]Unable to find a zone for SEN3
[1641594278][WARN]Unable to find a zone for SEN1

These zones were found in the earlier logs, so I need to provide a debug patch.

ghost commented 2 years ago

i'll be waiting

spandruvada commented 2 years ago

I created one change; please apply it and build. It is only a two-line change. You can:

$ git clone https://github.com/intel/thermal_daemon.git
$ cd thermal_daemon
$ git checkout remotes/origin/ideapad-11thgen -b ideapad-11thgen

Then follow the build procedure in README.txt.

After build:

$ sudo rm /tmp/ignore_adaptive
$ thermald --loglevel=debug --no-daemon --adaptive

And attach the logs.

You shouldn't see

[1641594278][WARN]Unable to find a zone for SEN3
[1641594278][WARN]Unable to find a zone for SEN1
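A quick way to check a captured log for these warnings is to list the sensors they name. This is a sketch only: `missing_zones` is a hypothetical helper, and it assumes one log message per line in the format quoted above.

```shell
# Print the unique sensor names that thermald could not map to a zone,
# one per line, from the log file given as the first argument.
missing_zones() {
    sed -n 's/.*Unable to find a zone for \([A-Za-z0-9_]*\).*/\1/p' "$1" | sort -u
}
```

An empty result from `missing_zones thermald-log.txt` (the file name is a placeholder) means the warnings are gone.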

Try your tests. Maybe other sensors will trip first; hopefully that will avoid the shutdown.

ghost commented 2 years ago

thermald-log2.txt

I changed the ./autogen.sh prefix=/ to ./autogen.sh prefix=/usr/local to avoid conflicts with the distro thermald

Anyway, I got that log, but I still get the "unable to find a zone" messages.... Didn't run tests because thermald exits.

spandruvada commented 2 years ago

Is it possible that you are not running the changed version? I added one more change to the same branch that prints a message if the fix is in effect. Just git pull and retry.

ghost commented 2 years ago

I always run the changed version, because I run

sudo /usr/local/sbin/thermald --loglevel=debug --no-daemon --adaptive > ../thermald-logX.txt

Here's the latest log. It's 20 kilobytes larger. thermald-log3.txt

spandruvada commented 2 years ago

Added one more change to ignore some modes. Please try it. Pushed to the same branch.

ghost commented 2 years ago

Just ran it, doing the tests, and it seems to be working! It doesn't exit, keeps running, /tmp/ignore_adaptive isn't created, temps are at 74C, and the CPU throttles when it should (right now the stress I described in the first post is running and the CPU speeds go down as they should). Temps don't seem to even reach 75C. CPU speed is in the 2.9 GHz - 3 GHz range. Fans spinning. All nice. thermald-log4.txt
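One of the checks described here, that /tmp/ignore_adaptive is not created, can be scripted. This is a sketch: `check_adaptive` is a hypothetical helper, and treating the file as a sign that adaptive mode failed follows the maintainer's earlier remark rather than documented behaviour.

```shell
# Report whether the marker file (default /tmp/ignore_adaptive) exists.
# Returns non-zero when it does.
check_adaptive() {
    marker="${1:-/tmp/ignore_adaptive}"
    if [ -e "$marker" ]; then
        echo "$marker exists: adaptive mode appears to have failed"
        return 1
    fi
    echo "$marker not present"
}
```

Running `check_adaptive` a minute or so into a stress test gives a quick pass/fail before attaching the log.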

spandruvada commented 2 years ago

Great. I will create a formal change and let you know so you can give it one final test. This is obviously an issue with adaptive mode on this platform.

ghost commented 2 years ago

gotcha!

spandruvada commented 2 years ago

I reverted the previous changes and added a different change, so that it will not cause issues on other, unknown platforms. Pushed the new change to the same branch. I think it should work the same. Just send me one log from a run of a minute or so.

If all is good, I will merge the change into the master branch. That way Fedora can pick up the change.

ghost commented 2 years ago

thermald-log5.txt there it is. Works greatly! Thank you so much

spandruvada commented 2 years ago

Thanks for reporting. Applied the change and changed the version to v2.4.8.

ghost commented 2 years ago

NICE! Thank you so MUCH!