intel / thermal_daemon

Thermal daemon for IA
GNU General Public License v2.0

"no temp sysfs"; "stack smashing detected" on 2.5.4 on Ubuntu #425

Closed runderwo closed 1 month ago

runderwo commented 9 months ago
Nov 04 17:09:19 achpee2 systemd[1]: Starting thermald.service - Thermal Daemon Service...
Nov 04 17:09:19 achpee2 systemd[1]: Started thermald.service - Thermal Daemon Service.
Nov 04 17:09:19 achpee2 thermald[7611]: 22 CPUID levels; family:model:stepping 0x6:5e:3 (6:94:3)
Nov 04 17:09:19 achpee2 thermald[7611]: 22 CPUID levels; family:model:stepping 0x6:5e:3 (6:94:3)
Nov 04 17:09:19 achpee2 thermald[7611]: sensor id 11 : No temp sysfs for reading raw temp
Nov 04 17:09:19 achpee2 thermald[7611]: sensor id 11 : No temp sysfs for reading raw temp
Nov 04 17:09:19 achpee2 thermald[7611]: sensor id 11 : No temp sysfs for reading raw temp
Nov 04 17:09:19 achpee2 thermald[7611]: *** stack smashing detected ***: terminated
Nov 04 17:09:23 achpee2 systemd[1]: thermald.service: Main process exited, code=dumped, status=6/ABRT
Nov 04 17:09:23 achpee2 systemd[1]: thermald.service: Failed with result 'core-dump'.
Nov 04 17:09:24 achpee2 systemd[1]: thermald.service: Scheduled restart job, restart counter is at 39.
Nov 04 17:09:24 achpee2 systemd[1]: Stopped thermald.service - Thermal Daemon Service.

This loops endlessly in syslog. I have rebooted and deleted the old files in /etc/thermald, but nothing changes. Should thermald be removed on this Skylake hardware?
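
For anyone triaging a similar loop, the journal lines above can be reduced to the three facts that matter: whether the stack protector fired, how the main process exited, and the restart counter. The following is a minimal sketch, not part of thermald; the `summarize` helper and the `SAMPLE` list are hypothetical names introduced here, and the regular expressions assume the systemd message wording shown in the log above.

```python
import re

# Patterns assume the systemd wording seen in the log above.
EXIT_RE = re.compile(r"Main process exited, code=(\w+), status=(\d+)/(\w+)")
COUNTER_RE = re.compile(r"restart counter is at (\d+)")


def summarize(lines):
    """Collect crash indicators from an iterable of journal lines."""
    summary = {"stack_smash": False, "exit": None, "restart_counter": None}
    for line in lines:
        if "stack smashing detected" in line:
            summary["stack_smash"] = True
        match = EXIT_RE.search(line)
        if match:
            # (exit code kind, numeric status, signal name)
            summary["exit"] = (match.group(1), int(match.group(2)), match.group(3))
        match = COUNTER_RE.search(line)
        if match:
            summary["restart_counter"] = int(match.group(1))
    return summary


# Three lines copied from the log in this report.
SAMPLE = [
    "Nov 04 17:09:19 achpee2 thermald[7611]: *** stack smashing detected ***: terminated",
    "Nov 04 17:09:23 achpee2 systemd[1]: thermald.service: Main process exited, code=dumped, status=6/ABRT",
    "Nov 04 17:09:24 achpee2 systemd[1]: thermald.service: Scheduled restart job, restart counter is at 39.",
]

result = summarize(SAMPLE)
print(result)
```

The same function could be fed the output of `journalctl -u thermald` line by line.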

shellclear commented 8 months ago

the same issue here...

Dec 08 09:42:28 NGC1976 thermald[8688]: sensor id 13 : No temp sysfs for reading raw temp
Dec 08 09:42:28 NGC1976 thermald[8688]: sensor id 13 : No temp sysfs for reading raw temp
Dec 08 09:42:28 NGC1976 thermald[8688]: sensor id 13 : No temp sysfs for reading raw temp
Dec 08 09:42:28 NGC1976 thermald[8688]: *** stack smashing detected ***: terminated
Dec 08 09:42:28 NGC1976 systemd[1]: thermald.service: Main process exited, code=killed, status=6/ABRT
Dec 08 09:42:28 NGC1976 systemd[1]: thermald.service: Failed with result 'signal'.
Dec 08 09:42:28 NGC1976 systemd[1]: thermald.service: Scheduled restart job, restart counter is at 5.
Dec 08 09:42:28 NGC1976 systemd[1]: thermald.service: Start request repeated too quickly.
Dec 08 09:42:28 NGC1976 systemd[1]: thermald.service: Failed with result 'signal'.
Dec 08 09:42:28 NGC1976 systemd[1]: Failed to start thermald.service - Thermal Daemon Service.

I've tried the same steps as you, @runderwo, but have found no solution so far...

shellclear commented 8 months ago

The thermald man page says:

dptfxtract
    Download from: https://github.com/intel/dptfxtract
    This generates configuration files for thermald on some systems.

The problem is that this repository is now obsolete. Also, there is no dptfxtract package or command in Debian.

shellclear commented 8 months ago

I was able to fix it by following the steps described below:

Output file is /etc/thermald/thermal-conf.xml.auto


- sudo systemctl restart thermald
- sudo systemctl status thermald (to check that the service started correctly)

● thermald.service - Thermal Daemon Service
     Loaded: loaded (/usr/lib/systemd/system/thermald.service; enabled; preset: enabled)
     Active: active (running) since Fri 2023-12-08 10:07:10 CET; 19s ago
   Main PID: 24339 (thermald)
      Tasks: 3 (limit: 38130)
     Memory: 3.1M (peak: 3.7M)
        CPU: 40ms
     CGroup: /system.slice/thermald.service
             └─24339 /usr/sbin/thermald --systemd --dbus-enable --adaptive

Dec 08 10:07:10 NGC1976 systemd[1]: Starting thermald.service - Thermal Daemon Service...
Dec 08 10:07:10 NGC1976 systemd[1]: Started thermald.service - Thermal Daemon Service.
Dec 08 10:07:10 NGC1976 thermald[24339]: 22 CPUID levels; family:model:stepping 0x6:8e:a (6:142:10)
Dec 08 10:07:10 NGC1976 thermald[24339]: 22 CPUID levels; family:model:stepping 0x6:8e:a (6:142:10)
Dec 08 10:07:10 NGC1976 thermald[24339]: sensor id 13 : No temp sysfs for reading raw temp
Dec 08 10:07:10 NGC1976 thermald[24339]: sensor id 13 : No temp sysfs for reading raw temp
Dec 08 10:07:10 NGC1976 thermald[24339]: sensor id 13 : No temp sysfs for reading raw temp
Dec 08 10:07:10 NGC1976 thermald[24339]: Using generated /etc/thermald/thermal-conf.xml.auto
Dec 08 10:07:10 NGC1976 thermald[24339]: Using config file /etc/thermald/thermal-conf.xml.auto
Dec 08 10:07:10 NGC1976 thermald[24339]: Polling mode is enabled: 4



In my case, I saved the binary in ~/bin/ so that I can run the command again when necessary.

Actually, I'm not very comfortable with this, because the underlying problem is still there and thermald now depends on an obsolete project.

jprissi commented 8 months ago

After a system update, I was missing the /etc/thermald/thermal-conf.xml.auto configuration file. The thermald service would not start: signal 6 (SIGABRT).
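
Since the crash here coincided with a missing generated config file, a quick pre-flight check can be useful after a system update. This is a minimal sketch, not something thermald ships; `missing_configs` is a hypothetical helper, and the only file name it looks for is the one mentioned in this thread.

```python
from pathlib import Path


def missing_configs(conf_dir="/etc/thermald", names=("thermal-conf.xml.auto",)):
    """Return the expected config file names that are absent from conf_dir."""
    base = Path(conf_dir)
    return [name for name in names if not (base / name).is_file()]


# An empty list means every expected file is present.
print(missing_configs())
```

An absent file does not by itself prove this is the same bug; it only matches the condition reported in this comment.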

coredumpctl debug thermald returned the following stack trace:

#0  0x00007f145b0ac83c n/a (libc.so.6 + 0x8e83c)
#1  0x00007f145b05c668 raise (libc.so.6 + 0x3e668)
#2  0x00007f145b0444b8 abort (libc.so.6 + 0x264b8)
#3  0x00007f145b045390 n/a (libc.so.6 + 0x27390)
#4  0x00007f145b13cb4b __fortify_fail (libc.so.6 + 0x11eb4b)
#5  0x00007f145b13de56 __stack_chk_fail (libc.so.6 + 0x11fe56)
#6  0x000055dae44bee55 _ZN13cthd_acpi_rel9read_psvtEv (thermald + 0x48e55)
#7  0x0000000000010000 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
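
Frame #6 is the only frame inside thermald itself, and its symbol is a mangled C++ name. Normally one would pipe it through `c++filt` from binutils; the sketch below decodes it by hand to show what the name encodes. It is an illustration only: `demangle_simple` is a hypothetical helper that handles just the simplest Itanium C++ ABI form (a nested name followed by `v` for "no parameters") and rejects everything else.

```python
def demangle_simple(symbol: str) -> str:
    """Demangle names of the form _ZN<len><id>...Ev, e.g. _ZN3foo3barEv."""
    if not symbol.startswith("_ZN"):
        raise ValueError("only _ZN...E nested names are handled in this sketch")
    pos = 3
    parts = []
    # Each component is a decimal length followed by that many characters.
    while pos < len(symbol) and symbol[pos].isdigit():
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])
        parts.append(symbol[end:end + length])
        pos = end + length
    if pos >= len(symbol) or symbol[pos] != "E":
        raise ValueError("unsupported mangled name")
    # After the closing E come the parameter types; 'v' means none.
    if symbol[pos + 1:] != "v":
        raise ValueError("only parameterless functions are handled")
    return "::".join(parts) + "()"


print(demangle_simple("_ZN13cthd_acpi_rel9read_psvtEv"))
# prints: cthd_acpi_rel::read_psvt()
```

So the stack protector tripped on return from `cthd_acpi_rel::read_psvt()`, which points at a buffer overrun in a local variable of that method.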

The steps provided by @shellclear above resolved the issue for me. Unfortunately, this is only a workaround, and the underlying crash probably still needs attention.

spandruvada commented 6 months ago

I think this is caused by this issue: https://github.com/intel/thermal_daemon/commit/9ac497badd88d9a31b0dfde98d8a9054a4087008

I pushed this change to master branch. Please check.