intel / thermal_daemon

Thermal daemon for IA
GNU General Public License v2.0

2.5.5 crashes on startup #428

Closed c4rlo closed 6 months ago

c4rlo commented 7 months ago

After upgrading to thermald 2.5.5, the daemon consistently crashes on startup. This did not happen with 2.5.4. This is on Arch Linux.

Mon 2024-01-22 14:22:53 GMT systemd[1]: Starting Thermal Daemon Service...
Mon 2024-01-22 14:22:53 GMT thermald[1050]: 22 CPUID levels; family:model:stepping 0x6:9e:a (6:158:10)
Mon 2024-01-22 14:22:54 GMT systemd[1]: Starting Daemon for power management...
Mon 2024-01-22 14:22:54 GMT systemd[1]: Started Daemon for power management.
Mon 2024-01-22 14:22:54 GMT thermald[1050]: Unable to find a zone for SEN2
Mon 2024-01-22 14:22:54 GMT thermald[1050]: Unable to find a zone for TSKN
Mon 2024-01-22 14:22:54 GMT thermald[1050]: Unable to find a zone for SEN3
Mon 2024-01-22 14:22:54 GMT thermald[1050]: Unable to find a zone for SEN3
Mon 2024-01-22 14:22:54 GMT thermald[1050]: Unable to find a zone for SEN1
Mon 2024-01-22 14:22:54 GMT thermald[1050]: Unable to find a zone for NGFF
Mon 2024-01-22 14:22:54 GMT thermald[1050]: Unable to find a zone for NGFF
Mon 2024-01-22 14:22:54 GMT thermald[1050]: Unable to find a zone for TMEM
Mon 2024-01-22 14:22:54 GMT thermald[1050]: Unable to find a zone for TMEM
Mon 2024-01-22 14:22:54 GMT thermald[1050]: Polling mode is enabled: 4
Mon 2024-01-22 14:22:54 GMT kernel: traps: thermald[1050] general protection fault ip:75d505578edd sp:7ffe4e723798 error:0 in libc.so.6[75d505444000+15a000]
Mon 2024-01-22 14:22:54 GMT systemd[1]: Created slice Slice /system/systemd-coredump.
Mon 2024-01-22 14:22:54 GMT systemd[1]: Started Process Core Dump (PID 1062/UID 0).
Mon 2024-01-22 14:22:54 GMT systemd-coredump[1063]: [🡕] Process 1050 (thermald) of user 0 dumped core.

                                                    Stack trace of thread 1050:
                                                    #0  0x000075d505578edd n/a (libc.so.6 + 0x15aedd)
                                                    #1  0x000075d50547e6e9 n/a (libc.so.6 + 0x606e9)
                                                    #2  0x000075d50549f125 n/a (libc.so.6 + 0x81125)
                                                    #3  0x000075d505e602c3 g_vasprintf (libglib-2.0.so.0 + 0xac2c3)
                                                    #4  0x000075d505e31aa2 g_strdup_vprintf (libglib-2.0.so.0 + 0x7daa2)
                                                    #5  0x000075d505e1628b g_logv (libglib-2.0.so.0 + 0x6228b)
                                                    #6  0x000075d505e16724 g_log (libglib-2.0.so.0 + 0x62724)
                                                    #7  0x000062eae0484e70 n/a (thermald + 0x1fe70)
                                                    #8  0x000075d505ceb8d8 n/a (libgio-2.0.so.0 + 0x1108d8)
                                                    #9  0x000075d505c84d14 n/a (libgio-2.0.so.0 + 0xa9d14)
                                                    #10 0x000075d505c88c2d n/a (libgio-2.0.so.0 + 0xadc2d)
                                                    #11 0x000075d505ce6f43 n/a (libgio-2.0.so.0 + 0x10bf43)
                                                    #12 0x000075d505c84d14 n/a (libgio-2.0.so.0 + 0xa9d14)
                                                    #13 0x000075d505c84d4d n/a (libgio-2.0.so.0 + 0xa9d4d)
                                                    #14 0x000075d505e0df69 n/a (libglib-2.0.so.0 + 0x59f69)
                                                    #15 0x000075d505e6c367 n/a (libglib-2.0.so.0 + 0xb8367)
                                                    #16 0x000075d505e0eb97 g_main_loop_run (libglib-2.0.so.0 + 0x5ab97)
                                                    #17 0x000062eae0482fae main (thermald + 0x1dfae)
                                                    #18 0x000075d505445cd0 n/a (libc.so.6 + 0x27cd0)
                                                    #19 0x000075d505445d8a __libc_start_main (libc.so.6 + 0x27d8a)
                                                    #20 0x000062eae0483645 _start (thermald + 0x1e645)

                                                    Stack trace of thread 1054:
                                                    #0  0x000075d505520f6f __poll (libc.so.6 + 0x102f6f)
                                                    #1  0x000075d505e6c2b6 n/a (libglib-2.0.so.0 + 0xb82b6)
                                                    #2  0x000075d505e0eb97 g_main_loop_run (libglib-2.0.so.0 + 0x5ab97)
                                                    #3  0x000075d505ced19c n/a (libgio-2.0.so.0 + 0x11219c)
                                                    #4  0x000075d505e3fa05 n/a (libglib-2.0.so.0 + 0x8ba05)
                                                    #5  0x000075d5054aa9eb n/a (libc.so.6 + 0x8c9eb)
                                                    #6  0x000075d50552e7cc n/a (libc.so.6 + 0x1107cc)

                                                    Stack trace of thread 1052:
                                                    #0  0x000075d50552c73d syscall (libc.so.6 + 0x10e73d)
                                                    #1  0x000075d505e672f7 g_cond_wait (libglib-2.0.so.0 + 0xb32f7)
                                                    #2  0x000075d505dd91b4 n/a (libglib-2.0.so.0 + 0x251b4)
                                                    #3  0x000075d505e41a8e n/a (libglib-2.0.so.0 + 0x8da8e)
                                                    #4  0x000075d505e3fa05 n/a (libglib-2.0.so.0 + 0x8ba05)
                                                    #5  0x000075d5054aa9eb n/a (libc.so.6 + 0x8c9eb)
                                                    #6  0x000075d50552e7cc n/a (libc.so.6 + 0x1107cc)

                                                    Stack trace of thread 1051:
                                                    #0  0x000075d505520f6f __poll (libc.so.6 + 0x102f6f)
                                                    #1  0x000075d505e6c2b6 n/a (libglib-2.0.so.0 + 0xb82b6)
                                                    #2  0x000075d505e0c162 g_main_context_iteration (libglib-2.0.so.0 + 0x58162)
                                                    #3  0x000075d505e0c1b2 n/a (libglib-2.0.so.0 + 0x581b2)
                                                    #4  0x000075d505e3fa05 n/a (libglib-2.0.so.0 + 0x8ba05)
                                                    #5  0x000075d5054aa9eb n/a (libc.so.6 + 0x8c9eb)
                                                    #6  0x000075d50552e7cc n/a (libc.so.6 + 0x1107cc)

                                                    Stack trace of thread 1053:
                                                    #0  0x000075d50552c73d syscall (libc.so.6 + 0x10e73d)
                                                    #1  0x000075d505e67cd3 g_cond_wait_until (libglib-2.0.so.0 + 0xb3cd3)
                                                    #2  0x000075d505dd9185 n/a (libglib-2.0.so.0 + 0x25185)
                                                    #3  0x000075d505dd92e7 g_async_queue_timeout_pop (libglib-2.0.so.0 + 0x252e7)
                                                    #4  0x000075d505e4237e n/a (libglib-2.0.so.0 + 0x8e37e)
                                                    #5  0x000075d505e3fa05 n/a (libglib-2.0.so.0 + 0x8ba05)
                                                    #6  0x000075d5054aa9eb n/a (libc.so.6 + 0x8c9eb)
                                                    #7  0x000075d50552e7cc n/a (libc.so.6 + 0x1107cc)

                                                    Stack trace of thread 1061:
                                                    #0  0x000075d50552e7ca n/a (libc.so.6 + 0x1107ca)
                                                    ELF object binary architecture: AMD x86-64
Mon 2024-01-22 14:22:54 GMT systemd[1]: systemd-coredump@0-1062-0.service: Deactivated successfully.
Mon 2024-01-22 14:22:54 GMT systemd[1]: thermald.service: Main process exited, code=dumped, status=11/SEGV
Mon 2024-01-22 14:22:54 GMT systemd[1]: thermald.service: Failed with result 'core-dump'.
Mon 2024-01-22 14:22:54 GMT systemd[1]: Failed to start Thermal Daemon Service.

c4rlo commented 7 months ago

After building with debug info, we can see better what happened:

#0  0x000072ed66b78edd in  () at /usr/lib/libc.so.6
#1  0x000072ed66a7e6e9 in  () at /usr/lib/libc.so.6
#2  0x000072ed66a9f125 in  () at /usr/lib/libc.so.6
#3  0x000072ed673652c3 in g_vasprintf () at /usr/lib/libglib-2.0.so.0
#4  0x000072ed67336aa2 in g_strdup_vprintf () at /usr/lib/libglib-2.0.so.0
#5  0x000072ed6731b28b in g_logv () at /usr/lib/libglib-2.0.so.0
#6  0x000072ed6731b724 in g_log () at /usr/lib/libglib-2.0.so.0
#7  0x0000598d05c32130 in thd_dbus_on_bus_acquired(GDBusConnection*, gchar const*, gpointer) (connection=0x598d069a4300, name=<optimized out>, user_data=0x598d06998810)
    at src/thd_dbus_interface.cpp:1172
#8  0x000072ed671f08d8 in  () at /usr/lib/libgio-2.0.so.0
#9  0x000072ed67189d14 in  () at /usr/lib/libgio-2.0.so.0
#10 0x000072ed6718dc2d in  () at /usr/lib/libgio-2.0.so.0
#11 0x000072ed671ebf43 in  () at /usr/lib/libgio-2.0.so.0
#12 0x000072ed67189d14 in  () at /usr/lib/libgio-2.0.so.0
#13 0x000072ed67189d4d in  () at /usr/lib/libgio-2.0.so.0
#14 0x000072ed67312f69 in  () at /usr/lib/libglib-2.0.so.0
#15 0x000072ed67371367 in  () at /usr/lib/libglib-2.0.so.0
#16 0x000072ed67313b97 in g_main_loop_run () at /usr/lib/libglib-2.0.so.0
#17 0x0000598d05c30642 in main(int, char**) (argc=<optimized out>, argv=<optimized out>) at src/main.cpp:376

That finding led me to propose #429 as the fix for this.
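
For anyone triaging a similar report before rebuilding with debug info, the systemd-coredump trace in the first comment already shows which frame belongs to the daemon itself. The following is a minimal sketch, not part of thermald or its tooling, that parses frame lines in that format and reports the first frame inside the `thermald` binary. The helper names `parse_frames` and `first_frame_in` are hypothetical, and the sample lines are copied from the trace above.

```python
import re

# Matches systemd-coredump frame lines such as:
#   "#7  0x000062eae0484e70 n/a (thermald + 0x1fe70)"
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+0x(?P<addr>[0-9a-f]+)\s+(?P<symbol>\S+)\s+"
    r"\((?P<module>\S+) \+ 0x(?P<offset>[0-9a-f]+)\)"
)


def parse_frames(trace):
    """Return one dict per frame line found in a coredump stack trace."""
    frames = []
    for match in FRAME_RE.finditer(trace):
        addr = int(match["addr"], 16)
        offset = int(match["offset"], 16)
        frames.append({
            "index": int(match["index"]),
            "symbol": match["symbol"],
            "module": match["module"],
            "offset": offset,
            # Runtime address minus in-module offset gives the load base.
            "load_base": addr - offset,
        })
    return frames


def first_frame_in(frames, module):
    """Return the first (innermost) frame in the given module, or None."""
    for frame in frames:
        if frame["module"] == module:
            return frame
    return None


# Frames #5 to #8 of the crashing thread, copied from the report above.
SAMPLE = """
#5  0x000075d505e1628b g_logv (libglib-2.0.so.0 + 0x6228b)
#6  0x000075d505e16724 g_log (libglib-2.0.so.0 + 0x62724)
#7  0x000062eae0484e70 n/a (thermald + 0x1fe70)
#8  0x000075d505ceb8d8 n/a (libgio-2.0.so.0 + 0x1108d8)
"""

frames = parse_frames(SAMPLE)
culprit = first_frame_in(frames, "thermald")
print(f"frame #{culprit['index']}: thermald + {culprit['offset']:#x}")
```

The reported offset is only meaningful against a binary, or separate debug info, from the same build. Given that, a tool such as `addr2line -f -C -e <binary> <offset>` can usually map it to a function and source line, which is the information the debug rebuild above provides directly.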

mestinso commented 7 months ago

I'm just providing another data point here: I'm also getting a startup crash on Arch Linux since this update hit. Let me know if sharing more information/debug logs is helpful.

spandruvada commented 7 months ago

https://github.com/intel/thermal_daemon/commit/e49e4baf6ca12c647e8a4bc4e50743bc475d316a

The above commit should fix the crash; I have applied it. Let me know whether this is fixed for you. I still have to update the revision.

edwloef commented 7 months ago

Building from latest git has resolved the issue for me.

freswa commented 7 months ago

@spandruvada Can confirm this fixes the issue for Arch. Can we get a 2.5.6 release please?

untainsYD commented 7 months ago

I can confirm this issue; I am seeing the same crash.

> systemctl status thermald.service
× thermald.service - Thermal Daemon Service
     Loaded: loaded (/usr/lib/systemd/system/thermald.service; enabled; preset: disabled)
     Active: failed (Result: core-dump) since Tue 2024-01-23 16:06:29 EET; 13min ago
    Process: 11592 ExecStart=/usr/bin/thermald --systemd --dbus-enable --adaptive (code=dumped, signal=SEGV)
   Main PID: 11592 (code=dumped, signal=SEGV)
        CPU: 50ms

Jan 23 16:06:28 untainsYD-workstation systemd[1]: thermald.service: Failed with result 'core-dump'.
Jan 23 16:06:28 untainsYD-workstation systemd[1]: Failed to start Thermal Daemon Service.
Jan 23 16:06:29 untainsYD-workstation systemd[1]: thermald.service: Scheduled restart job, restart counter is at 5.
Jan 23 16:06:29 untainsYD-workstation systemd[1]: thermald.service: Start request repeated too quickly.
Jan 23 16:06:29 untainsYD-workstation systemd[1]: thermald.service: Failed with result 'core-dump'.
Jan 23 16:06:29 untainsYD-workstation systemd[1]: Failed to start Thermal Daemon Service.

spandruvada commented 7 months ago

> @spandruvada Can confirm this fixes the issue for Arch. Can we get a 2.5.6 release please?

Released.