vmatare / thinkfan

The minimalist fan control program
GNU General Public License v3.0
534 stars 61 forks source link

Segfault with a basic config #206

Closed bhundven closed 1 year ago

bhundven commented 1 year ago

Using latest master branch, commit id 8436612395df70623764eb19cef9008be6c2cb36

On a Thinkpad X1 Extreme (Gen 4i).

/etc/thinkfan.conf:

---
sensors:
  - tpacpi: /proc/acpi/ibm/thermal

fans:
  - tpacpi: /proc/acpi/ibm/fan

levels:
  - ["level 0", 0, 28]
  - ["level 3", 25, 48]
  - ["level 4", 40, 60]
  - ["level 5", 55, 100]
  - ["level 7", 95, 255]

segfault:

ERROR: Segmentation fault.
Backtrace:
thinkfan(+0xea3df) [0x56445968b3df]
thinkfan(thinkfan::Error::Error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x61) [0x56445968b931]
thinkfan(thinkfan::Bug::Bug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x23) [0x56445968ba93]
thinkfan(thinkfan::sig_handler(int)+0xea) [0x5644596512dd]
/usr/lib/libc.so.6(+0x38a00) [0x7fcbbca31a00]
thinkfan(thinkfan::TemperatureState::Ref::add_temp(int)+0x2f) [0x564459681ba3]
thinkfan(thinkfan::TpSensorDriver::read_temps_()+0x588) [0x56445966bb44]
thinkfan(thinkfan::Driver::robust_io<thinkfan::SensorDriver>(void (thinkfan::SensorDriver::*)())::{lambda()#1}::operator()() const+0x75) [0x56445966f3fb]
thinkfan(void std::__invoke_impl<void, thinkfan::Driver::robust_io<thinkfan::SensorDriver>(void (thinkfan::SensorDriver::*)())::{lambda()#1}&>(std::__invoke_other, thinkfan::Driver::robust_io<thinkfan::SensorDriver>(void (thinkfan::SensorDriver::*)())::{lambda()#1}&)+0x20) [0x5644596736be]
thinkfan(std::enable_if<is_invocable_r_v<void, thinkfan::Driver::robust_io<thinkfan::SensorDriver>(void (thinkfan::SensorDriver::*)())::{lambda()#1}&>, void>::type std::__invoke_r<void, thinkfan::Driver::robust_io<thinkfan::SensorDriver>(void (thinkfan::SensorDriver::*)())::{lambda()#1}&>(thinkfan::Driver::robust_io<thinkfan::SensorDriver>(void (thinkfan::SensorDriver::*)())::{lambda()#1}&)+0x20) [0x564459672d79]
thinkfan(std::_Function_handler<void (), thinkfan::Driver::robust_io<thinkfan::SensorDriver>(void (thinkfan::SensorDriver::*)())::{lambda()#1}>::_M_invoke(std::_Any_data const&)+0x20) [0x564459672178]
thinkfan(std::function<void ()>::operator()() const+0x32) [0x564459676262]
thinkfan(thinkfan::Driver::robust_op(std::function<void ()>, std::function<void (thinkfan::ExpectedError const&)>)+0x56) [0x564459675262]
thinkfan(void thinkfan::Driver::robust_io<thinkfan::SensorDriver>(void (thinkfan::SensorDriver::*)())+0x142) [0x56445966f540]
thinkfan(thinkfan::SensorDriver::read_temps()+0x48) [0x5644596695cc]
thinkfan(main+0x3c3) [0x564459652f04]
/usr/lib/libc.so.6(+0x23290) [0x7fcbbca1c290]
/usr/lib/libc.so.6(__libc_start_main+0x8a) [0x7fcbbca1c34a]
thinkfan(_start+0x25) [0x564459650cd5]

This is probably a bug. Please consider reporting this at https://github.com/vmatare/thinkfan/issues. Thanks.

Running archlinux. Linux mill 6.0.9-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 16 Nov 2022 17:01:17 +0000 x86_64 GNU/Linux

packages:

[bhundven@mill build]$ pacman -Q 2>&1 | grep -E '(cmake|gcc|yaml|pkgconf|lm_senso|systemd)'
cmake 3.24.3-1
gcc 12.2.0-1
gcc-libs 12.2.0-1
gcc11 11.3.0-4
gcc11-libs 11.3.0-4
lib32-gcc-libs 12.2.0-1
lib32-lm_sensors 1:3.6.0.r41.g31d1f125-2
lib32-systemd 252.1-1
libyaml 0.2.5-1
lm_sensors 1:3.6.0.r41.g31d1f125-2
pkgconf 1.8.0-1
systemd 252.1-1
systemd-libs 252.1-1
systemd-sysvcompat 252.1-1
yaml-cpp 0.7.0-2

Build setup:

mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE:STRING=Debug -DUSE_ATASMART:BOOL=ON -DCMAKE_INSTALL_PREFIX=/usr ..
make
sudo make install
bhundven commented 1 year ago
[bhundven@mill thinkfan]$ sudo gdb -q --args thinkfan -b0
Reading symbols from thinkfan...
(gdb) run
Starting program: /usr/bin/thinkfan -b0
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
thinkfan::TemperatureState::Ref::add_temp (this=0x5555556b3ae8, t=58) at /home/bhundven/Projects/github.com/vmatare/thinkfan/src/temperature_state.cpp:61
61      int diff = *temp_ > 0 ?
(gdb) bt
#0  thinkfan::TemperatureState::Ref::add_temp (this=0x5555556b3ae8, t=58) at /home/bhundven/Projects/github.com/vmatare/thinkfan/src/temperature_state.cpp:61
#1  0x000055555561eb44 in thinkfan::TpSensorDriver::read_temps_ (this=0x5555556b3a90) at /home/bhundven/Projects/github.com/vmatare/thinkfan/src/sensors.cpp:264
#2  0x00005555556223fb in thinkfan::Driver::robust_io<thinkfan::SensorDriver>(void (thinkfan::SensorDriver::*)())::{lambda()#1}::operator()() const (__closure=0x7fffffffe740)
    at /home/bhundven/Projects/github.com/vmatare/thinkfan/src/driver.h:95
#3  0x00005555556266be in std::__invoke_impl<void, thinkfan::Driver::robust_io<thinkfan::SensorDriver>(void (thinkfan::SensorDriver::*)())::{lambda()#1}&>(std::__invoke_other, thinkfan::Driver::robust_io<thinkfan::SensorDriver>(void (thinkfan::SensorDriver::*)())::{lambda()#1}&) (__f=...) at /usr/include/c++/12.2.0/bits/invoke.h:61
#4  0x0000555555625d79 in std::__invoke_r<void, thinkfan::Driver::robust_io<thinkfan::SensorDriver>(void (thinkfan::SensorDriver::*)())::{lambda()#1}&>(thinkfan::Driver::robust_io<thinkfan::SensorDriver>(void (thinkfan::SensorDriver::*)())::{lambda()#1}&) (__fn=...) at /usr/include/c++/12.2.0/bits/invoke.h:111
#5  0x0000555555625178 in std::_Function_handler<void (), thinkfan::Driver::robust_io<thinkfan::SensorDriver>(void (thinkfan::SensorDriver::*)())::{lambda()#1}>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/12.2.0/bits/std_function.h:290
#6  0x0000555555629262 in std::function<void ()>::operator()() const (this=0x7fffffffe740) at /usr/include/c++/12.2.0/bits/std_function.h:591
#7  0x0000555555628262 in thinkfan::Driver::robust_op(std::function<void ()>, std::function<void (thinkfan::ExpectedError const&)>) (this=0x5555556b3a90, op_fn=..., skip_fn=...)
    at /home/bhundven/Projects/github.com/vmatare/thinkfan/src/driver.cpp:58
#8  0x0000555555622540 in thinkfan::Driver::robust_io<thinkfan::SensorDriver> (this=0x5555556b3a90, io_func=&virtual thinkfan::SensorDriver::read_temps_())
    at /home/bhundven/Projects/github.com/vmatare/thinkfan/src/driver.h:93
#9  0x000055555561c5cc in thinkfan::SensorDriver::read_temps (this=0x5555556b3a90) at /home/bhundven/Projects/github.com/vmatare/thinkfan/src/sensors.cpp:91
#10 0x0000555555605f04 in main (argc=2, argv=0x7fffffffeae8) at /home/bhundven/Projects/github.com/vmatare/thinkfan/src/thinkfan.cpp:374
(gdb) p diff
$1 = 21845
bhundven commented 1 year ago

Also, I was trying to bring in more config options from my older 1.3.1 config, and added hwmon for coretemp:

---
sensors:
  - tpacpi: /proc/acpi/ibm/thermal
  - hwmon: /sys/class/hwmon
    name: coretemp
    indices: [1,2,3,4,5,6,7,8,9]

fans:
  - tpacpi: /proc/acpi/ibm/fan

levels:
  - ["level 0", 0, 28]
  - ["level 3", 25, 48]
  - ["level 4", 40, 60]
  - ["level 5", 55, 100]
  - ["level 7", 95, 255]

and when I run thinkfan:

[bhundven@mill thinkfan]$ sudo thinkfan -b0
free(): invalid pointer
Aborted
[root@mill build]# cat /sys/class/hwmon/hwmon8/name 
coretemp
[root@mill build]# ls /sys/class/hwmon/hwmon8/
device      temp1_crit_alarm  temp2_crit_alarm  temp3_crit_alarm  temp4_crit_alarm  temp5_crit_alarm  temp6_crit_alarm  temp7_crit_alarm  temp8_crit_alarm  temp9_crit_alarm
name        temp1_input       temp2_input   temp3_input   temp4_input       temp5_input       temp6_input   temp7_input   temp8_input       temp9_input
power       temp1_label       temp2_label   temp3_label   temp4_label       temp5_label       temp6_label   temp7_label   temp8_label       temp9_label
subsystem   temp1_max         temp2_max     temp3_max     temp4_max     temp5_max         temp6_max     temp7_max     temp8_max     temp9_max
temp1_crit  temp2_crit        temp3_crit    temp4_crit    temp5_crit        temp6_crit        temp7_crit    temp8_crit    temp9_crit        uevent
bhundven commented 1 year ago

For the invalid pointer:

[bhundven@mill thinkfan]$ sudo gdb -q --args thinkfan -b0
Reading symbols from thinkfan...
(gdb) run
Starting program: /usr/bin/thinkfan -b0
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
free(): invalid pointer

Program received signal SIGABRT, Aborted.
0x00007ffff79b964c in ?? () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff79b964c in ?? () from /usr/lib/libc.so.6
#1  0x00007ffff7969958 in raise () from /usr/lib/libc.so.6
#2  0x00007ffff795353d in abort () from /usr/lib/libc.so.6
#3  0x00007ffff79ad7ee in ?? () from /usr/lib/libc.so.6
#4  0x00007ffff79c33dc in ?? () from /usr/lib/libc.so.6
#5  0x00007ffff79c522c in ?? () from /usr/lib/libc.so.6
#6  0x00007ffff79c7ba3 in free () from /usr/lib/libc.so.6
#7  0x000055555561beeb in std::__new_allocator<unsigned int>::deallocate (this=0x5555556b6fb0, __p=0x5555556c1350, __n=9) at /usr/include/c++/12.2.0/bits/new_allocator.h:158
#8  0x000055555561beaa in std::allocator_traits<std::allocator<unsigned int> >::deallocate (__a=..., __p=0x5555556c1350, __n=9) at /usr/include/c++/12.2.0/bits/alloc_traits.h:496
#9  0x000055555561be4a in std::_Vector_base<unsigned int, std::allocator<unsigned int> >::_M_deallocate (this=0x5555556b6fb0, __p=0x5555556c1350, __n=9)
    at /usr/include/c++/12.2.0/bits/stl_vector.h:387
#10 0x000055555561bdb4 in std::_Vector_base<unsigned int, std::allocator<unsigned int> >::~_Vector_base (this=0x5555556b6fb0, __in_chrg=<optimized out>)
    at /usr/include/c++/12.2.0/bits/stl_vector.h:366
#11 0x000055555561bd45 in std::vector<unsigned int, std::allocator<unsigned int> >::~vector (this=0x5555556b6fb0, __in_chrg=<optimized out>) at /usr/include/c++/12.2.0/bits/stl_vector.h:733
#12 0x000055555561bcca in std::_Optional_payload_base<std::vector<unsigned int, std::allocator<unsigned int> > >::_M_destroy (this=0x5555556b6fb0) at /usr/include/c++/12.2.0/optional:287
#13 0x000055555561bc3e in std::_Optional_payload_base<std::vector<unsigned int, std::allocator<unsigned int> > >::_M_reset (this=0x5555556b6fb0) at /usr/include/c++/12.2.0/optional:318
#14 0x000055555561bab8 in std::_Optional_payload<std::vector<unsigned int, std::allocator<unsigned int> >, false, false, false>::~_Optional_payload (this=0x5555556b6fb0, 
    __in_chrg=<optimized out>) at /usr/include/c++/12.2.0/optional:439
#15 0x000055555561b6e4 in std::_Optional_base<std::vector<unsigned int, std::allocator<unsigned int> >, false, false>::~_Optional_base (this=0x5555556b6fb0, __in_chrg=<optimized out>)
    at /usr/include/c++/12.2.0/optional:510
#16 0x000055555561b700 in std::optional<std::vector<unsigned int, std::allocator<unsigned int> > >::~optional (this=0x5555556b6fb0, __in_chrg=<optimized out>)
    at /usr/include/c++/12.2.0/optional:705
#17 0x0000555555627e02 in thinkfan::HwmonInterface<thinkfan::SensorDriver>::~HwmonInterface (this=0x5555556b6f60, __in_chrg=<optimized out>)
    at /home/bhundven/Projects/github.com/vmatare/thinkfan/src/hwmon.h:36
#18 0x0000555555627e39 in std::_Destroy<thinkfan::HwmonInterface<thinkfan::SensorDriver> > (__pointer=0x5555556b6f60) at /usr/include/c++/12.2.0/bits/stl_construct.h:151
#19 0x0000555555627dd2 in std::allocator_traits<std::allocator<void> >::destroy<thinkfan::HwmonInterface<thinkfan::SensorDriver> > (__p=0x5555556b6f60)
    at /usr/include/c++/12.2.0/bits/alloc_traits.h:648
#20 0x0000555555627ca5 in std::_Sp_counted_ptr_inplace<thinkfan::HwmonInterface<thinkfan::SensorDriver>, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x5555556b6f50)
    at /usr/include/c++/12.2.0/bits/shared_ptr_base.h:613
#21 0x000055555560f84f in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5555556b6f50) at /usr/include/c++/12.2.0/bits/shared_ptr_base.h:346
#22 0x0000555555610749 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x5555556c24f0, __in_chrg=<optimized out>)
    at /usr/include/c++/12.2.0/bits/shared_ptr_base.h:1071
#23 0x000055555562187a in std::__shared_ptr<thinkfan::HwmonInterface<thinkfan::SensorDriver>, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x5555556c24e8, __in_chrg=<optimized out>)
    at /usr/include/c++/12.2.0/bits/shared_ptr_base.h:1524
#24 0x0000555555621896 in std::shared_ptr<thinkfan::HwmonInterface<thinkfan::SensorDriver> >::~shared_ptr (this=0x5555556c24e8, __in_chrg=<optimized out>)
    at /usr/include/c++/12.2.0/bits/shared_ptr.h:175
#25 0x0000555555627b86 in thinkfan::HwmonSensorDriver::~HwmonSensorDriver (this=0x5555556c2450, __in_chrg=<optimized out>)
    at /home/bhundven/Projects/github.com/vmatare/thinkfan/src/sensors.h:82
#26 0x0000555555627bae in thinkfan::HwmonSensorDriver::~HwmonSensorDriver (this=0x5555556c2450, __in_chrg=<optimized out>)
    at /home/bhundven/Projects/github.com/vmatare/thinkfan/src/sensors.h:82
#27 0x000055555560b3be in std::default_delete<thinkfan::SensorDriver>::operator() (this=0x5555556ae6b8, __ptr=0x5555556c2450) at /usr/include/c++/12.2.0/bits/unique_ptr.h:95
#28 0x000055555560b2de in std::unique_ptr<thinkfan::SensorDriver, std::default_delete<thinkfan::SensorDriver> >::~unique_ptr (this=0x5555556ae6b8, __in_chrg=<optimized out>)
    at /usr/include/c++/12.2.0/bits/unique_ptr.h:396
#29 0x000055555560b213 in std::_Destroy<std::unique_ptr<thinkfan::SensorDriver, std::default_delete<thinkfan::SensorDriver> > > (__pointer=0x5555556ae6b8)
    at /usr/include/c++/12.2.0/bits/stl_construct.h:151
#30 0x000055555560b06f in std::_Destroy_aux<false>::__destroy<std::unique_ptr<thinkfan::SensorDriver, std::default_delete<thinkfan::SensorDriver> >*> (__first=0x5555556ae6b8, 
    __last=0x5555556ae6c0) at /usr/include/c++/12.2.0/bits/stl_construct.h:163
#31 0x000055555560ad44 in std::_Destroy<std::unique_ptr<thinkfan::SensorDriver, std::default_delete<thinkfan::SensorDriver> >*> (__first=0x5555556ae670, __last=0x5555556ae6c0)
    at /usr/include/c++/12.2.0/bits/stl_construct.h:196
#32 0x000055555560a715 in std::_Destroy<std::unique_ptr<thinkfan::SensorDriver, std::default_delete<thinkfan::SensorDriver> >*, std::unique_ptr<thinkfan::SensorDriver, std::default_delete<thinkfan::SensorDriver> > > (__first=0x5555556ae670, __last=0x5555556ae6c0) at /usr/include/c++/12.2.0/bits/alloc_traits.h:850
#33 0x0000555555609dc9 in std::vector<std::unique_ptr<thinkfan::SensorDriver, std::default_delete<thinkfan::SensorDriver> >, std::allocator<std::unique_ptr<thinkfan::SensorDriver, std::default_delete<thinkfan::SensorDriver> > > >::~vector (this=0x5555556aff10, __in_chrg=<optimized out>) at /usr/include/c++/12.2.0/bits/stl_vector.h:730
#34 0x0000555555609372 in thinkfan::Config::~Config (this=0x5555556afef0, __in_chrg=<optimized out>) at /home/bhundven/Projects/github.com/vmatare/thinkfan/src/config.h:125
#35 0x00005555556093a4 in std::default_delete<thinkfan::Config const>::operator() (this=0x7fffffffe828, __ptr=0x5555556afef0) at /usr/include/c++/12.2.0/bits/unique_ptr.h:95
--Type <RET> for more, q to quit, c to continue without paging--c
#36 0x0000555555608706 in std::unique_ptr<thinkfan::Config const, std::default_delete<thinkfan::Config const> >::~unique_ptr (this=0x7fffffffe828, __in_chrg=<optimized out>) at /usr/include/c++/12.2.0/bits/unique_ptr.h:396
#37 0x0000555555605f3f in main (argc=2, argv=0x7fffffffeae8) at /home/bhundven/Projects/github.com/vmatare/thinkfan/src/thinkfan.cpp:377
vmatare commented 1 year ago

Thanks a lot for getting me proper backtraces :+1:, will investigate...

vmatare commented 1 year ago

OK, this is a bad interaction between the new deferred (retryable) sensor initialization logic and the tpacpi sensor type in the case where no indices are specified. As workaround, you can add an entry like indices: [ 1, 2, 3, 4, 5, 6, 7, 8 ] to your tpacpi sensor. If your model has 16 values and you don't want to pick you'll have to list all 16.

bhundven commented 1 year ago

If I don't add the indices I still get the invalid pointer. If I add them like you described, then I only get the first backtrace from tpacpi.

bhundven commented 1 year ago

This config works:

---
sensors:
  # - tpacpi: /proc/acpi/ibm/thermal
  - nvml: 01:00.0
  - hwmon: /sys/class/hwmon
    name: coretemp
    indices: [ 1, 2, 3, 4, 5, 6, 7, 8 ]
  - hwmon: /sys/class/hwmon
    name: nvme
    indices: [1]

fans:
  - tpacpi: /proc/acpi/ibm/fan

levels:
  - ["level 0", 0, 28]
  - ["level 3", 25, 48]
  - ["level 4", 40, 60]
  - ["level 5", 55, 100]
  - ["level 7", 95, 255]

If I remove the indices from the hwmon nvme line, I get:

Nov 20 10:35:42 mill thinkfan[5757]: ERROR: read_temps_: Failed to read temperature(s) from /sys/class/hwmon/hwmon3: Is a directory
bhundven commented 1 year ago

If I add indices to the tpacpi sensor, it works:

---
sensors:
  - tpacpi: /proc/acpi/ibm/thermal
    indices: [ 1, 2, 3, 5, 6, 7 ]
  - nvml: 01:00.0
  - hwmon: /sys/class/hwmon
    name: coretemp
    indices: [ 1, 2, 3, 4, 5, 6, 7, 8 ]
  - hwmon: /sys/class/hwmon
    name: nvme
    indices: [1]

fans:
  - tpacpi: /proc/acpi/ibm/fan

levels:
  - ["level 0", 0, 28]
  - ["level 3", 25, 48]
  - ["level 4", 40, 60]
  - ["level 5", 55, 100]
  - ["level 7", 95, 255]

I think it's because of two of the values in /proc/acpi/ibm/thermal:

[bhundven@mill thinkfan]$ cat /proc/acpi/ibm/thermal 
temperatures:   58 55 56 0 59 53 43 -128

[4] is 0, and [8] is -128.

vmatare commented 1 year ago

The problem is that the memory range for the temperature values is being initialized before the sensor has been initialized, because for the upcoming 2.0.0 release the whole initialization logic has undergone a redesign to allow for deferred initialization and dynamic error recovery.

The oversight on my part was that in the case of a tpacpi sensor, if you don't specify indices, the sensor reports 0 as the number of available temperatures before initialization. Then we end up writing temperatures into uninitialized memory.

The whole process is slightly complex because now with the optional and max_errors keywords, you have many different scenarios when sensor (re-)initialization needs to (or doesn't need to) happen.

vmatare commented 1 year ago

This should be fixed in the latest master now. Could you please update and see if it works for you now?

bhundven commented 1 year ago

That fixed tpacpi. I still have to provide indices for hwmon coretemp, which is probably a different issue.

vmatare commented 1 year ago

Huh... haven't looked too closely at that yet, was hoping it might be the same issue ;-)

Guess I'll have to take another look.

bhundven commented 1 year ago

Actually, I'm wrong. After the tpacpi fix, I get this:

[bhundven@mill ~]$ sudo thinkfan -b0

ERROR: read_temps_: Failed to read temperature(s) from /sys/class/hwmon/hwmon7: Is a directory
---
sensors:
  - tpacpi: /proc/acpi/ibm/thermal
  - nvml: 01:00.0
  - hwmon: /sys/class/hwmon
    name: coretemp
  - hwmon: /sys/class/hwmon
    name: nvme
    indices: [1]

fans:
  - tpacpi: /proc/acpi/ibm/fan

levels:
  - ["level 0", 0, 28]
  - ["level 3", 25, 48]
  - ["level 4", 40, 60]
  - ["level 5", 55, 100]
  - ["level 7", 95, 255]
vmatare commented 1 year ago

Ah right, now that looks a lot like #203. Might help me reproduce that one :-D

So in order to keep them cleanly separated, I'll close this one as finished and continue with #203.

vmatare commented 1 year ago

Thanks for the great help with debugging btw, every extra bit of information helps :+1: