vmatare / thinkfan

The minimalist fan control program
GNU General Public License v3.0
541 stars 62 forks source link

thinkfan doesn't come back up on wake #146

Closed dkowis closed 2 years ago

dkowis commented 3 years ago

I'll end up with a very, very hot laptop :(

❯ systemctl status thinkfan-wakeup
● thinkfan-wakeup.service - Reload thinkfan after waking up from suspend
     Loaded: loaded (/lib/systemd/system/thinkfan-wakeup.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Mon 2021-08-02 06:12:32 CDT; 10h ago
   Main PID: 1915983 (code=exited, status=1/FAILURE)

Aug 02 06:12:32 PopPad systemd[1]: Starting Reload thinkfan after waking up from suspend...
Aug 02 06:12:32 PopPad systemd[1]: thinkfan-wakeup.service: Main process exited, code=exited, status=1/FAILURE
Aug 02 06:12:32 PopPad systemd[1]: thinkfan-wakeup.service: Failed with result 'exit-code'.
Aug 02 06:12:32 PopPad systemd[1]: Failed to start Reload thinkfan after waking up from suspend.

from syslog:

Jul 30 12:50:32 poppad systemd[1]: Starting Reload thinkfan after waking up from suspend...
Jul 30 12:50:32 poppad suspend: nvidia-resume.service
Jul 30 12:50:32 poppad logger[1710858]: <13>Jul 30 12:50:32 suspend: nvidia-resume.service
Jul 30 12:50:32 poppad systemd[1]: Stopped target Suspend.
Jul 30 12:50:32 poppad systemd[1]: nvidia-resume.service: Succeeded.
Jul 30 12:50:32 poppad systemd[1]: Finished NVIDIA system resume actions.
Jul 30 12:50:32 poppad systemd[1]: thinkfan-wakeup.service: Succeeded.
Jul 30 12:50:32 poppad systemd[1]: Finished Reload thinkfan after waking up from suspend.
Jul 30 12:50:32 poppad /usr/libexec/gdm-x-session[43410]: (II) systemd-logind: got resume for 13:64
Jul 30 12:50:32 poppad thinkfan: ERROR: Lost sensor read_temps: Failed to read temperature(s) from /sys/class/hwmon/hwmon4/temp1_input: No such device or address
Jul 30 12:50:32 poppad systemd[1]: thinkfan.service: Main process exited, code=exited, status=1/FAILURE
Jul 30 12:50:32 poppad systemd[1]: thinkfan.service: Failed with result 'exit-code'.
Jul 30 12:50:32 poppad /usr/libexec/gdm-x-session[43410]: (II) systemd-logind: got resume for 13:66
... some time passes...

Jul 30 19:49:02 poppad systemd[1]: thinkfan.service: Scheduled restart job, restart counter is at 12.
Jul 30 19:49:02 poppad systemd[1]: Stopped Thinkfan, the minimalist fan control program.
Jul 30 19:49:02 poppad systemd[1]: Starting Thinkfan, the minimalist fan control program...
Jul 30 19:49:03 poppad thinkfan: ERROR: read_temps: Failed to read temperature(s) from /sys/class/hwmon/hwmon4/temp1_input: No such device or address
Jul 30 19:49:03 poppad systemd[1]: thinkfan.service: Control process exited, code=exited, status=1/FAILURE
Jul 30 19:49:03 poppad systemd[1]: thinkfan.service: Failed with result 'exit-code'.
Jul 30 19:49:03 poppad systemd[1]: Failed to start Thinkfan, the minimalist fan control program.
Jul 30 19:49:04 poppad wpa_supplicant[1261]: wlp0s20f3: CTRL-EVENT-REGDOM-CHANGE init=DRIVER type=COUNTRY alpha2=US
Jul 30 19:49:04 poppad systemd[1]: thinkfan.service: Scheduled restart job, restart counter is at 13.
Jul 30 19:49:04 poppad systemd[1]: Stopped Thinkfan, the minimalist fan control program.
Jul 30 19:49:04 poppad systemd[1]: Starting Thinkfan, the minimalist fan control program...
Jul 30 19:49:04 poppad systemd[1]: Started Thinkfan, the minimalist fan control program.
Jul 30 19:49:04 poppad thinkfan: Temperatures(bias): 44(0), 40(0) -> level 1

Somehow it gets lost, and I'm not sure why, I have to force start it to get it to come back to life.

I have also added this, to try to get around it, but it hasn't worked.

### Editing /etc/systemd/system/thinkfan.service.d/override.conf
### Anything between here and the comment below will become the new contents of the file

[Service]
Restart=always
RestartSec=3s

### Lines below this comment will be discarded

### /lib/systemd/system/thinkfan.service
# [Unit]
# Description=Thinkfan, the minimalist fan control program
# After=sysinit.target
#
# [Service]
# Type=forking
# EnvironmentFile=-/etc/default/thinkfan
# ExecStart=/usr/sbin/thinkfan $THINKFAN_ARGS $DAEMON_ARGS
# PIDFile=/run/thinkfan.pid
# ExecReload=/bin/kill -HUP $MAINPID
#
# [Install]
# WantedBy=multi-user.target
# Also=thinkfan-sleep.service
# Also=thinkfan-wakeup.service
vmatare commented 3 years ago

Well, your problem is this:

thinkfan: ERROR: read_temps: Failed to read temperature(s) from /sys/class/hwmon/hwmon4/temp1_input: No such device or address

There's a sensor in your config that isn't always there. What is it? Switched off WiFi by any chance?

vmatare commented 3 years ago

I'll end up with a very, very hot laptop :(

Shouldn't happen if thinkfan fails with an error message. If thinkfan completely freezes up you might have a problem, but that doesn't seem to be the case here.

dkowis commented 3 years ago

Oh I should've posted my config, sorry about that.

sensors:
  - hwmon: /sys/class/hwmon
    name: thinkpad
    indices: [1]
  - nvml: 01:00.0

fans:
  - tpacpi: /proc/acpi/ibm/fan

levels:
   - [0, 0, 40]
   - [1, 37, 53]
   - [2, 50, 58]
   - [3, 56, 68]
   - [4, 65, 75]
   - [5, 73, 78]
   - ["level full-speed", 77, 32767]

It's definitely a hot laptop, and the messages from syslog are all I've got there.

I suppose it's possible that the nvidia temperature thing isn't working? That's all I can think of there.

Wifi is currently switched off, and it's behaving properly, and monitoring the right temperature.

hwmon directory that is the thinkfan:

/sys/class/hwmon🔒 on 🅰 (us-west-2)
❯ ls
 hwmon0   hwmon1   hwmon2   hwmon3   hwmon4   hwmon5   hwmon6   hwmon7   hwmon8   hwmon9

/sys/class/hwmon🔒 on 🅰 (us-west-2)
❯ cd hwmon4/

/s/class/hwmon/hwmon4🔒 on 🅰 (us-west-2)
❯ cat name
thinkpad

/s/class/hwmon/hwmon4🔒 on 🅰 (us-west-2)
❯ ls -l
lrwxrwxrwx root root   0 B  Wed Aug  4 15:22:44 2021  device ⇒ ../../../thinkpad_hwmon
.r--r--r-- root root 4.0 KB Wed Aug  4 15:22:44 2021  fan1_input
.r--r--r-- root root 4.0 KB Wed Aug  4 15:22:44 2021  fan2_input
.r--r--r-- root root 4.0 KB Wed Aug  4 15:24:14 2021  name
drwxr-xr-x root root   0 B  Wed Aug  4 15:22:44 2021  power
.rw-r--r-- root root 4.0 KB Wed Aug  4 15:22:44 2021  pwm1
.rw-r--r-- root root 4.0 KB Wed Aug  4 15:22:44 2021  pwm1_enable
lrwxrwxrwx root root   0 B  Wed Aug  4 15:22:44 2021  subsystem ⇒ ../../../../../class/hwmon
.r--r--r-- root root 4.0 KB Wed Aug  4 15:22:44 2021  temp1_input
.r--r--r-- root root 4.0 KB Wed Aug  4 15:22:44 2021  temp2_input
.r--r--r-- root root 4.0 KB Wed Aug  4 15:22:44 2021  temp3_input
.r--r--r-- root root 4.0 KB Wed Aug  4 15:22:44 2021  temp4_input
.r--r--r-- root root 4.0 KB Wed Aug  4 15:22:44 2021  temp5_input
.r--r--r-- root root 4.0 KB Wed Aug  4 15:22:44 2021  temp6_input
.r--r--r-- root root 4.0 KB Wed Aug  4 15:22:44 2021  temp7_input
.r--r--r-- root root 4.0 KB Wed Aug  4 15:22:44 2021  temp8_input
.rw-r--r-- root root 4.0 KB Wed Aug  4 15:22:44 2021  uevent
vmatare commented 3 years ago

Huh. OK, took a closer look at the logs and I'm missing the part where your system goes to sleep. Specifically this log message should be there:

Going to sleep: Will allow sensor read errors for the next 4 loops.

Also what version of thinkfan are you running?

vmatare commented 3 years ago

To be clear: if I run thinkfan on the commandline and then do a sleep/wakeup cycle, it should log this message. But in syslog you should see it, too (try journalctl -el -u thinkfan):

# ./thinkfan -n -b0

Temperatures(bias): 57(0), 39(0), 34(0), 44(0) -> Fans: 80, 120, 70, 50, 120, 120, 130
Going to sleep: Will allow sensor read errors for the next 4 loops.
Temperatures(bias): 52(0), 41(0), 35(0), 44(0) -> Fans: 80, 110, 70, 50, 120, 120, 130
Received SIGUSR2: Re-initializing fan control.
Temperatures(bias): 51(0), 51(0), 35(0), 44(0) -> Fans: 80, 110, 90, 50, 115, 115, 125
Temperatures(bias): 50(0), 45(0), 35(0), 44(0) -> Fans: 80, 110, 70, 50, 115, 115, 125
Temperatures(bias): 49(0), 40(0), 35(0), 44(0) -> Fans: 50, 110, 70, 50, 115, 115, 125    
dkowis commented 3 years ago

I was away from that computer for a week, I'm back on it and will look for this data to try to get it to you.

dkowis commented 3 years ago

I captured some thinkfan logs from the most recent suspend

Aug 18 13:27:19 poppad thinkfan: Temperatures(bias): 34(0), 29(0) -> level 0
Aug 18 14:02:01 poppad systemd[1]: Starting Notify thinkfan of imminent sleep...
Aug 18 14:02:02 poppad systemd[1]: thinkfan-sleep.service: Succeeded.
Aug 18 14:02:02 poppad systemd[1]: Finished Notify thinkfan of imminent sleep.
Aug 19 08:52:39 poppad systemd[1]: Starting Reload thinkfan after waking up from suspend...
Aug 19 08:52:39 poppad systemd[1]: thinkfan-wakeup.service: Succeeded.
Aug 19 08:52:39 poppad systemd[1]: Finished Reload thinkfan after waking up from suspend.
Aug 19 08:52:39 poppad thinkfan: ERROR: Lost sensor read_temps: Failed to read temperature(s) from /sys/class/hwmon/hwmon4/temp1_input: No such device or address
Aug 19 08:52:39 poppad systemd[1]: thinkfan.service: Main process exited, code=exited, status=1/FAILURE
Aug 19 08:52:39 poppad systemd[1]: thinkfan.service: Failed with result 'exit-code'.
Aug 19 08:52:42 poppad systemd[1]: thinkfan.service: Scheduled restart job, restart counter is at 5.
Aug 19 08:52:42 poppad thinkfan: ERROR: read_temps: Failed to read temperature(s) from /sys/class/hwmon/hwmon4/temp1_input: No such device or address
Aug 19 08:52:42 poppad systemd[1]: thinkfan.service: Control process exited, code=exited, status=1/FAILURE
Aug 19 08:52:42 poppad systemd[1]: thinkfan.service: Failed with result 'exit-code'.
Aug 19 08:52:46 poppad systemd[1]: thinkfan.service: Scheduled restart job, restart counter is at 6.
Aug 19 08:52:46 poppad thinkfan: Temperatures(bias): 48(0), 30(0) -> level 1
vmatare commented 3 years ago

OK, so the problem is clear: thinkfan somehow isn't receiving the SIGWINCH signal when going to sleep. However I can't tell why that is. Could you please post the output of systemctl cat thinkfan-sleep.service? Also, what version of thinkfan are you running (thinkfan -h)?

vmatare commented 2 years ago

Closing for lack of info. Feel free to reopen if info comes up.

vitaly-zdanevich commented 3 months ago

On my system I found that thinkfan is not working after sleep wakeup. Do you need some logs? I am on Gentoo Linux, using OpenRC.

vitaly-zdanevich commented 2 months ago

In logs after system (Gentoo) wakeup I see:

ERROR: Lost sensor readtemps: Failed to read temperature(s) from /sys/class/hwmon/hwmon1/temp1_input: No such device or address