vmatare / thinkfan

The minimalist fan control program
GNU General Public License v3.0

thinkfan may fail after wakeup from suspend #36

Konfekt closed 4 years ago

Konfekt commented 7 years ago

After hibernating, thinkfan exits because of

thinkfan[994]: /sys/class/hwmon/hwmon0/device/temp1_input: No such device or address
thinkfan[994]: A sensor has vanished! Exiting since there's no safe way of handling this.
thinkfan[994]: Cleaning up and resetting fan control.
systemd[1]: thinkfan.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: thinkfan.service: Unit entered failed state.
systemd[1]: thinkfan.service: Failed with result 'exit-code'.

A second later, the sensor is there again. Can thinkfan delay this check after hibernation?
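To illustrate what a delayed check could look like, here is a minimal Python sketch that retries a sensor read only when it fails with ENXIO ("No such device or address"), the error shown in the log above. The names read_with_retry and read_sysfs_temp, the number of attempts, and the delay are assumptions made for this example; this is not thinkfan's code.

```python
import errno
import time


def read_with_retry(read_once, attempts=3, delay=1.0, sleep=time.sleep):
    """Call read_once() until it succeeds or the attempts run out.

    Only ENXIO ("No such device or address") is retried. Any other
    error, or ENXIO on the last attempt, is raised to the caller.
    """
    for attempt in range(attempts):
        try:
            return read_once()
        except OSError as exc:
            if exc.errno != errno.ENXIO or attempt == attempts - 1:
                raise
            sleep(delay)  # give the driver time to re-bind the sensor


def read_sysfs_temp(path):
    """Read one integer value from a sysfs temperature file."""
    with open(path) as f:
        return int(f.read().strip())
```

A caller could wrap a real read as read_with_retry(lambda: read_sysfs_temp(path)). Retrying only on ENXIO keeps other failures, such as a wrong path, fatal.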

The README recommends restarting thinkfan after suspend. There's a script

https://github.com/vmatare/thinkfan/blob/master/rcscripts/systemd/thinkfan-wakeup.service

for systemd, but it does not work. As a layman, it looks to me like the command ExecStart=/usr/bin/pkill -usr2 thinkfan kills thinkfan. Does it actually restart thinkfan instead?

diggit commented 7 years ago

Hi, you need to enable the thinkfan service (systemctl enable thinkfan); the wakeup action is then enabled as well and runs on wake.

pkill -usr2 thinkfan sends the USR2 signal to the thinkfan process, which makes it reload. This is how thinkfan reacts to the wakeup event: Received SIGUSR2: Re-initializing fan control. There is nothing like this in your log...
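For readers unfamiliar with signals, here is a minimal Python sketch of the general pattern: a handler records that SIGUSR2 arrived, and the main loop re-initializes on its next iteration instead of exiting. The names on_sigusr2, install_handler and main_loop_step are hypothetical and only illustrate the idea; thinkfan itself is not written in Python.

```python
import signal

reinit_requested = False


def on_sigusr2(signum, frame):
    # Only set a flag here; the main loop does the actual work.
    global reinit_requested
    reinit_requested = True


def install_handler():
    # Must be called from the main thread on a Unix-like system.
    signal.signal(signal.SIGUSR2, on_sigusr2)


def main_loop_step():
    """One hypothetical loop iteration: re-initialize if a signal came in."""
    global reinit_requested
    if reinit_requested:
        reinit_requested = False
        return "re-initializing fan control"
    return "normal cycle"
```

With the handler installed, pkill -usr2 aimed at such a process would not terminate it; the signal only triggers the re-initialization branch.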

Konfekt commented 7 years ago

Thank you. The service is enabled. For the moment

[Unit]
Description=Resume simple and lightweight fan control program after suspend
After=suspend.target hibernate.target hybrid-sleep.target

[Service]
Type=oneshot
ExecStartPre=/usr/bin/sleep 3
ExecStart=/usr/sbin/thinkfan -q -n

[Install]
WantedBy=suspend.target hibernate.target hybrid-sleep.target

resolves the issue. But wouldn't pkill -usr2 thinkfan instead of /usr/sbin/thinkfan -q -n be the cleaner solution? The thing is, the thinkfan process has already exited after resume because of the vanished sensor above, so the pkill signal never arrives.

vmatare commented 7 years ago

Well, I've heard about this problem a few times now. All the pkill -usr2 does is re-enable userspace fan control after wakeup from suspend. So @Konfekt is right in that it doesn't prevent thinkfan from exiting if a sensor 'vanishes'. Off the top of my head, one solution might be to have a grace period of one cycle (i.e. 5 seconds by default) where missing sensors will be tolerated iff a SIGUSR2 is received within that cycle.

Konfekt commented 7 years ago

Does a udev rule similar to that of https://bbs.archlinux.org/viewtopic.php?id=184483 for thermald help? Something like

ACTION=="add", SUBSYSTEM=="platform", DRIVERS=="thinkpad_hwmon", ATTR{pwm1_enable}="1", ATTR{pwm1}="128", TAG+="systemd", ENV{SYSTEMD_WANTS}="thinkfan.service"

tanius commented 5 years ago

Another workaround, mentioned by @vmatare here in a mailing list in 2012, is this:

please try running thinkfan in Dangerous mode (-D). This makes it more error tolerant, but bear in mind that this also applies to misconfiguration. Otherwise, -D does not change the behaviour, it only affects error handling, so if you change your config, test it without -D before relying on it.

That workaround still works. I just tested it with thinkfan 0.9.3 on Ubuntu 19.10 on a ThinkPad X201 Tablet, by changing the daemon default arguments line in /etc/default/thinkfan to:

DAEMON_ARGS="-q -D"

After a sudo service thinkfan restart, the issue appears solved.

sarmbruster commented 4 years ago

I've suffered from the same issue. After a resume, thinkfan aborted because /sys/devices/platform/thinkpad_hwmon/hwmon/hwmon4/temp1_input is temporarily not bound. I fixed it by using the auto-restart feature of systemd. The [Service] section of my /lib/systemd/system/thinkfan.service now contains:

Restart=on-failure
RestartSec=10s

Once thinkfan crashes after resume, it's automatically restarted after 10 seconds.

Konfekt commented 4 years ago

@tanius Perhaps -D is a red herring, as -D removes all (sanity) checks. Besides being dangerous, as the man page says, it might let thinkfan start but then work incorrectly. The solution by @sarmbruster is less experimental and seems to cover what my solution did: running thinkfan again after the computer has resumed, which I did through an additional systemd service.

manoj153 commented 4 years ago

systemd restart seems to help with hibernate issue for me as well

Restart=on-failure
RestartSec=10s

vmatare commented 4 years ago

I'd generally not recommend blindly restarting thinkfan. If it fails for any reason other than a not-yet initialized sensor driver, you'll get really bad behaviour. These issues should be fixed properly in the current master branch now. That is, by having correct dependencies in the systemd units. Please try out the current master and open a new issue if you're still seeing these problems.

AlexOwen commented 4 years ago

I am still getting the following error on master after resuming from sleep on a T480 on Arch (5.6.4-arch1-1), which I think may be the same or related:

Apr 17 00:22:12 Thinkpad thinkfan[1716]: ERROR: Lost sensor read_temps: Failed to read temperature(s) from /sys/devices/platform/thinkpad_hwmon/hwmon//hwmon7/temp1_input: No such device or address
Apr 17 00:22:12 Thinkpad systemd[1]: thinkfan.service: Main process exited, code=exited, status=1/FAILURE
Apr 17 00:22:12 Thinkpad systemd[1]: thinkfan.service: Failed with result 'exit-code'.
Apr 17 00:22:12 Thinkpad audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=thinkfan comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Apr 17 00:22:12 Thinkpad kernel: audit: type=1131 audit(1587079332.736:74): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=thinkfan comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'

When I cat /sys/devices/platform/thinkpad_hwmon/hwmon//hwmon7/temp1_input after resuming from sleep, I get 48000 which I think is expected.
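As a quick sanity check on that value: hwmon temp*_input files report millidegrees Celsius, so 48000 corresponds to 48 °C. A small Python sketch of the conversion (the function name is made up for this example):

```python
def millidegrees_to_celsius(raw):
    """Convert a hwmon temp*_input string (millidegrees C) to degrees C."""
    return int(raw.strip()) / 1000.0
```

For example, millidegrees_to_celsius("48000\n") gives 48.0.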

AlexOwen commented 4 years ago

It seems to resume fine if I switch from hwmon to tpacpi sensors in thinkfan.conf.

I would rather not use it though, as I have had issues with it before: sometimes it finds 16 sensors and sometimes 11, and one sensor in my system (apparently the plugged-in sensor, misrepresented as a temperature sensor) shows either 0 or 66 and never anything in between.

vmatare commented 4 years ago

Thanks for the feedback @AlexOwen. I might have an idea what is happening, but some things are unclear to me:

  1. Do you get this failure consistently (i.e. after each wakeup) or only sporadically?
  2. Is there anything more in the log before the part you have shown?

I'm asking 2. because there should be something about SIGUSR2 and "re-initializing fan control". If that is not happening for you, there might be something wrong with your systemd units; however, that will probably not be the cause of the problem.

AlexOwen commented 4 years ago

  1. I get it every time I sleep and wake when using hwmon
  2. There's no line that mentions SIGUSR2, but the fans do work, just less efficiently than when thinkfan is working

I've just found the indices option, so tpacpi combined with that works for me. I'm happy to test any fix you come up with, or provide more debugging/logs if I can.

vmatare commented 4 years ago

I have made some updates to the master branch which should finally take care of this problem. Sorry this is going a little slow, but that's because I don't see this problem on my machines.

Now with https://github.com/vmatare/thinkfan/commit/18ec77ba8ee4326beb85f6eb4263e2b8daa254f4, there are again some major changes to the systemd service files. Thinkfan should now be notified BEFORE going to sleep (in addition to after wakeup), and will accept sensor read errors for two loops after receiving that notification (normally ~10s).

The first caveat is that this works only with systemd. The second caveat is that it uses Unix signals, so the signal handler would race with the sleep/resume, which is why after notifying thinkfan that the system is going to sleep, the actual sleep is delayed by 1 second.
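The tolerance window described in this comment can be pictured with a small Python sketch. It assumes a counter that is armed by the sleep notification and decremented once per loop; the class and method names are hypothetical and this is not the code from the commit.

```python
class SleepTolerance:
    """Tolerate sensor read errors for a fixed number of loops."""

    def __init__(self, loops=2):
        self.loops = loops      # size of the tolerance window
        self.remaining = 0      # loops left in the current window

    def notify_sleep(self):
        # Called when the "going to sleep" notification arrives.
        self.remaining = self.loops

    def end_of_loop(self):
        # Called once per main-loop iteration.
        if self.remaining > 0:
            self.remaining -= 1

    def tolerates_errors(self):
        return self.remaining > 0
```

With the default of two loops and a 5 second cycle, read errors would be accepted for roughly 10 seconds after the notification and become fatal again afterwards.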

vmatare commented 4 years ago

It is of course possible to solve this cleanly, but for that I'd have to implement some IPC mechanism that allows for proper synchronization of the whole sleep/wakeup process. And that is a larger task for a later time, so for now we'll have to make do with this slightly hacky solution. Feedback is welcome!

vmatare commented 4 years ago

Since no one is complaining I'm going to assume that the latest updates have fixed the problems. Please reopen if you still notice spurious failures on wakeup.

SylvainJuge commented 3 years ago

Hi, sorry to revive an old closed issue, but I just had a very similar behavior with the following setup

Given it's easy for me to reproduce I'd be happy to help.

When using the coretemp.0 sensor for CPU, everything works fine

  - hwmon: /sys/devices/platform/coretemp.0/hwmon
    indices: [4]

When using the thinkpad_hwmon (which seems somehow equivalent), it does not:

  - hwmon: /sys/devices/platform/thinkpad_hwmon/hwmon
    indices: [1]

The behavior is identical when started as a service with systemd or directly through command line. When trying to use the optional attribute, it does not seem to make any difference.
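The hwmon number in these paths differs between machines and reports (hwmon0, hwmon4, hwmon6 and hwmon7 all appear in this thread), which is why the entries above give a base directory plus indices. As a rough Python illustration of how such an entry could map to concrete files, assuming the layout base/hwmonN/tempX_input (the function name is made up, and this is not how thinkfan itself is implemented):

```python
import glob
import os


def resolve_temp_inputs(base, indices):
    """Find temp<N>_input files under base/hwmon*/ for the given indices."""
    found = []
    for index in indices:
        pattern = os.path.join(base, "hwmon*", f"temp{index}_input")
        found.extend(sorted(glob.glob(pattern)))
    return found
```

Calling resolve_temp_inputs("/sys/devices/platform/thinkpad_hwmon/hwmon", [1]) right after resume and checking whether the result is empty can help tell a missing file apart from a file that exists but cannot be read yet.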

I get the following output:

/proc/acpi/ibm/fan: Saved initial state: auto.
Temperatures(bias): 38(0) -> level 0
Going to sleep: Will allow sensor read errors for the next 2 loops.
/proc/acpi/ibm/fan: Restoring initial state: auto.
ERROR: Lost sensor read_temps: Failed to read temperature(s) from /sys/devices/platform/thinkpad_hwmon/hwmon/hwmon6/temp1_input: No such device or address

For reference, when testing, I start thinkfan with the following command: sudo thinkfan -n -b -5 -v -c /etc/thinkfan.yaml

Link to my complete configuration file (in the state where it fails before applying the workaround above).

benyaminl commented 2 years ago

Hello, I still face this. Nothing seems to fail in thinkpad-sleep or thinkpad-wakeup, the service is active, and no error shows up when I check with

journalctl _SYSTEMD_UNIT=thinkfan.service

Any idea to debug this?

I'm on Thinkpad X220

The only way to make it work for now is to restart it manually after wakeup.

This only happens after a very, very long sleep. I can't reproduce it if the machine just sleeps and wakes up again right away; then it works as intended.

vmatare commented 2 years ago

Try this: https://github.com/vmatare/thinkfan/issues/189#issuecomment-1196011925

It's not released yet, so you'll have to build the latest master yourself if you want to try it out.