vmatare / thinkfan

The minimalist fan control program
GNU General Public License v3.0

run systemd service with realtime priority #198

Open bymoz089 opened 1 year ago

bymoz089 commented 1 year ago

Run the systemd service with realtime priority to ensure that thinkfan's execution is not delayed by a sudden rise in CPU load. Otherwise, fans that stay off under heavy CPU load could lead to a system freeze.

vmatare commented 1 year ago

Hi @bymoz089, thanks for this interesting contribution. I understand the theoretical reasoning behind it, but realtime scheduling is potentially dangerous, so I have some questions:

  1. Have you actually observed any situations where this was necessary, i.e. where thinkfan was not getting enough CPU time to do its job? Because it needs very little, and even on overloaded systems (i.e. where load average > cores) I've never observed a situation where fan control was negatively affected.
  2. If we're using FIFO scheduling discipline, we have to be careful not to lock up an entire core (or the whole system in the case of single-core) if some bug causes an endless loop. So shouldn't the process be given a limited CPU time budget when we do FIFO?
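
One way to express the "limited CPU time budget" from question 2 is the Linux `RLIMIT_RTTIME` resource limit, which caps how many microseconds a realtime-scheduled process may run without making a blocking system call (systemd exposes the same limit as `LimitRTTIME=`). The following is a minimal sketch, not something taken from thinkfan; the helper name `cap_realtime_budget` and the 200 ms figure are illustrative assumptions.

```python
import resource


def cap_realtime_budget(max_microseconds: int) -> tuple:
    """Lower the soft RLIMIT_RTTIME of the calling process.

    Returns the (soft, hard) limits that are in effect afterwards.
    The helper name and any budget passed in are illustrative assumptions.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_RTTIME)
    # The soft limit may never exceed the hard limit, so clamp it first.
    if hard != resource.RLIM_INFINITY:
        max_microseconds = min(max_microseconds, hard)
    # Exceeding the soft limit delivers SIGXCPU; the hard limit is left unchanged.
    resource.setrlimit(resource.RLIMIT_RTTIME, (max_microseconds, hard))
    return resource.getrlimit(resource.RLIMIT_RTTIME)
```

Calling `cap_realtime_budget(200_000)` before switching to a realtime policy would mean a runaway loop receives `SIGXCPU` after roughly 200 ms of uninterrupted CPU time, instead of holding a core indefinitely.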

bymoz089 commented 1 year ago

Hi @vmatare, thanks for considering the PR.

  1. Yes. I made some programming mistakes (with threads and loops) that drove all CPU cores hot very quickly while thinkfan was running, yet the fan did not start. This froze the whole system twice. No hardware broke. My guess is that it froze because of insufficient cooling. I then experimented with scheduler priorities, came up with this solution about 6 months ago, and have had no issues since. All of this was on a 10-year-old ThinkPad (Intel Core i).

  2. That's correct. I chose a low priority of 20 because of that; 99 is the highest priority for FIFO. Scheduling is preemptive, so only threads with a lower priority will be stalled. An additional measure would be to use the round-robin (RR) policy, which gives every thread a limited time slice before it is rescheduled. Even though I did not test it, it should still be possible to kill thinkfan in such a situation.

I would argue that it is more important that the fan runs (to prevent hardware overheating, even if the system freezes) than to prevent near-lock-ups caused by an unlikely bad thinkfan loop on production systems.

Based on this information: https://man7.org/linux/man-pages/man7/sched.7.html
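
The priority range and the FIFO/RR policies mentioned above can be inspected from Python's `os` module, which wraps the same Linux scheduler calls. This is only a sketch for experimenting: the helper names are made up for illustration, and the default of 20 simply mirrors the priority chosen in this PR.

```python
import os


def describe_realtime_policies() -> dict:
    """Return the (min, max) static priority range of each realtime policy."""
    policies = {"SCHED_FIFO": os.SCHED_FIFO, "SCHED_RR": os.SCHED_RR}
    return {
        name: (os.sched_get_priority_min(policy), os.sched_get_priority_max(policy))
        for name, policy in policies.items()
    }


def try_set_fifo(priority: int = 20, pid: int = 0) -> bool:
    """Try to move `pid` (0 means the calling process) to SCHED_FIFO.

    Returns False instead of raising when the caller lacks the privilege
    (typically CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO).
    """
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    except PermissionError:
        return False
    return True
```

Printing `describe_realtime_policies()` shows the range the kernel actually offers, so a chosen priority such as 20 can be checked against it before calling `try_set_fifo(20)`.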

bymoz089 commented 1 year ago

In case you reject this PR, maybe it is something for the documentation.
A system admin could define a systemd service override in which this realtime scheduling is enabled.
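
As a rough illustration of such an override, the sketch below renders the text of a drop-in file using systemd's `CPUSchedulingPolicy=` and `CPUSchedulingPriority=` directives. The function name is hypothetical, the defaults (`fifo`, 20) are the values discussed in this thread, and the unit name `thinkfan.service` and the drop-in path in the comment are assumptions about a typical installation.

```python
def render_override(policy: str = "fifo", priority: int = 20) -> str:
    """Build the text of a systemd drop-in that enables realtime scheduling.

    Assumed destination (not confirmed by this thread):
    /etc/systemd/system/thinkfan.service.d/override.conf
    """
    if policy not in ("fifo", "rr"):
        raise ValueError("policy must be 'fifo' or 'rr'")
    if not 1 <= priority <= 99:
        raise ValueError("realtime priority must be between 1 and 99")
    return (
        "[Service]\n"
        f"CPUSchedulingPolicy={policy}\n"
        f"CPUSchedulingPriority={priority}\n"
    )
```

The returned text could be pasted into the editor opened by `systemctl edit thinkfan.service`; restarting the service afterwards would be needed for the new scheduling settings to apply.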