vmatare / thinkfan

The minimalist fan control program
GNU General Public License v3.0
552 stars 62 forks source link

1.2.2: Endless loop on non-existent hwmon index #145

Closed lakotamm closed 3 years ago

lakotamm commented 3 years ago

Hello. A few weeks ago, the thinkfan fan control stopped working on my laptop. Today I finally found time to investigate the error. Here are my findings:

Symptoms: The fan control simply does not run. In the best case, the program reports temperature once and stays there, requiring a restart to report anything else. In the worst case (when I choose a non-existing sensor in the .yaml in the config) in straight out freezes.

The issue gets resolved when I compile and install and earlier version of thinkfan.

Thinkfan version: 1.2.2 Tested functioning earlier versions: 1.2.1, 1.1, Tested effected kernels: 5.13 (Fedora 34, Manjaro), 5.12 (Fedora 34, Manjaro), 5.4 (Manjaro) Laptop: T490 with BIOS 1.53, i7-8565U and dGPU MX250

I am having hard times finding any debugging information to provide. systemctl reports: Jul 29 01:10:31 fedora systemd[1]: Starting Thinkfan, the minimalist fan control program... My config is:

sensors:
  - hwmon: /sys/devices/platform/coretemp.0/hwmon
    indices: [2, 3, 4, 5]

levels:
  - [0, 0, 55]
  - [1, 48, 62]
  - [2, 55, 69]
  - [3, 60, 74]
  - [4, 65, 80]
  - [5, 71, 84]
  - [6, 77, 89]
  - [7, 82, 32767]

Chanking the indices to indices: [1,2, 3, 4, 5, 6] (6 does not exists) results in a complete freeze of the process, which cannot be ended in a normal way.

lakotamm commented 3 years ago

This issue may be related to #137 and #139.

vmatare commented 3 years ago

Thanks a lot for this report, especially the crucial bit about the non-existing sensor index. I think I can see the problem by looking at the code already...

vmatare commented 3 years ago

This is a pretty stupid bug, but I'm fairly certain it's not related to #137. Might be related to #139, but unfortunately that one is a little trickier to reproduce.

lakotamm commented 3 years ago

Cool! I am happy if this helps. Just let me know if there is something I should test.

vmatare commented 3 years ago

I've pushed a fix to the master branch. @lakotamm go ahead and try it out, it should give a helpful error message now if you have a wrong hwmon index in your config.

lakotamm commented 3 years ago

The error message works well! Thank you!

I do not know whether this fix was supposed to fix also the other issue of thinkfan not working when the configuration is correct, but that bug is still present. The fan control is still not working.

latest Master (does not control the fan)

● thinkfan.service - Thinkfan, the minimalist fan control program Loaded: loaded (/usr/local/lib/systemd/system/thinkfan.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/thinkfan.service.d └─default.conf, override.conf Active: active (running) since Mon 2021-08-02 00:19:48 EEST; 3s ago Process: 50481 ExecStart=/usr/local/sbin/thinkfan $THINKFAN_ARGS (code=exited, status=0/SUCCESS) Main PID: 50482 (thinkfan) Tasks: 1 (limit: 28377) Memory: 468.0K CPU: 9ms CGroup: /system.slice/thinkfan.service └─50482 /usr/local/sbin/thinkfan -b0

Aug 02 00:19:48 fedora systemd[1]: Starting Thinkfan, the minimalist fan control program... Aug 02 00:19:48 fedora systemd[1]: Started Thinkfan, the minimalist fan control program. Aug 02 00:19:48 fedora thinkfan[50482]: Temperatures(bias): 59(0), 57(0), 59(0), 57(0), 58(0) -> Fans [miroslav@fedora build]$

v. 1.2.1 (works well):

● thinkfan.service - Thinkfan, the minimalist fan control program Loaded: loaded (/usr/local/lib/systemd/system/thinkfan.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/thinkfan.service.d └─default.conf, override.conf Active: active (running) since Mon 2021-08-02 00:17:14 EEST; 3s ago Process: 50295 ExecStart=/usr/local/sbin/thinkfan $THINKFAN_ARGS (code=exited, status=0/SUCCESS) Main PID: 50296 (thinkfan) Tasks: 1 (limit: 28377) Memory: 468.0K CPU: 8ms CGroup: /system.slice/thinkfan.service └─50296 /usr/local/sbin/thinkfan -b0

Aug 02 00:17:14 fedora systemd[1]: Starting Thinkfan, the minimalist fan control program... Aug 02 00:17:14 fedora thinkfan[50295]: /proc/acpi/ibm/fan: Restoring initial state: auto. Aug 02 00:17:14 fedora thinkfan[50295]: Daemon PID: 50296 Aug 02 00:17:14 fedora systemd[1]: Started Thinkfan, the minimalist fan control program. Aug 02 00:17:14 fedora thinkfan[50296]: WARNING: Using default fan control in /proc/acpi/ibm/fan. Aug 02 00:17:14 fedora thinkfan[50296]: /proc/acpi/ibm/fan: Saved initial state: auto. Aug 02 00:17:14 fedora thinkfan[50296]: Temperatures(bias): 58(0), 57(0), 58(0), 57(0), 57(0) -> level 1 Aug 02 00:17:16 fedora thinkfan[50296]: Temperatures(bias): 58(0), 57(0), 58(0), 57(0), 57(0) [miroslav@fedora build]$

vmatare commented 3 years ago

Sorry about the delay, took some effort to actually reproduce the part where thinkfan simply does nothing. However now I did and I'm in the process of fixing it. For the time being, you can try this workaround:

Add an explicit fans: section to your config (you've apparently been relying on the default fan config using /proc/acpi/ibm/fan):

fans:
  - tpacpi: /proc/acpi/ibm/fan

So this should change nothing for you, it's just the default made explicit.

vmatare commented 3 years ago

This should now be taken care of by https://github.com/vmatare/thinkfan/commit/fce52ac4e865d9a616865f33ca78f3d327e37657. I've added a requirement for people to always configure their fans explicity. The old behavior of defaulting to /proc/acpi/ibm/fan just isn't appropriate any more. @lakotamm please test and let me know whether the new behavior makes sense to you.

lakotamm commented 3 years ago

Sorry about the delay, took some effort to actually reproduce the part where thinkfan simply does nothing. However now I did and I'm in the process of fixing it. For the time being, you can try this workaround:

Add an explicit fans: section to your config (you've apparently been relying on the default fan config using /proc/acpi/ibm/fan):

fans:
  - tpacpi: /proc/acpi/ibm/fan

So this should change nothing for you, it's just the default made explicit.

Thanks for the great news! :-)

Unfortunately when I try to use it, I encounter an issue:

Sep 01 10:12:32 fedora systemd[1]: Starting Thinkfan, the minimalist fan control program... Sep 01 10:12:32 fedora thinkfan[16438]: ERROR: Module thinkpad_acpi doesn't seem to support fan_control Sep 01 10:12:32 fedora systemd[1]: thinkfan.service: Control process exited, code=exited, status=1/FAILURE Sep 01 10:12:32 fedora systemd[1]: thinkfan.service: Failed with result 'exit-code'. Sep 01 10:12:32 fedora systemd[1]: Failed to start Thinkfan, the minimalist fan control program.

vmatare commented 3 years ago

Sep 01 10:12:32 fedora thinkfan[16438]: ERROR: Module thinkpad_acpi doesn't seem to support fan_control

Yes, you have to load the thinkpad_acpi module with the option fan_control=1. See the manpage modprobe.d(5).

lakotamm commented 3 years ago

Thanks fot your reply.

Unfortunately somehow I can no longer even get to that message. Now thinkfan says that I do not even have fans set up (which is not true?)

[miroslav@fedora build]$ sudo thinkfan -n -c /etc/thinkfan.yaml 
ERROR: No fans are configured in /etc/thinkfan.yaml
[miroslav@fedora build]$ cat /etc/thinkfan.yaml 
sensors:
  - hwmon: /sys/devices/platform/coretemp.0/hwmon
    indices: [2, 3, 4, 5]

levels:
  - [0, 0, 55]
  - [1, 48, 62]
  - [2, 55, 69]
  - [3, 60, 74]
  - [4, 65, 80]
  - [5, 71, 84]
  - [6, 77, 89]
  - [7, 82, 32767]

fans:
  - tpacpi: /proc/acpi/ibm/thermal
[miroslav@fedora build]$ 

I tried deleting the file and creating a new one, reinstalling thinkfan, manually specifying the conf. file... It does not seem to help...

vmatare commented 3 years ago

Thanks for testing! I've reproduced the bug and I'm now in the process of fixing it. As a temporary workaround, you can put the fans: section before the levels: section. Of course it should also function the way you did it, but due to some changes in the way the config is read, the ordering you have there is currently broken.

lakotamm commented 3 years ago

Thanks, when I put fans in front of levels, it no longer shows that error message.

However I am still having issues with thethinkpad_acpi module. Maybe I should create a new issue for it though.

I tried:

but I still get the same error message

ERROR: Module thinkpad_acpi doesn't seem to support fan_control

and if I try to see whether the module is set up correctly, it does not seem to be:

[miroslav@fedora ~]$ cat /sys/modules/thinkpad_acpi/parameters/fan_control cat: /sys/modules/thinkpad_acpi/parameters/fan_control: No such file or directory

[miroslav@fedora ~]$ lsmod | grep thinkpad thinkpad_acpi 126976 0 platform_profile 16384 1 thinkpad_acpi ledtrig_audio 16384 4 snd_ctl_led,snd_hda_codec_generic,snd_sof,thinkpad_acpi snd 110592 26 snd_ctl_led,snd_hda_codec_generic,snd_seq,snd_seq_device,snd_hda_codec_hdmi,snd_hwdep,snd_hda_intel,snd_hda_codec,snd_hda_codec_realtek,snd_timer,snd_compress,thinkpad_acpi,snd_soc_core,snd_pcm rfkill 32768 8 bluetooth,thinkpad_acpi,cfg80211 video 57344 3 thinkpad_acpi,i915,nouveau

I am using kernel 5.13.13-200.fc34.x86_64.

And of course manually setting the fans to:

fans:

  • tpacpi: /proc/acpi/ibm/fan

works.

vmatare commented 3 years ago

OK, while developing the fix I found that I had a severe typo in my comments above. I've edit them so that other users reading this don't get confused, too. So I wrote:

fans:
  - tpacpi: /proc/acpi/ibm/thermal # This is WRONG

But that is of course incorrect, and causes the error message "Module thinkpad_acpi doesn't seem to support fan_control" because that's thermal sensor input file, not the fan control. So the correct config is (as you apparently noticed):

fans:
  - tpacpi: /proc/acpi/ibm/fan # correct
vmatare commented 3 years ago

I've now pushed a fix the master branch that should allow you to put your fans: and levels: sections in any order, i.e. you can now have the fans: section at the end, as well. With that, all of the issues you found should be fixed.

lakotamm commented 3 years ago

I can confirm that everything works well right now.

One thing to note is that I need to have options thinkpad_acpi fan_control=1 enabled in /etc/modprobe.d/thinkfan.conf. It could be worth adding this information somewhere visibly (unless that setting gets enabled automatically). (Tested in Fedora 34)

lakotamm commented 3 years ago

I do not know what happened, but suddenly after pulling master and recompiling, I am getting the same error again.

× thinkfan.service - Thinkfan, the minimalist fan control program Loaded: loaded (/usr/local/lib/systemd/system/thinkfan.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/thinkfan.service.d └─default.conf, override.conf Active: failed (Result: exit-code) since Tue 2021-09-21 00:04:09 CEST; 18s ago Process: 9586 ExecStart=/usr/local/sbin/thinkfan $THINKFAN_ARGS (code=exited, status=1/FAILURE) CPU: 3ms

Sep 21 00:04:09 fedora systemd[1]: Starting Thinkfan, the minimalist fan control program... Sep 21 00:04:09 fedora thinkfan[9586]: ERROR: No fans are configured in /etc/thinkfan.yaml Sep 21 00:04:09 fedora systemd[1]: thinkfan.service: Control process exited, code=exited, status=1/FAILURE Sep 21 00:04:09 fedora systemd[1]: thinkfan.service: Failed with result 'exit-code'. Sep 21 00:04:09 fedora systemd[1]: Failed to start Thinkfan, the minimalist fan control program.

no matter where I put the "fans" section, I get this error.

vmatare commented 3 years ago

Sorry, another regression that came with some refactoring I did. The YAML parser is in bad need of a fundamental architecture overhaul. It has also been reported at #155, so I'm tracking it there because that's specific to this particular regression. Let me close this since the original issue(s) have long been fixed and it's getting a little cramped in here (thematically) ;-)