vmatare / thinkfan

The minimalist fan control program
GNU General Public License v3.0

hwmon fails to find "temp1_input" for /sys/devices/virtual/thermal #182

Closed omersi closed 1 year ago

omersi commented 2 years ago

Hi

I created this thinkfan.conf file:


# All core sensors + ACPI
sensors:
  # Chassis
  - hwmon: /sys/devices/platform/thinkpad_hwmon/hwmon
    indices:  [1,2,3,4,5,6,7]
  # Core
  - hwmon: /sys/devices/platform/coretemp.0/hwmon
    indices: [1,2,3,4,5]
    correction: [-5, -5, -5, -5, -5]
  # SSD
  - hwmon: /sys/devices/pci0000:00/0000:00:06.0/0000:04:00.0
    name: nvme
    indices: [1,2,3]
    correction: [-5, -5, -5]
  # GPU / CPU
  - hwmon: /sys/devices/virtual/thermal
    indices: [1]
#  - hwmon: /sys/devices/virtual/thermal/thermal_zone0/hwmon1/temp1_input
#  - hwmon: /sys/devices/virtual/thermal/thermal_zone7/hwmon5/temp1_input

# Use tpacpi to allow disengage mode (boost)
fans:
  - tpacpi: /proc/acpi/ibm/fan

Upon restart, the daemon failed to initialize, with an error about the CPU/GPU sensors:

● thinkfan.service - thinkfan 1.3.1
     Loaded: loaded (/usr/local/lib/systemd/system/thinkfan.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/thinkfan.service.d
             └─override.conf
     Active: failed (Result: exit-code) since Thu 2022-04-07 09:17:40 IDT; 4s ago
    Process: 7498 ExecStart=/usr/local/sbin/thinkfan $THINKFAN_ARGS (code=exited, status=1/FAILURE)

Apr 07 09:17:40 omerl-ilubuntu.ilient-hq.local systemd[1]: Starting thinkfan 1.3.1...
Apr 07 09:17:40 omerl-ilubuntu.ilient-hq.local thinkfan[7498]: ERROR: /etc/thinkfan.conf:16:
                                                                 - hwmon: /sys/devices/virtual/thermal
                                                                   ^
                                                               Could not find an `hwmon*' directory or `temp*_input' >
Apr 07 09:17:40 omerl-ilubuntu.ilient-hq.local systemd[1]: thinkfan.service: Control process exited, code=exited, sta>
Apr 07 09:17:40 omerl-ilubuntu.ilient-hq.local systemd[1]: thinkfan.service: Failed with result 'exit-code'.
Apr 07 09:17:40 omerl-ilubuntu.ilient-hq.local systemd[1]: Failed to start thinkfan 1.3.1.

However, running find shows that the files do exist:

$ find /sys/devices/virtual/thermal -name "temp*_input"
/sys/devices/virtual/thermal/thermal_zone0/hwmon1/temp1_input
/sys/devices/virtual/thermal/thermal_zone8/hwmon6/temp1_input

How do I overcome this issue?

vmatare commented 2 years ago

Hi @omersi, what exactly do you mean by "upon restart"? If you mean a restart of the whole system (i.e. on boot), you're seeing issue #118. Take a look there for workarounds. A proper fix for it (error tolerance on startup) is currently being implemented in PR #177. You can also build and test that branch if you want to help.

However, if the problem doesn't occur only on boot, but anytime you execute e.g. systemctl restart thinkfan, then this is a new issue.
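
To tell the two cases apart, it can help to check whether the sensor files simply show up late during boot. The following is only a rough diagnostic sketch, not one of the workarounds from #118 and not part of thinkfan: the function name wait_for_temp_input and the one-second polling interval are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of thinkfan): poll a directory until a
# temp*_input file appears somewhere beneath it.
#   $1 = directory to search, e.g. /sys/devices/virtual/thermal
#   $2 = number of attempts, one second apart (default: 10)
# Prints the first matching path and returns 0, or returns 1 on timeout.
wait_for_temp_input() {
    dir=$1
    tries=${2:-10}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Same search as the find command used earlier in this thread.
        found=$(find "$dir" -name 'temp*_input' 2>/dev/null | head -n 1)
        if [ -n "$found" ]; then
            echo "$found"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

Running something like `wait_for_temp_input /sys/devices/virtual/thermal 30` early in the boot sequence would show whether the file is missing at first and then appears, which would point to the boot-time ordering problem rather than a wrong path.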

vmatare commented 2 years ago

Since #118 seems to have become more pressing recently as boot sequence nondeterminism increases, I've simply merged #177 into master. So you can go ahead and test with the latest master branch by adding a setting like max_errors: 5 to the affected sensors, e.g.:

  # GPU / CPU
  - hwmon: /sys/devices/virtual/thermal
    indices: [1]
    max_errors: 5

omersi commented 2 years ago

@vmatare I meant upon boot. Trying the new version.

omersi commented 2 years ago

I installed the latest version.

It's failing to start now:

● thinkfan.service - thinkfan 2.0.0
     Loaded: loaded (/usr/local/lib/systemd/system/thinkfan.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/thinkfan.service.d
             └─default.conf, override.conf
     Active: failed (Result: exit-code) since Thu 2022-04-14 09:59:49 IDT; 2min 20s ago
    Process: 175207 ExecStart=/usr/local/sbin/thinkfan $THINKFAN_ARGS (code=exited, status=1/FAILURE)

Apr 14 09:59:49 omerl-ilubuntu.ilient-hq.local systemd[1]: Starting thinkfan 2.0.0...
Apr 14 09:59:49 omerl-ilubuntu.ilient-hq.local thinkfan[175207]: ERROR: /run/thinkfan.pid already exists. Either thinkfan is already running, or it was killed by SIGKILL. If you're sure thinkfan is not running, delete /run/thinkfan.pid manually.
Apr 14 09:59:49 omerl-ilubuntu.ilient-hq.local systemd[1]: thinkfan.service: Control process exited, code=exited, status=1/FAILURE
Apr 14 09:59:49 omerl-ilubuntu.ilient-hq.local systemd[1]: thinkfan.service: Failed with result 'exit-code'.
Apr 14 09:59:49 omerl-ilubuntu.ilient-hq.local systemd[1]: Failed to start thinkfan 2.0.0.

However, /run/thinkfan.pid doesn't exist:

sudo ls /run/thinkfan.pid
ls: cannot access '/run/thinkfan.pid': No such file or directory
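
The error message names two possibilities (thinkfan already running, or a leftover file). A generic way to distinguish them is sketched below. It does not reflect how thinkfan itself handles its PID file; the function name pidfile_status and its output words are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of thinkfan): classify a PID file.
#   $1 = path to the PID file, e.g. /run/thinkfan.pid
# Prints one of: missing, invalid, running, stale.
pidfile_status() {
    file=$1
    if [ ! -e "$file" ]; then
        echo missing
        return 0
    fi
    pid=$(cat "$file" 2>/dev/null)
    # Reject empty or non-numeric content.
    case $pid in
        ''|*[!0-9]*) echo invalid; return 0 ;;
    esac
    # kill -0 sends no signal and only tests whether the process can be
    # signalled; the /proc check covers processes owned by another user.
    if kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ]; then
        echo running
    else
        echo stale
    fi
}
```

For example, `pidfile_status /run/thinkfan.pid` would print `missing` in the situation shown by the ls output above.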

vmatare commented 2 years ago

That's weird, nothing was changed regarding the PID file. Could you do an strace to see what's actually going on with the PID file and send me the output?

strace -e trace=file ./thinkfan -n |& grep -C10 '\.pid'

vmatare commented 2 years ago

If it happens to work with the -n option, you can try allowing thinkfan to fork and tell strace to follow:

strace -f -e trace=file ./thinkfan |& grep -C10 '\.pid'

vmatare commented 1 year ago

Closing due to lack of info. Feel free to reopen if more info comes up.