networkupstools / nut

The Network UPS Tools repository. UPS management protocol Informational RFC 9271 published by IETF at https://www.rfc-editor.org/info/rfc9271 Please star NUT on GitHub, this helps with sponsorships!
https://networkupstools.org/
Other
1.99k stars 349 forks source link

Riello: No shutdown.return after fsd #1962

Open Dave2526 opened 1 year ago

Dave2526 commented 1 year ago

Hello, I try to setup my Riello NPW800with nut. It´s working except one "Issue": When the upssched foreces the shutdown of my server, The UPS didn´t shutdown completly and also didn´t restart after the Power is back. The scrpit seem to set set the shutdown.return command but it isn´t working. Could there be a interrupt with the ForcedShutdown Command? If I set the shutdown resturn by "upscmd -u admin -p XXXXXXX UPS shutdown.return" it shut down after my selected 40 Seconds.

I Think it´s a little Thing but I didn´t find it...

Here is the System Journal:

Jun 08 17:27:01 server systemd[1]: Stopped target Timer Units.
Jun 08 17:27:01 server systemd[1]: Stopped target Sound Card.
Jun 08 17:27:01 server systemd[1]: Stopped target Preparation for Remote File Systems.
Jun 08 17:27:01 server systemd[1]: Stopped target Host and Network Name Lookups.
Jun 08 17:27:01 server systemd[1]: Stopped target Graphical Interface.
Jun 08 17:27:01 server systemd[1]: Stopped target Cloud-init target.
Jun 08 17:27:01 server systemd[1]: Removed slice Slice /system/modprobe.
Jun 08 17:27:01 server sshd[878740]: pam_unix(sshd:session): session closed for user mainuser
Jun 08 17:27:01 server systemd[1]: Stopping Session 3 of User mainuser...
Jun 08 17:27:01 server systemd[1]: unattended-upgrades.service: Deactivated successfully.
Jun 08 17:27:01 server systemd-logind[665]: System is powering down.
Jun 08 17:27:01 server systemd[1]: nut-monitor.service: Deactivated successfully.
Jun 08 17:27:01 server systemd-logind[665]: Creating /run/nologin, blocking further logins...
Jun 08 17:26:58 server upsscript[892787]: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jun 08 17:26:58 server upsscript[892786]: UPS Shutdown+return forced; Sytem Shutdown forced, Goodbye!
Jun 08 17:26:58 server upsscript[892785]: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jun 08 17:26:58 server upsd[725]: Instant command: admin@127.0.0.1 did shutdown.return on UPS
Jun 08 17:26:56 server upsmon[888170]: Shutdown wird ausgefuehrt, Neustart nach Spannungswiederkehr
Jun 08 17:26:56 server upsmon[888170]: Executing automatic power-fail shutdown
Jun 08 17:26:48 server upsscript[892530]: upssched-script LOW-battery running
Jun 08 17:26:48 server upssched[892527]: Executing command: lowbattery
Jun 08 17:26:48 server upsd[725]: Client admin@127.0.0.1 set FSD on UPS [UPS]
Jun 08 17:26:48 server upsmon[888170]: UPS UPS@localhost Batterie ist fast leer

The upssched script, it must be working, the Log Entries are in the Syslog above

 #! /bin/sh

case $1 in
        onbattery)
                logger -t upsscript "upssched-script onbattery running"
                ;;
        lowbattery)
                logger -t upsscript "upssched-script LOW-battery running"
                sleep 10
                upscmd -u admin -p XXXXXXX UPS@localhost shutdown.return
                logger -t upsscript "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
                logger -t upsscript "UPS Shutdown+return forced; Sytem Shutdown forced, Goodbye!"
                logger -t upsscript "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
                ;;
        fsdtest)
                logger -t upsscript "ForceShutdown Test running"
                upscmd -u admin -p XXXXXX UPS@localhost shutdown.return
                ;;
esac

The UPS config section:

[UPS]
        driver = "riello_usb"
        port = "auto"
        desc = "Riello NetPower"
        override.ups.delay.shutdown = 40
        override.ups.delay.reboot = 15
jimklimov commented 1 year ago

Not really sure. One idea is that there may be a race between the script, the upssched and upsmon daemons which launched it, the upsd and your NUT driver, and finally the OS managing the services.

Namely, I wonder if during the sleep 10 part (namely at :56 before :58) the system began to go down at upsmon's behest, and so upsd accepted the command but there was no driver running (or it was shutting down so did not talk to the device much).

If possible, try increasing the debug verbosity for the driver so it would be visible if it actually lives long enough to command the UPS, and maybe add a delay between upscmd and ending the script, maybe that would delay upsmon from actually triggering the shutdown a bit, enough for the driver to do its work.

Just guessing at the moment, though...

Dave2526 commented 1 year ago

Thanks, If I set it to

lowbattery)
                logger -t upsscript "upssched-script LOW-battery running"
                upscmd -u admin -p XXXXXXX UPS@localhost shutdown.return
                logger -t upsscript "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
                logger -t upsscript "UPS Shutdown+return forced; Sytem Shutdown forced, Goodbye!"
                logger -t upsscript "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
                sleep 10
                ;;

It's still the same. I will try to verbose the driver, but at the moment I have no idea how to do it...

jimklimov commented 1 year ago

Verbosity setting depends on NUT version: with 2.8.0 or newer (e.g. custom builds) there are debug_min settings for ups.conf, upsd.conf, upsmon.conf so it suffices just to edit the suitable config and reload/restart the daemon. For 2.7.4 and older releases, you have to edit their init-scripts or systemd units to add -DDDDDD parameter (max practical verbosity is 6) and keep in mind that debug enabled via command-line puts the program into foreground mode (you may have to add & in init scripts, change Type in systemd units, etc..)

UPDATE: Added https://github.com/networkupstools/nut/wiki/Changing-NUT-daemon-debug-verbosity to document the nuances.

(Off-topic: use three or more backticks - same amount to start and end - in markdown to quote blocks of code)

Dave2526 commented 1 year ago

Thanks a lot, I will give it a try but I have no more time this weekend :( What I want to try too is to force a shutdown via timer, perhaps the riello ups can not execute this command at battery low... But I got an 126 exit code so I have to check the permissions first...

Dave2526 commented 1 year ago

Ok,

so after a sudo chmod +x /etc/nut/upsscript and increase debugging i have the following log:

Jun 14 20:13:27 server upsmon[1041980]:  175.026768        get_var: UPS@localhost / status
Jun 14 20:13:27 server upsmon[1041980]:  175.026733        pollups: UPS@localhost
Jun 14 20:13:22 server upsmon[1041980]:  170.026555        Minimum power value: 1
Jun 14 20:13:22 server upsmon[1041980]:  170.026544        Current power value: 1
Jun 14 20:13:22 server upsmon[1041980]:  170.026533        parsing: [TRIM]
Jun 14 20:13:22 server upsmon[1041980]:  170.026523        ups_on_batt: UPS@localhost (no change)
Jun 14 20:13:22 server upsmon[1041980]:  170.026512        parsing: [OB]
Jun 14 20:13:22 server upsmon[1041980]:  170.026493        parse_status: [OB TRIM]
Jun 14 20:13:22 server upsmon[1041980]:  170.026280        get_var: UPS@localhost / status
Jun 14 20:13:22 server upsmon[1041980]:  170.026226        pollups: UPS@localhost
Jun 14 20:13:19 server upsmon[1045948]: Network UPS Tools upsmon 2.7.4
Jun 14 20:13:19 server upssched[1045952]: Failed to connect to parent and failed to create parent: No such file or directory
Jun 14 20:13:19 server upsmon[1045952]: Failed to connect to parent and failed to create parent: No such file or directory
Jun 14 20:13:17 server upsmon[1041980]:  165.026086        Minimum power value: 1
Jun 14 20:13:17 server upsmon[1041980]:  165.026075        Current power value: 1
Jun 14 20:13:17 server upsmon[1041980]:  165.026063        parsing: [TRIM]
Jun 14 20:13:17 server upsmon[1041980]:  165.026050        ups_on_batt: UPS@localhost (no change)
Jun 14 20:13:17 server upsmon[1041980]:  165.026036        parsing: [OB]
Jun 14 20:13:17 server upsmon[1041980]:  165.026007        parse_status: [OB TRIM]
Jun 14 20:13:17 server upsmon[1041980]:  165.025711        get_var: UPS@localhost / status
Jun 14 20:13:17 server upsmon[1041980]:  165.025659        pollups: UPS@localhost
Jun 14 20:13:12 server upsmon[1041980]:  160.025525        Minimum power value: 1
Jun 14 20:13:12 server upsmon[1041980]:  160.025517        Current power value: 1
Jun 14 20:13:12 server upsmon[1041980]:  160.025480        parsing: [TRIM]
Jun 14 20:13:12 server upsmon[1041980]:  160.025221        UPS UPS@localhost ! Batteriebetrieb !
Jun 14 20:13:12 server upsmon[1041980]:  160.025213        do_notify: ntype 0x0001 (ONBATT)
Jun 14 20:13:12 server upsmon[1041980]:  160.025203        ups_on_batt: UPS@localhost (first time)

I don´t know how to get the " Failed to connect to parent and failed to create parent: No such file or directory" away.... It look like the are also permission problems.... The upsscript isn´t doing anything...

:; sudo ls -l /run/nut
total 12
srw-rw---- 1 nut  nut  0 Jun 14 19:36 riello_usb-UPS
-rw-r--r-- 1 nut  nut  4 Jun 14 19:36 riello_usb-UPS.pid
-rw-r--r-- 1 nut  nut  4 Jun 14 19:36 upsd.pid
-rw-r--r-- 1 root root 8 Jun 14 20:10 upsmon.pid

Any Ideas?

jimklimov commented 1 year ago

What is the run-time user for upsmon and upssched? Does it (or its group) have R/W access to /var/run/nut/ or where it is configured to write in your build?

Dave2526 commented 1 year ago

upsmon: nut upsmon: root riello_usb: nut upsd: nut

In upsmon.conf there is no User, with nut, root or my standard mainuser its the same.

Dave2526 commented 1 year ago

if i move the upsscript to /run/nut, the error is away but i get an return code 126 again...

jimklimov commented 1 year ago

Looking at it, if you are using a Debian/Ubuntu-derived package, the upssched pipe for parent/CLI-tool communication would probably be /var/run/nut/upssched.pipe - so does the /var/run/nut directory exist and is it owned accessibly to the daemon?

Note that /var/run is nowadays a symlink to /run which is a tmpfs remade on reboots, so something like systemd-tmpfiles would be in charge of creating the directories by virtue of /usr/lib/tmpfiles.d/nut-common-tmpfiles.conf (naming may vary with packages and/or defaults of different NUT versions).

Alternately a frequent location is /var/state/ups.

UPDATE: For upssched in particular the location may be dictated by PIPEFN (and for that matter LOCKFN may matter too) in your upssched.conf.

Dave2526 commented 1 year ago

transferred it to /var/run/nut,

:; sudo ls -l /var/run/nut
srw-rw---- 1 nut  nut     0 Jun 14 19:36 riello_usb-UPS
-rw-r--r-- 1 nut  nut     4 Jun 14 19:36 riello_usb-UPS.pid
-rw-r--r-- 1 nut  nut     4 Jun 14 19:36 upsd.pid
-rw-r--r-- 1 root root    8 Jun 14 21:10 upsmon.pid
-rwxr-xr-x 1 root root 1019 Jun 14 21:11 upsscript

with nut:nut on upsscript it´s the same.

upssched.conf:

CMDSCRIPT /var/run/nut/upsscript
PIPEFN /var/run/nut/upssched.pipe
LOCKFN /var/run/nut/upssched.lock
Dave2526 commented 1 year ago

OK, next try:

At the moment my riello is on a Intel I5 PC with ubuntu server. Tomorrow i will try it on a Rasberry with RaspberryOS. If it should work, i will check all permissions.