networkupstools / nut

The Network UPS Tools repository. UPS management protocol Informational RFC 9271 published by IETF at https://www.rfc-editor.org/info/rfc9271 Please star NUT on GitHub, this helps with sponsorships!
https://networkupstools.org/

Can't power off ups device from Truenas #2587

Open LevAnni777 opened 1 month ago

LevAnni777 commented 1 month ago

Guys, my APC Back-UPS ES 525 is not powered off by TrueNAS, but it does turn off when connected to my laptop running the PowerChute software.

The shell log shows this error when I manually try to power it off using the upsdrvctl command:

root@TRUENAS[/home/admin]# /sbin/upsdrvctl -d shutdown
Network UPS Tools - UPS driver controller 2.8.0
Network UPS Tools - Generic HID driver 0.47 (2.8.0)
USB communication driver (libusb 1.0) 0.43
Can't claim USB device [051d:0002]@0/0: Entity not found
Driver failed to start (exit status=1)

Any ideas what is wrong here?

The driver seems to be connected, because upsc ups returns results:

root@TRUENAS[/home/admin]# upsc ups
Init SSL without certificate database
battery.charge: 100
battery.charge.low: 10
battery.charge.warning: 50
battery.date: 2024/08/07
battery.mfr.date: 2024/08/07
battery.runtime: 3442
battery.runtime.low: 120
battery.type: PbAc
battery.voltage: 13.5
battery.voltage.nominal: 12.0
device.mfr: American Power Conversion
device.model: Back-UPS ES 525
device.serial: 8B0708R15568
device.type: ups
driver.name: usbhid-ups
driver.parameter.pollfreq: 30
driver.parameter.pollinterval: 2
driver.parameter.port: auto
driver.parameter.synchronous: auto
driver.version: 2.8.0
driver.version.data: APC HID 0.98
driver.version.internal: 0.47
driver.version.usb: libusb-1.0.26 (API: 0x1000109)
input.sensitivity: medium
input.transfer.high: 255
input.transfer.low: 195
input.voltage: 228.8
input.voltage.nominal: 230
output.voltage.nominal: 2300.0
ups.beeper.status: enabled
ups.delay.shutdown: 20
ups.firmware: 851.t3.I
ups.firmware.aux: t3
ups.load: 10.0
ups.mfr: American Power Conversion
ups.mfr.date: 2007/02/26
ups.model: Back-UPS ES 525
ups.productid: 0002
ups.realpower.nominal: 300
ups.serial: 8B0708R15568
ups.status: OL
ups.test.result: No test initiated
ups.timer.reboot: 0
ups.timer.shutdown: -1
ups.vendorid: 051d

Thanks

jimklimov commented 1 month ago

Is the regular driver running at that time (and so holding the port)? It should be stopped first.
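A minimal sketch of that check, assuming the driver is usbhid-ups (as in the upsc output above) and that pgrep from procps is available. The function names and the DRIVER and DRY_RUN variables are hypothetical helpers, not part of NUT. By default it only prints what it would run:

```shell
#!/bin/sh
# Sketch: make sure no regular driver instance holds the USB port before
# asking a fresh driver instance to power off the UPS.
# DRIVER and DRY_RUN are illustrative knobs, not NUT settings.
DRIVER="${DRIVER:-usbhid-ups}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

stop_then_shutdown() {
    # pgrep -x matches the exact process name of a running driver.
    if pgrep -x "$DRIVER" >/dev/null 2>&1; then
        echo "$DRIVER is running and holds the port; stopping it first"
        run upsdrvctl stop
    fi
    # WARNING: with DRY_RUN=0 this really cuts power to the UPS load.
    run upsdrvctl shutdown
}
```

Calling stop_then_shutdown with the default DRY_RUN=1 is safe for experiments. If a service manager started the driver, it is probably better to stop the driver through that manager rather than with upsdrvctl stop.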

Note that typically (for production use) the nutshutdown script (e.g. as a hook for systemd-shutdown) handles this in many OSes, to detect the FSD flag file from upsmon and act on it.

LevAnni777 commented 1 month ago

I can't say exactly what the OS does. What I do know is that the OS shuts itself down and sends the /etc/killpower command to the UPS to power it off, but nothing happens :(

jimklimov commented 1 month ago

To clarify: /etc/killpower is not a command, but (a typical location of) a touched file; technically the POWERDOWNFLAG configuration option in upsmon.conf.

It is up to later shutdown integration to see this file (e.g. by running upsmon -K) and decide to upsdrvctl shutdown the connected UPSes.

This is often entangled with the power-race-avoidance loop: sleep for an hour or so and then reboot, in case the UPS powerdown does not work (perhaps because wall power returns), so the servers eventually return to service and are not halted indefinitely.

One example of the ritual is here: https://github.com/networkupstools/nut/blob/master/scripts/systemd/nutshutdown.in

Some others can be seen nearby for old init scripts, at least regarding the upsdrvctl shutdown part; some systems' lifecycle prevents userland processes like that long sleep from running after shutdown has started.
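A simplified sketch of that ritual, not the actual nutshutdown script. It relies on upsmon -K exiting successfully only when the POWERDOWNFLAG file exists and carries upsmon's magic string. The variable names, default paths and the one-hour wait are assumptions for illustration:

```shell
#!/bin/sh
# Sketch of a late-shutdown hook in the spirit of nutshutdown.
# UPSMON, UPSDRVCTL, REBOOT and WAIT_SECONDS are illustrative variables;
# real program paths depend on the packaging.
UPSMON="${UPSMON:-/sbin/upsmon}"
UPSDRVCTL="${UPSDRVCTL:-/sbin/upsdrvctl}"
REBOOT="${REBOOT:-reboot}"
WAIT_SECONDS="${WAIT_SECONDS:-3600}"

late_shutdown_hook() {
    # upsmon -K succeeds only if the killpower flag is present and valid,
    # i.e. this shutdown was forced by the UPS state.
    if ! "$UPSMON" -K >/dev/null 2>&1; then
        echo "no killpower flag; leaving the UPS alone"
        return 0
    fi
    echo "killpower flag found; telling the UPS to power off"
    "$UPSDRVCTL" shutdown
    # Power-race avoidance: if wall power returned and the UPS never cut
    # the load, do not stay halted forever.
    sleep "$WAIT_SECONDS"
    echo "still powered after ${WAIT_SECONDS}s; rebooting"
    "$REBOOT"
}
```

A hook file would end with a call to late_shutdown_hook. Whether the long sleep can run at all depends on the OS shutdown lifecycle, as noted above.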

LevAnni777 commented 1 month ago

Thanks for your help!!

Actually, a very strange thing is happening in TrueNAS.

When the system boots up and I try to run the upsdrvctl shutdown command, as I said earlier, I see this error:

root@TRUENAS[/home/admin]# /sbin/upsdrvctl -d shutdown
Network UPS Tools - UPS driver controller 2.8.0
Network UPS Tools - Generic HID driver 0.47 (2.8.0)
USB communication driver (libusb 1.0) 0.43
Can't claim USB device [051d:0002]@0/0: Entity not found
Driver failed to start (exit status=1)

But if I restart the service by entering these commands:

/sbin/upsdrvctl -d stop

followed by

/sbin/upsdrvctl -d start

then the /sbin/upsdrvctl -d shutdown command powers off the UPS perfectly!

I had the idea of first restarting the service this way after system boot and then seeing what happens if I pull the power cable, but this didn't help :( The UPS stays on.

jimklimov commented 1 month ago

As I noted before, the upsdrvctl shutdown right off the bat probably fails because an earlier driver instance is running, e.g. wrapped by systemd thanks to NDE linked above. Can you confirm if your TrueNAS instance has systemd to be concerned about this code path? In fact, is this the Linux or BSD variant of the appliance OS?

These service runs may be configured to not use a PID file, so competing manually-started drivers do not know they have a sibling to kill off, and then the device is busy. Or the PID file is used, they kill off the sibling, and systemd revives it (possibly killing the program you've started).

Better not to mix the two management approaches, or have one explicitly stopped/disabled before trying the other.

The manually-run upsdrvctl -d start probably ends up in the daemon having the PID file so the copy started by upsdrvctl -d shutdown can deal with it.

Newer NUT releases let drivers interact via the local socket they already use to talk to the upsd data server, so they do not have to rely on PID files and signals alone. The plan is that a still-running, connected driver could be told to initiate the UPS shutdown itself, instead of just dying and letting another copy of the program re-initialize the connection and tell the UPS to power-cycle.

LevAnni777 commented 1 month ago

I have TrueNAS Scale, which is a Debian-based Linux OS.

The NUT server daemon is running, and yes, you are most likely right that I'm just running another (new) instance when I stop and start the service.

Something is definitely wrong in the shutdown process: it does not give a proper power-off command to the UPS, even though I can do it manually.

Right now I'm working on a "dirty" script that will run at the end of the shutdown process to manually power off the UPS. I just ran out of options :(

jimklimov commented 1 month ago

In this case check the systemd-shutdown hooks and the installation of the nutshutdown script mentioned above. Perhaps just adapt the current iteration (the file in the sources is a template waiting for paths to be substituted) and place it as an executable file at /usr/lib/systemd/system-shutdown/nutshutdown.
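A sketch of that installation step, assuming GNU install is available and the root filesystem is writable. The function name and its arguments are hypothetical. It refuses to install a file that still contains @NAME@ template placeholders:

```shell
#!/bin/sh
# Sketch: install a rendered nutshutdown script as a systemd-shutdown hook.
# Usage: install_hook SRC [HOOK_DIR]   (both names are illustrative)
install_hook() {
    src="$1"
    hook_dir="${2:-/usr/lib/systemd/system-shutdown}"
    # The template uses @NAME@ placeholders; refuse to install it unrendered.
    if grep -q '@[A-Z_][A-Z_]*@' "$src"; then
        echo "unsubstituted @...@ placeholders in $src" >&2
        return 1
    fi
    # systemd-shutdown only runs executables, hence mode 0755.
    # -D creates the target directory if it is missing (GNU install).
    install -D -m 0755 "$src" "$hook_dir/nutshutdown"
}
```

For example, install_hook ./nutshutdown would copy a hand-edited script into the default hook directory.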

LevAnni777 commented 1 month ago

Thanks

Unfortunately this file is on a read-only file system, so I can't make changes to it.

jimklimov commented 4 weeks ago

I wonder if it is an OS image (initrd, ISO, etc.) that can be custom-generated? As a GPL-based product, it should offer ways (docs, hopefully even recipes and tools) to do so.

A custom build may be "not supported" by the vendor though.

jimklimov commented 4 weeks ago

In the worst case, as far as NUT goes, you can send a UPS command such as "power off X minutes from now" at the point when user-space services are stopping (e.g. add a systemd unit "RequiredBy" shutdown.target or some such, and possibly "Wants" some long-stopping service to die first). After this level of scripts has run, the NAS system may linger some more waiting for its clients to disconnect, or VMs to stop, etc., and eventually the UPS will turn off or power-cycle.

That relies more on capabilities of its firmware/hardware (does it allow delayed power-off? how does it react to wall power coming back? ...what about it coming back during your shutdown routine - would the UPS cycle as told anyway? how granular can the time be - e.g. CPS devices often count in whole 60-second chunks), as well as that your guessed delay would suffice for safe shutdown of everyone else.
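The delayed-command idea could look roughly like the sketch below, which such a unit might call. The UPS name, user and password are hypothetical and must match an upsd.users account allowed to set variables and run instant commands. Whether a given device offers shutdown.return or accepts a new ups.delay.shutdown is not guaranteed, so the command list is checked first:

```shell
#!/bin/sh
# Sketch: ask the UPS for a delayed power-off while services are stopping.
# UPS, NUT_USER, NUT_PASS and DELAY are hypothetical example values.
UPS="${UPS:-ups@localhost}"
NUT_USER="${NUT_USER:-admin}"
NUT_PASS="${NUT_PASS:-secret}"
DELAY="${DELAY:-120}"

delayed_poweroff() {
    # Only proceed if the driver advertises the command for this device.
    if ! upscmd -l "$UPS" 2>/dev/null | grep -q '^shutdown\.return'; then
        echo "shutdown.return is not offered by $UPS" >&2
        return 1
    fi
    # Set the delay in seconds; some devices round it to coarser steps.
    upsrw -s "ups.delay.shutdown=$DELAY" -u "$NUT_USER" -p "$NUT_PASS" "$UPS"
    # shutdown.return: cut the load, restore it when wall power is back.
    upscmd -u "$NUT_USER" -p "$NUT_PASS" "$UPS" shutdown.return
}
```

The chosen delay is a guess that has to cover the rest of the shutdown, which is exactly the trade-off described above.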

erlendoyen commented 3 days ago

Same issue on TrueNAS Scale.

It did work at some point (in a previous Scale release; I can't figure out when it stopped working) with the same hardware and UPS.

UPS: PowerWalker

Tested drivers: blazer_usb, nutdrv_qx

jimklimov commented 3 days ago

Can you please check if a /usr/lib/systemd/system-shutdown/nutshutdown executable file at least exists, and if it looks anything like the one in current NUT sources? Also, is there a line for POWERDOWNFLAG in upsmon.conf there?

erlendoyen commented 3 days ago

The script exists:

image

upsmon.conf (I skipped the line with the password):

image

jimklimov commented 3 days ago

So it is a bit older variant of the script, but seems like ours :)

Do you know how your version of TrueNAS Scale unmounts the file systems when shutting down and/or remounts them read-only? Maybe that's what changed between releases?

Does /etc exist at the time nutshutdown would run (so upsmon -K would detect the /etc/killpower file, and upsdrvctl shutdown would see the ups.conf definitions)? Also, do /sbin, /lib/nut (assuming Debian-ish packaging) exist with the NUT programs?

An earlier post suggested the OS image is read-only; is /etc persistent and writeable by processes like upsmon? I suppose there is some way to write there as the system integration creates your ups.conf... although I suppose that might be present in (or overlaid from) another factual storage location which disappears during OS un-mounting...

Further, does upsmon run with normal NUT approach, with a root-owned part creating the killpower file and calling shutdown, and the unprivileged part doing most of the work? Perhaps changing POWERDOWNFLAG to a different location that survives until reboot (e.g. /run/nut/killpower) might help, if the default /etc is not easily writeable now.
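To see which location is currently configured, a small sketch like this can read it from upsmon.conf. The default config path is an assumption (Debian-style packaging) and the function name is hypothetical:

```shell
#!/bin/sh
# Sketch: print the POWERDOWNFLAG path configured in upsmon.conf.
# Usage: powerdownflag_path [CONF]
powerdownflag_path() {
    conf="${1:-/etc/nut/upsmon.conf}"
    # Use the last uncommented POWERDOWNFLAG line; strip optional quotes.
    # Paths containing spaces are not handled by this simple parser.
    awk '$1 == "POWERDOWNFLAG" { path = $2 }
         END { gsub(/"/, "", path); print path }' "$conf"
}
```

An empty result means no POWERDOWNFLAG line was found in that file.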

erlendoyen commented 2 days ago

1. Yes, /etc should exist.

2. Process details (edited): I can't say that I see why it's running with 2 instances.

image

image

3. I tested adding POWERDOWNFLAG to ups.conf, set to /home/admin/killpower, but it did not change anything. I will test with a different location as well.

jimklimov commented 2 days ago

The killpower file is created when upsmon begins the shutdown rituals, if the filename is configured at all (it is on your system), and would contain some magic string to make sure it is ours/intentional (not an empty touch-flag file). I think it should be removed during start-up of NUT services; at least it was in init-script times (I did not check recent packages about it; maybe upsmon start-up deals with it directly now). Keeping it on a tmpfs also serves that purpose (it disappears upon reboot).

The critical part is that if the system unmounts filesystems as part of shutdown/reboot, this file should exist on those core filesystems that remain (may be read-only) for the late-shutdown hook integrations to see it and potentially immediately cut the UPS power (timer delays depend on UPS HW/FW, may be absent).
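One way to see which filesystem would hold the flag is sketched below, using only df and awk. The function name is hypothetical, and the example path in the usage note is the default location mentioned earlier in this thread:

```shell
#!/bin/sh
# Sketch: show which mounted filesystem backs a given path.
# Usage: flag_filesystem PATH
flag_filesystem() {
    path="$1"
    # The flag may not exist yet; fall back to its parent directory.
    [ -e "$path" ] || path="$(dirname "$path")"
    # df -P prints one POSIX-format line per filesystem:
    # column 1 is the device, column 6 is the mount point.
    df -P "$path" | awk 'NR == 2 { print $1, $6 }'
}
```

Running flag_filesystem /etc/killpower before and during shutdown would show whether the file lives on a filesystem that gets unmounted early.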

erlendoyen commented 1 day ago

Hi,

Any tips on what more I can test? I have attempted to create a support ticket with iXsystems, but their policy is to not support any hardware issues without either a support plan or iX hardware.

I reached out to them again to see if they can take a look, because I don't think this is an isolated case, and I know it worked before in TrueNAS Scale. I can't figure out exactly which build it was.

jimklimov commented 1 day ago

Given that it seems you can't modify the OS image or build another one, options are somewhat limited.

First, I'd investigate how to customize it after all, should be possible with a FOSS-based product I suppose. If feasible, add a script (or extend nutshutdown) to report currently mounted filesystems, presence of key files (killpower, ups.conf), etc.
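Such a reporting script might look like this sketch. It assumes a Linux /proc/mounts, and the file list passed to it is up to the caller; the function name is hypothetical:

```shell
#!/bin/sh
# Sketch of a diagnostic helper: report what is still mounted and which
# NUT-related files are visible at the time late-shutdown hooks run.
# Usage: report_shutdown_state FILE...
report_shutdown_state() {
    echo "== mounted filesystems =="
    # /proc/mounts columns: device, mount point, type, options (rw or ro).
    awk '{ print $2, $3, $4 }' /proc/mounts
    echo "== key files =="
    for f in "$@"; do
        if [ -e "$f" ]; then
            echo "present: $f"
        else
            echo "MISSING: $f"
        fi
    done
}

# Example call a hook might make; the paths and the console device are
# assumptions to adjust for the actual system:
# report_shutdown_state /etc/killpower /etc/nut/ups.conf /sbin/upsdrvctl > /dev/console
```

The output has to go somewhere that survives or is visible late in shutdown, such as a console.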

Orthogonally to that, see if you can start the kernel with a "netconsole" module/option, so you get a syslog-like UDP stream with kernel/service messages that the system logs, and another computer with a syslog sink can collect those so you can look for clues about shutdown routines of that distro - what it does in practice.
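For reference, the netconsole parameter has the form [src-port]@[src-ip]/[dev],[tgt-port]@tgt-ip/[tgt-mac]. The sketch below only builds that string. The helper name and the addresses in the usage comment are hypothetical examples, and 6665/6666 are the kernel's documented default ports:

```shell
#!/bin/sh
# Sketch: build the netconsole module/kernel parameter.
# Usage: netconsole_param SRC_IP DEV TGT_IP [TGT_MAC]
netconsole_param() {
    src_ip="$1"
    dev="$2"
    tgt_ip="$3"
    # An empty MAC lets the kernel fall back to the broadcast address.
    tgt_mac="${4:-}"
    echo "netconsole=6665@${src_ip}/${dev},6666@${tgt_ip}/${tgt_mac}"
}

# Example with documentation-range addresses (not run here):
# modprobe netconsole "$(netconsole_param 192.0.2.10 eth0 192.0.2.20)"
```

The same string can be appended to the kernel command line if the module is built in. The receiving computer needs any UDP listener on the target port, such as a syslog daemon.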

Similarly, if there's a serial port (or its IPMI, ILOM, SOL, etc. emulator), you can attach to that as a system console.

For tests it may suffice to run a VM with a shutdown requested by a dummy-ups driver, just so you can see what the virtual OS does. A serial port console may also be easier to use in a VM than on physical hardware nowadays.