networkupstools / nut

The Network UPS Tools repository. UPS management protocol Informational RFC 9271 published by IETF at https://www.rfc-editor.org/info/rfc9271 Please star NUT on GitHub, this helps with sponsorships!
https://networkupstools.org/
Other
1.97k stars 348 forks source link

Issues with Bookworm and NUT upscode2 > 2.7.4 #2379

Closed rdukes6514 closed 2 months ago

rdukes6514 commented 6 months ago

I have several older UPSes that communicate serially by either the apcsmart or upscode2 protocol. Each UPS is monitored by a Pi Zero or a PC, connected by rs232. All instances of NUT were fine under Debian or Raspbian Buster and Bullseye using NUT 2.7.4.

The one apcsmart UPS/PC, that has been updated to Bookworm/NUT 2.8.0, is working fine. However the upscode2 UPSes are not communicating. Both the webpage for upsstats.cgi and systemctl status nut-server report "Data for UPS [UPS] is stale - check driver". Restarting with upsdrvctl returns "Missing UPCL after UPCL".

This sound a lot like the PR #1455/#1495 regression issue. I cannot get this to run with anywhere from 1-6 -D's in ups.conf.

I have also tried to wget all the NUT files from Debian Trixie (nut 2.8.1-3 and install with dpkg, but I get the same results - Stale Data and the UPCL errors.

Help?

Thanks Randall Dukes

root@powervar2:~# systemctl status nut-server
● nut-server.service - Network UPS Tools - power devices information server
     Loaded: loaded (/lib/systemd/system/nut-server.service; enabled; preset: enabled)
     Active: active (running) since Mon 2024-04-01 17:01:38 CDT; 1h 50min ago
    Process: 793 ExecStart=/sbin/upsd (code=exited, status=0/SUCCESS)
   Main PID: 799 (upsd)
      Tasks: 1 (limit: 380)
        CPU: 175ms
     CGroup: /system.slice/nut-server.service
             └─799 /lib/nut/upsd

Apr 01 17:01:38 powervar2 upsd[799]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
Apr 01 17:01:38 powervar2 systemd[1]: Started nut-server.service - Network UPS Tools - power devices information server.
Apr 01 17:01:38 powervar2 upsd[799]: User upsmaster@127.0.0.1 logged into UPS [UPS]
Apr 01 17:01:38 powervar2 upsd[799]: User upsmaster@127.0.0.1 logged into UPS [heartbeat]
Apr 01 17:01:39 powervar2 upsd[799]: Connected to UPS [UPS]: upscode2-UPS
Apr 01 17:01:43 powervar2 upsd[799]: Data for UPS [UPS] is stale - check driver
Apr 01 18:40:50 powervar2 upsd[799]: Data for UPS [heartbeat] is stale - check driver
Apr 01 18:40:50 powervar2 upsd[799]: UPS [heartbeat] data is no longer stale
Apr 01 18:40:58 powervar2 upsd[799]: User upsmaster@127.0.0.1 logged into UPS [UPS]
Apr 01 18:40:58 powervar2 upsd[799]: User upsmaster@127.0.0.1 logged into UPS [heartbeat]
root@powervar2:~# upsdrvctl stop
Network UPS Tools - UPS driver controller 2.8.1
fopen /run/nut/upscode2-UPS.pid: No such file or directory
fopen /run/nut/dummy-ups-heartbeat.pid: No such file or directory

root@powervar2:~# upsdrvctl start
Network UPS Tools - UPS driver controller 2.8.1
Network UPS Tools - UPScode II UPS driver 0.91 (2.8.1)
Warning: This is an experimental driver.
Some features may not function correctly.

Missing UPCL after UPCL
No contact with UPS, delaying init.
Missing UPCL after UPCL
No contact with UPS, delaying init.
Network UPS Tools - Device simulation and repeater driver 0.18 (2.8.1)
jimklimov commented 6 months ago

Hello, sorry to hear about the issue.

Is it possible for you to try running directly the NUT v2.7.4 and newer driver programs with elevated debug verbosity, to see at what point the driver decides it can or can not connect to the device? See https://github.com/networkupstools/nut/wiki/Changing-NUT-daemon-debug-verbosity for details.

Also I wonder if your drivers actually got started as automatically created service instances, hence the attempts with upsdrvctl fail (device is busy) - see https://github.com/networkupstools/nut/wiki/nut%E2%80%90driver%E2%80%90enumerator-(NDE)

rdukes6514 commented 6 months ago

Hi Jim, and thanks.

I am currently running a Pi Zero 2 with RaspiOS Bookworm installed. The available NUT claims to be 2.8.0.

Thanks to your response , I finally figured out where those -D's go...

It looks like Debian automatically creates the services as noted by the following section of the install:

... Setting up nut-server (2.8.0-7) ... Created symlink /etc/systemd/system/nut.target.wants/nut-driver-enumerator.path → /lib/systemd/system/nut-driver-enum erator.path. Created symlink /etc/systemd/system/nut.target.wants/nut-driver-enumerator.service → /lib/systemd/system/nut-driver-e numerator.service. Created symlink /etc/systemd/system/nut.target.wants/nut-driver.target → /lib/systemd/system/nut-driver.target. Created symlink /etc/systemd/system/nut.target.wants/nut-server.service → /lib/systemd/system/nut-server.service. ...

And the fact they turn up in /etc/systemd/system...

... root@newups-l:~# find / -name nut-server* /etc/systemd/system/network-online.target.wants/nut-server.service /etc/systemd/system/nut.target.wants/nut-server.service /etc/systemd/system/multi-user.target.wants/nut-server.service /etc/init.d/nut-server /etc/nut/nut-server.service.apc /usr/lib/systemd/system/nut-server.service /usr/lib/systemd/system/nut-server.service.orig /usr/lib/tmpfiles.d/nut-server.conf /usr/share/doc/nut-server /usr/share/lintian/overrides/nut-server /sys/fs/cgroup/system.slice/nut-server.service /var/lib/systemd/deb-systemd-helper-enabled/nut-server.service.dsh-also /var/lib/systemd/deb-systemd-helper-enabled/nut.target.wants/nut-server.service /var/lib/dpkg/info/nut-server.prerm /var/lib/dpkg/info/nut-server.conffiles /var/lib/dpkg/info/nut-server.preinst /var/lib/dpkg/info/nut-server.list /var/lib/dpkg/info/nut-server.postinst /var/lib/dpkg/info/nut-server.postrm /var/lib/dpkg/info/nut-server.md5sums ...

And here is the Debug info:

... root@newups-l:~# systemctl daemon-reload root@newups-l:~# systemctl restart nut-server root@newups-l:~# systemctl status nut-server ● nut-server.service - Network UPS Tools - power devices information server Loaded: loaded (/lib/systemd/system/nut-server.service; enabled; preset: enabled) Active: active (running) since Thu 2024-04-04 11:40:35 CDT; 7s ago Main PID: 2912 (upsd) Tasks: 1 (limit: 380) CPU: 21ms CGroup: /system.slice/nut-server.service └─2912 /lib/nut/upsd -DDDDDD

Apr 04 11:40:42 newups-l upsd[2912]: 7.251318 [D3] Got PONG from UPS [heartbeat] Apr 04 11:40:42 newups-l upsd[2912]: 7.251384 [D3] sstate_dead: driver for UPS [UPS] says data is stale Apr 04 11:40:42 newups-l upsd[2912]: 7.251439 [D2] mainloop: polling 3 filedescriptors Apr 04 11:40:42 newups-l upsd[2912]: [D2] mainloop: no data available Apr 04 11:40:42 newups-l upsd[2912]: [D3] sstate_dead: driver for UPS [UPS] says data is stale Apr 04 11:40:42 newups-l upsd[2912]: [D3] Pinging UPS [heartbeat] Apr 04 11:40:42 newups-l upsd[2912]: [D2] mainloop: polling 3 filedescriptors Apr 04 11:40:42 newups-l upsd[2912]: [D3] Got PONG from UPS [heartbeat] Apr 04 11:40:42 newups-l upsd[2912]: [D3] sstate_dead: driver for UPS [UPS] says data is stale Apr 04 11:40:42 newups-l upsd[2912]: [D2] mainloop: polling 3 filedescriptors ...

I'm attempting to build from git, but I'm having to learn as I go...

Thanks again, Randall Dukes

rdukes6514 commented 6 months ago

Same story with a fresh install of Bookworm/Nut 2.8.0, followed by building and installing NUT 2.8.2 over it.

jimklimov commented 6 months ago

Is there a unit for nut-driver@UPS.service? What does its log say about the device communications - why it goes stale?

I strongly suspect there's some weird fallout of warnings fixes (there were already a few places that de-facto relied on broken code) but so far lack a smoking gun...

rdukes6514 commented 6 months ago

Apr 09 06:54:19 newups-l upscode2[886]: Missing UPCL after UPCL Apr 09 06:54:19 newups-l upscode2[886]: No contact with UPS, delaying init. Apr 09 06:54:21 newups-l upscode2[886]: Missing UPCL after UPCL Apr 09 06:54:21 newups-l upscode2[886]: No contact with UPS, delaying init. Apr 09 06:54:23 newups-l upscode2[886]: Missing UPCL after UPCL Apr 09 06:54:23 newups-l upscode2[886]: No contact with UPS, delaying init. Apr 09 06:54:25 newups-l upscode2[886]: Missing UPCL after UPCL Apr 09 06:54:25 newups-l upscode2[886]: No contact with UPS, delaying init. Apr 09 06:54:27 newups-l upscode2[886]: Missing UPCL after UPCL Apr 09 06:54:27 newups-l upscode2[886]: No contact with UPS, delaying init. Apr 09 06:54:29 newups-l upscode2[886]: Missing UPCL after UPCL Apr 09 06:54:29 newups-l upscode2[886]: No contact with UPS, delaying init.



Thanks.
jimklimov commented 6 months ago

Thanks, I've updated formatting in your post for "text screenshots" to be more readable.

The /etc/systemd/system/nut-driver.target.wants/nut-driver@UPS.service seems like what NDE (nut-driver-enumerator) generates from the ups.conf sections, so https://github.com/networkupstools/nut/wiki/nut%E2%80%90driver%E2%80%90enumerator-(NDE) wiki page's notes about its potential conflict with a manually-executed driver program or upsdrvctl do apply. In short - if you want to run the driver manually, e.g. to experiment with debug or custom builds where you add more places for it to print debug info, be sure to temporarily stop the systemd unit. If you just want more debugging with the existing program (and in the context it has as a service), temporarily setting debug_min = 6 in its ups.conf section can help: https://github.com/networkupstools/nut/wiki/Changing-NUT-daemon-debug-verbosity

The /sys/fs/cgroup/... is an internal detail of how systemd manages and constrains the services/units, so not much of practical interest. Still, the "can't be found" part may be due to the backslash in \x (should be two separate characters in the path name AFAIK) -- did you single/double-quote the pathname when you looked for it? Try escaping that slash as \\x (otherwise it is interpreted as an escaped "x" character sequence).

Finally the service log is the interesting part and seeing how the driver lifetime started (more so with boosted debug verbosity) would be of interest for troubleshooting. So far I see just the initialization faults, retried every 2 seconds:

Apr 09 06:54:19 newups-l upscode2[886]: Missing UPCL after UPCL
Apr 09 06:54:19 newups-l upscode2[886]: No contact with UPS, delaying init.

You can look at existing unit journal via journalctl -lu nut-driver@UPS.service or watch it scroll when you restart the service with e.g. journalctl -flu nut-driver@UPS.service & systemctl restart nut-driver@UPS.service (both as root).

I wonder if there is some permissions issue for the serial port devfs node (e.g. owned by nut vs. nobody or root or dialout etc. - more so if you rebuilt NUT without specifying same user/group/... settings as applied by packaging earlier), and/or if it has a different pathname after the OS upgrade? As a quick hack for permissions, you can try user=root in the ups.conf section for the device, so the driver does not drop privileges (potentially insecure, so if this is the problem -- better find a way to store/re-apply the needed ownership in devfs across reboots).

rdukes6514 commented 6 months ago

Jim,

Not sure how to do the formatting, but...

A new install of Bookworm on a Pi Zero 2w, was made and NUT (2.8.0/stable) was setup for an APC smart ups 620 using the apcsmart protocol, and I get a working system:

...
root@apc620:~# systemctl status nut-server
● nut-server.service - Network UPS Tools - power devices information server
     Loaded: loaded (/lib/systemd/system/nut-server.service; enabled; preset: e>
     Active: active (running) since Wed 2024-04-10 04:37:19 CDT; 1h 8min ago
   Main PID: 1076 (upsd)
      Tasks: 1 (limit: 380)
        CPU: 401ms
     CGroup: /system.slice/nut-server.service
             └─1076 /lib/nut/upsd -F

Apr 10 04:37:20 apc620 upsd[1076]: Running as foreground process, not saving a >
Apr 10 04:37:20 apc620 nut-server[1076]: Running as foreground process, not sav>
Apr 10 04:37:20 apc620 nut-server[1076]: User upsmaster@127.0.0.1 logged into U>
Apr 10 04:37:20 apc620 upsd[1076]: User upsmaster@127.0.0.1 logged into UPS [UP>
Apr 10 04:37:20 apc620 nut-server[1076]: User upsmaster@127.0.0.1 logged into U>
Apr 10 04:37:20 apc620 upsd[1076]: User upsmaster@127.0.0.1 logged into UPS [he>
Apr 10 04:37:56 apc620 nut-server[1076]: Data for UPS [heartbeat] is stale - ch>
Apr 10 04:37:56 apc620 upsd[1076]: Data for UPS [heartbeat] is stale - check dr>
Apr 10 04:37:57 apc620 nut-server[1076]: UPS [heartbeat] data is no longer stale
Apr 10 04:37:57 apc620 upsd[1076]: UPS [heartbeat] data is no longer stale
...

If I take an identical image and setup for the upscode2 protocol (the only changes being in ups.conf for upscode2, with the added debug_min = 6) for a Powervar PRC1000-a/NS I get: Note results are about the same whether After=network-online.target and Requires=network.target have been uncommented or not (in /etc/systemd/system/nut.target.wants/nut-server.service).

...

● nut-server.service - Network UPS Tools - power devices information server
     Loaded: loaded (/lib/systemd/system/nut-server.service; enabled; preset: enabled)
     Active: active (running) since Wed 2024-04-10 05:38:37 CDT; 25s ago
    Process: 1458 ExecReload=/sbin/upsd -c reload -P $MAINPID (code=exited, status=0/SUCCESS)
   Main PID: 1447 (upsd)
      Tasks: 1 (limit: 380)
        CPU: 55ms
     CGroup: /system.slice/nut-server.service
             └─1447 /lib/nut/upsd -F -DDDDDD

Apr 10 05:39:00 powervar2 upsd[1447]: [D2] write: [destfd=6] [len=24] [VAR UPS ups.status "OL"]
Apr 10 05:39:00 powervar2 upsd[1447]: [D2] mainloop: polling 5 filedescriptors
Apr 10 05:39:00 powervar2 upsd[1447]: [D6] Entering check_command: GET
Apr 10 05:39:00 powervar2 upsd[1447]: [D6] check_command: Calling command handler for GET
Apr 10 05:39:00 powervar2 upsd[1447]: [D2] write: [destfd=8] [len=30] [VAR heartbeat ups.status "OB"]
Apr 10 05:39:00 powervar2 upsd[1447]: [D2] mainloop: polling 5 filedescriptors
Apr 10 05:39:02 powervar2 nut-server[1447]:   25.110534        [D3] Pinging UPS [heartbeat]
Apr 10 05:39:02 powervar2 nut-server[1447]:   25.110590        [D2] mainloop: polling 5 filedescriptors
Apr 10 05:39:02 powervar2 upsd[1447]: [D3] Pinging UPS [heartbeat]
Apr 10 05:39:02 powervar2 upsd[1447]: [D2] mainloop: polling 5 filedescriptors

Apr 10 05:40:21 powervar2 nut-server[1447]:  103.711951        [D2] mainloop: polling 5 filedescriptors
Apr 10 05:40:21 powervar2 upsd[1447]: [D2] mainloop: polling 5 filedescriptors
Apr 10 05:40:21 powervar2 nut-server[1447]:  104.021653        [D3] Pinging UPS [heartbeat]
Apr 10 05:40:21 powervar2 nut-server[1447]:  104.021713        [D2] mainloop: polling 5 filedescriptors
Apr 10 05:40:21 powervar2 nut-server[1447]:  104.021760        [D3] Got PONG from UPS [heartbeat]
Apr 10 05:40:21 powervar2 nut-server[1447]:  104.021802        [D2] mainloop: polling 5 filedescriptors
Apr 10 05:40:21 powervar2 upsd[1447]: [D3] Pinging UPS [heartbeat]
Apr 10 05:40:21 powervar2 upsd[1447]: [D2] mainloop: polling 5 filedescriptors
Apr 10 05:40:21 powervar2 upsd[1447]: [D3] Got PONG from UPS [heartbeat]
Apr 10 05:40:21 powervar2 upsd[1447]: [D2] mainloop: polling 5 filedescriptors
...

This kind of points right back to the upscode2 driver itself

jimklimov commented 6 months ago

Not sure how to do the formatting, but...

Surround by identical lines of 3 or 4 backticks.

As for the log, can you look at nut-driver@UPS.service? The nut-server is the upsd data server which relays info from drivers to networked protocol.

rdukes6514 commented 6 months ago

Jim, Sorry, I had to step away.

After a couple of reboots the upscode2 driver appears to be working "correctly" with NUT 2.8.0. These are the changes made so far:

  1. /usr/lib/systemd/system/nut-server.service a. Uncomment After=network-online.target and Requires=network.target. These appear to be uncommented by the enumerator for apcsmart, but not for upscode2. I think this was required for NUT 2.7.4 as well. b. Change "ExecStart=/sbin/upsd -F" to "ExecStart=/sbin/upsd -F -DDDDDD"

  2. Add "debug_min = 6" to under upscode driver section of /etc/nut/ups.conf

  3. Reboot. systemctl stop nut-server / systemctl daemon-reload / systemctl restart nut-server didn't appear to do the trick.

I have one more ups to update to to setup, so I'll start at square one to check the whole install process.

Thanks, Randall

Edit: Part of the problem was the rs232 level shifter I was using was checking in and out. replaced with a new one and both the Powervar UPSs are working. Both changes to nut-server.service and the change to ups.conf appear to be required to get NUT 2.8.0 upscode2 communicating.

Jim,

I think you mark this as solved.

jimklimov commented 2 months ago

Thanks for the updates. Note that normally you should remove (comment away) the debug_min=6 setting from config files when all began running okay, otherwise your syslog and associated storage would be needlessly stressed.