Open VoiletDragon opened 1 year ago
Just in case: are your experiments using NUT as packaged by this or that OS release (so probably variants of old NUT v2.7.4), or a more recent one (2.8.0 or custom build of master branch)?
Worth checking if the problem is not with RPi itself (over past couple of years we had many reports about issues with them, search by the issue tag for more such questions, but maybe that just reflects their popularity and not flaws of the platform). E.g. can you query the system (or see in dmesg) if it power-saved the USB port, etc. over time?
However given that WebUI (and I suppose CLI upsc
) report the status correctly, the problem is not with HW and NUT drivers, but rather about communications between the data server upsd
and the client upsmon
. Maybe some firewall gets restarted nightly and breaks the TCP connection, or some tmp-file cleanup removes stuff (should not matter though... at worst a PID file?)
With NUT v2.8.0 or newer you can use debug_min
setting in daemon config files to bump their verbosity, and when they get into the state of not updating the ups.status
for upsmon
you would have more troubleshooting hints (e.g. lost connection between the two - though that should also emit logged warnings I think). For 2.7.4 you would have to modify init/systemd scripts to include -D
arguments, and possibly adapt the systemd unit Type
to a non-forking behavior while the option is used.
As for some seconds of delay, there was a discussion just recently that this can be throttled by hardware (UPS controller is not eager or quick enough to report issues ASAP, especially if hoping that the power blip is irrelevantly short), beside the NUT driver polling frequency (if not using interrupts) and "upsmon polling of upsd" frequency. Thinking of it, can't say OTOH (gotta look in code) whether upsd has a way to quickly notify clients like upsmon as soon as it knows of a relevant event.
Just in case: are your experiments using NUT as packaged by this or that OS release (so probably variants of old NUT v2.7.4), or a more recent one (2.8.0 or custom build of master branch)?
Worth checking if the problem is not with RPi itself (over past couple of years we had many reports about issues with them, search by the issue tag for more such questions, but maybe that just reflects their popularity and not flaws of the platform). E.g. can you query the system (or see in dmesg) if it power-saved the USB port, etc. over time?
However given that WebUI (and I suppose CLI
upsc
) report the status correctly, the problem is not with HW and NUT drivers, but rather about communications between the data serverupsd
and the clientupsmon
. Maybe some firewall gets restarted nightly and breaks the TCP connection, or some tmp-file cleanup removes stuff (should not matter though... at worst a PID file?) With NUT v2.8.0 or newer you can usedebug_min
setting in daemon config files to bump their verbosity, and when they get into the state of not updating theups.status
forupsmon
you would have more troubleshooting hints (e.g. lost connection between the two - though that should also emit logged warnings I think). For 2.7.4 you would have to modify init/systemd scripts to include-D
arguments, and possibly adapt the systemd unitType
to a non-forking behavior while the option is used.As for some seconds of delay, there was a discussion just recently that this can be throttled by hardware (UPS controller is not eager or quick enough to report issues ASAP, especially if hoping that the power blip is irrelevantly short), beside the NUT driver polling frequency (if not using interrupts) and "upsmon polling of upsd" frequency. Thinking of it, can't say OTOH (gotta look in code) whether upsd has a way to quickly notify clients like upsmon as soon as it knows of a relevant event.
Hi,
Thanks for your reply, Nut version installed is 2.7.4 from the Debian Repository, Should i try and build from source then?
Here is the configuration
GNU nano 5.4 ups.conf
pollinterval = 1
maxretry = 3
[vdn-ups01]
driver = usbhid-ups
port = auto
desc = "VDN-UPS01 APC Smart-UPS 1000VA RM"
vendorid = 051d
productid = 0002
serial = AS0613
[vdn-ups02]
driver = usbhid-ups
port = auto
desc = "VDN-UPS02 APC Smart-UPS 1000VA RM"
vendorid = 051d
productid = 0002
serial = AS0613
MONITOR vdn-ups01@localhost 1 admin se master
MONITOR vdn-ups02@localhost 1 admin se master
MINSUPPLIES 1
POLLFREQ 5
POLLFREQALERT 2
NOTIFYCMD "/etc/nut/notifycmd.sh"
# Notification events
NOTIFYFLAG ONLINE SYSLOG+WALL+EXEC
NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
NOTIFYFLAG LOWBATT SYSLOG+WALL+EXEC
NOTIFYFLAG FSD SYSLOG+WALL+EXEC
NOTIFYFLAG COMMOK SYSLOG+WALL+EXEC
NOTIFYFLAG COMMBAD SYSLOG+WALL+EXEC
NOTIFYFLAG SHUTDOWN SYSLOG+WALL+EXEC
NOTIFYFLAG REPLBATT SYSLOG+WALL+EXEC
NOTIFYFLAG NOCOMM SYSLOG+WALL+EXEC
NOTIFYFLAG NOPARENT SYSLOG+WALL+EXEC
Regards
Wondering if serial numbers on the two devices are really identical (AS0613
) or you just obfuscated the config?
If they are indeed the same (happens to several vendors) then a master-branch build can help in that matching by additional bus
and/or device
entries should be functional (had bugs fixed after 2.8.0 release). It can also help generally as a newer and much revised codebase, with no build warnings (so some might be hidden errors) and friendlier for debugging, troubleshooting and maintenance, and with proper support for libusb-1.0.
In particular for APC devices there was a debacle with their encoding of some values in USB protocol (#1024 et al for context) that was addressed in 2.8.0 and later too. Curious if that is relevant here...
Wondering if serial numbers on the two devices are really identical (
AS0613
) or you just obfuscated the config?If they are indeed the same (happens to several vendors) then a master-branch build can help in that matching by additional
bus
and/ordevice
entries should be functional (had bugs fixed after 2.8.0 release). It can also help generally as a newer and much revised codebase, with no build warnings (so some might be hidden errors) and friendlier for debugging, troubleshooting and maintenance, and with proper support for libusb-1.0.In particular for APC devices there was a debacle with their encoding of some values in USB protocol (#1024 et al for context) that was addressed in 2.8.0 and later too. Curious if that is relevant here...
Just rebuilt the Pi, using from Source it's worst now, takes 30 seconds to detect the UPS is on battery. I am out of ideas with this, tried everything and can't seem to fix the problem.
Update, just tested SNMP on a 9617 Management Card same problem, On Battery is delayed after a re-build it's taking around 30 seconds. This is not good 20% of Battery has drained by this time, When AC is restored it doesn't even detect that for about 30 seconds. I am out of ideas on this.
Update, Just tried on a Desktop Computer and it is the same, takes around 30 seconds to detect that it's on battery / AC
Puzzled here... with the SNMP card however, you likely have a web interface? Does the card report change of status more quickly? If you manually snmpwalk
the OID for UPS status, does it reflect online/onbatt instantly or also with a long delay (e.g. is the problem in UPS or in NUT)?
Still, 30 seconds sounds much like default polling frequency. You can make that tick faster - at a stress to both monitoring station and to the NMC (ones I had 20 years ago tended to stall and reboot whenever anything nontrivial hit them).
Puzzled here... with the SNMP card however, you likely have a web interface? Does the card report change of status more quickly? If you manually
snmpwalk
the OID for UPS status, does it reflect online/onbatt instantly or also with a long delay (e.g. is the problem in UPS or in NUT)?Still, 30 seconds sounds much like default polling frequency. You can make that tick faster - at a stress to both monitoring station and to the NMC (ones I had 20 years ago tended to stall and reboot whenever anything nontrivial hit them).
The APC9617 update is instant on the Web Gui but no on NUT itself, How would i change the default polling frequency over USB or SNMP ?
Regards
"When nothing helps, RTFM!" heh? ;)
pollfreq
should be it for most drivers, between "full updates":
There is also pollinterval
for "quick updates" (I think ups.status
should be in that selection).
Try running the drivers with higher debug level for a while (simplest by specifying debug_min
in ups.conf
if using a 2.8.0+ codebase) to see how often they really update stuff and when they see the status change.
On client (upsmon) side, there's also a POLLFREQ
of its own - see https://github.com/networkupstools/nut/blob/master/docs/man/upsmon.conf.txt#L265
Ive noticed on both of my UPS's when running upsc that driver.parameter.pollfreq is set to 30 yet in upsmon.conf pollfreq is set to 5 and pollfreqalert is set to 2 is this normal? it's either that or the settings i am change is not sticking, i've tried changing pollfreq and pollfreqalert in upsmon.conf but not making any difference.
Regards
You can see here the delay, this is from console over SSH. Broadcast message from root@vdn-ups-mgmt (somewhere) (Fri Mar 31 22:55:44 2023)
UPS ups01@localhost on battery
Broadcast message from root@vdn-ups-mgmt (somewhere) (Fri Mar 31 22:56:04 2023)
UPS ups01@localhost on line power
Broadcast message from root@vdn-ups-mgmt (somewhere) (Fri Mar 31 22:56:34 2023)
UPS ups01@localhost on battery
Broadcast message from root@vdn-ups-mgmt (somewhere) (Fri Mar 31 22:56:54 2023)
UPS ups01@localhost on line power
Drivers are configured in ups.conf; client in upsmon.conf, server in upsd.conf, etc.
Drivers are configured in ups.conf; client in upsmon.conf, server in upsd.conf, etc.
Okay, so how come isn't driver.parameter.pollfreq is set to 30 changing when I set it to 5?
i've tried changing pollfreq and pollfreqalert in upsmon.conf but not making any difference.
Sorry, maybe it's too late in the night here but I don't see a statement that you've tried setting pollfreq in ups.conf (section for the device) specifically. And it goes without saying, but just in case - you would need to restart the driver ;)
Also, did you have a chance to test with snmpwalk
how soon after the physical state change does the SNMP card reflect it - is there any substantial delay in NMC itself?
Thinking of it, upsmon wall
broadcasts may be throttled (at least the data-stale warnings, and maybe on-battery ones, are by default repeated only every 300 sec = 5 min). State changes would probably be reported on the next loop though.
i've tried changing pollfreq and pollfreqalert in upsmon.conf but not making any difference.
Sorry, maybe it's too late in the night here but I don't see a statement that you've tried setting pollfreq in ups.conf (section for the device) specifically. And it goes without saying, but just in case - you would need to restart the driver ;)
Also, did you have a chance to test with
snmpwalk
how soon after the physical state change does the SNMP card reflect it - is there any substantial delay in NMC itself?Thinking of it, upsmon
wall
broadcasts may be throttled (at least the data-stale warnings, and maybe on-battery ones, are by default repeated only every 300 sec = 5 min). State changes would probably be reported on the next loop though.
i've not tested since with SNMP, I switched to USB, not sure if this is correct or not but i've added it to ups.conf
pollinterval = 1 maxretry = 3 pollfreq = 5
[ups01] driver = usbhid-ups port = auto desc = "VDN-UPS01 APC Smart-UPS 1000VA RM" vendorid = 051d productid = 0002 serial = "AS06132
i've tried changing pollfreq and pollfreqalert in upsmon.conf but not making any difference.
Sorry, maybe it's too late in the night here but I don't see a statement that you've tried setting pollfreq in ups.conf (section for the device) specifically. And it goes without saying, but just in case - you would need to restart the driver ;)
Also, did you have a chance to test with
snmpwalk
how soon after the physical state change does the SNMP card reflect it - is there any substantial delay in NMC itself?Thinking of it, upsmon
wall
broadcasts may be throttled (at least the data-stale warnings, and maybe on-battery ones, are by default repeated only every 300 sec = 5 min). State changes would probably be reported on the next loop though.
Update,
I managed to figure it out, I had to add it such as,
pollinterval = 1 maxretry = 3
[ups01] driver = usbhid-ups pollfreq = 5 port = auto desc = "VDN-UPS01 APC Smart-UPS 1000VA RM" vendorid = 051d productid = 0002 serial = "AS0613"
Now upsc is showing, upsc ups01@localhost Init SSL without certificate database battery.charge: 100 battery.charge.low: 10 battery.charge.warning: 50 battery.mfr.date: 2021/03/21 battery.runtime: 1260 battery.runtime.low: 120 battery.temperature: 24.3 battery.type: PbAc battery.voltage: 27.7 battery.voltage.nominal: 24.0 device.mfr: American Power Conversion device.model: Smart-UPS 1000 RM device.serial: AS0613230404 device.type: ups driver.name: usbhid-ups driver.parameter.pollfreq: 5 driver.parameter.pollinterval: 1
Time to try this again.
Regards
I think I am somewhat making progress, Switch to battery now takes around 4 seconds but switching to AC Power takes around 7 seconds, which i think is a improvement but just need to get the AC Power at the same time, any suggestions?
Same as before: boost debug, see in real-time what happens on the driver side. Cross-check with other tools/media where possible.
Same as before: boost debug, see in real-time what happens on the driver side. Cross-check with other tools/media where possible.
How do you enable Boost Debug? It is defiantly better that's for sure
Please do read the replies. It is tedious and time-wasting to repeat same words every few posts :\
With NUT v2.8.0 or newer you can use debug_min setting in daemon config files to bump their verbosity, and when they get into the state of not updating the ups.status for upsmon you would have more troubleshooting hints (e.g. lost connection between the two - though that should also emit logged warnings I think). For 2.7.4 you would have to modify init/systemd scripts to include -D arguments, and possibly adapt the systemd unit Type to a non-forking behavior while the option is used.
Daemons in this context are drivers, upsd and ups.conf - all with manageable verbosity.
Please do read the replies. It is tedious and time-wasting to repeat same words every few posts :\
With NUT v2.8.0 or newer you can use debug_min setting in daemon config files to bump their verbosity, and when they get into the state of not updating the ups.status for upsmon you would have more troubleshooting hints (e.g. lost connection between the two - though that should also emit logged warnings I think). For 2.7.4 you would have to modify init/systemd scripts to include -D arguments, and possibly adapt the systemd unit Type to a non-forking behavior while the option is used.
Daemons in this context are drivers, upsd and ups.conf - all with manageable verbosity.
Sorry about that, It's late here but i am determined to get this working.
I think I have found the sweet spot you know. Setting both driver.parameter.pollfreq:5 and driver.parameter.pollinterval: 1 is a sweet spot. Just going to test it a few times before i apply it to the 6 Servers.
Regards
Well after many hours it's 3am here but I think I have cracked it, Setting both driver.parameter.pollfreq:5 and driver.parameter.pollinterval has solved it, just need to test it a few more times before i commit to deploying it on Prod. Right now I'm calling it a good night.
Thanks for your time and help Jimklimov. I shall report back in the next few days.
Cheers
Regards.
Feel free to word a chapter to the FAQ if there is not a relevant helpful one there, and post a PR with your findings.
Hi folks,
I've setup NUT Server on a RPI4 and connected two APC 1000 units via USB to monitor and to automate Shutdown on a few Servers I run in my DC however I have a problem, at first it works pull Power to either UPSs and console updates with UPS On Battery but issue is its slow on recognising that the UPS switches to battery its a bit hit and miss sometimes it takes 5 seconds or 20 seconds just seems very delayed. Now another problem, after NUT has been running a few days it just stops working, console never sees that either UPS is on battery but the strange thing is that the Web Gui updates, if I disconnect either UPS it detects that on both server and client side never detects that the UPS is on battery after a few days of being left on.
Here's what I've tried, different Raspberry Pi, different USB cables, different distros, tried switching from Ubuntu 22.04 to 18.04 also tried Debian. I've tried pollonly flag as well as polintervals but same problem.
I would like to get this working as I need to automate safe shutdowns on my Infrastructure, I think NUT is good it's just needing it to be reliable.
If anyone has any ideas then let me know as I am basically banging my head on the desk trying to figure it out.
Regards