networkupstools / nut

The Network UPS Tools repository. UPS management protocol Informational RFC 9271 published by IETF at https://www.rfc-editor.org/info/rfc9271 Please star NUT on GitHub, this helps with sponsorships!
https://networkupstools.org/
Other
1.93k stars 346 forks source link

UPSD too eager to report Data stale #661

Open jimklimov opened 5 years ago

jimklimov commented 5 years ago

On our relatively busy NAS server there are occasionally console messages that Data for UPS X is stale. In original default settings, the messages for data is no longer stale do not get to the console (which looked worrysome, to see only lots of failures, until I updated logging settings).

Now that both types of messages are here, I see that stale and not-stale often happen in the same second, so with a bit of patience much of this spam could be avoided :)

To me it seems like some off-by-one math error in patience calculation, or perhaps a hint to induce one to avoid spurious messages on systems a bit too busy to context-switch fast?

Guessing from cases I've seen before in different codebases, maybe there is a rounding error of sec/msec/usec (including granularity of time reports from an OS - more pronounced for filesystem objects which may be off by 1-2 seconds) and comparisons of those numbers assuming infinite (or at least same) precision?

Also sometimes, less often, there are connection losses and re-establishments logged by upsmon some 9-10 seconds apart (probably hitting a 10-second frequency of status evaluation somewhere)... I am not sure how those could be avoided. Perhaps, upsmon could do some double-check (and/or tell upsd to do so) before resorting to a verdict of connection loss.

Example "screenshot" below:

Feb 10 16:42:04 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost established
Feb 10 17:40:53 bios-backup upsmon[930]: [ID 702911 daemon.error] Poll UPS [eaton@localhost] failed - Server disconnected
Feb 10 17:40:53 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost lost
Feb 10 17:41:04 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost established
Feb 10 18:10:41 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 10 18:10:42 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 10 18:10:44 bios-backup upsmon[930]: [ID 702911 daemon.error] Poll UPS [eaton@localhost] failed - Data stale
Feb 10 18:10:44 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost lost
Feb 10 18:10:58 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost established
Feb 10 18:11:09 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 10 18:11:11 bios-backup upsmon[930]: [ID 702911 daemon.error] Poll UPS [eaton@localhost] failed - Data stale
Feb 10 18:11:11 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost lost
Feb 10 18:11:15 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 10 18:11:26 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost established
Feb 10 18:12:55 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 10 18:12:55 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 10 19:54:37 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 10 19:54:40 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 10 20:24:08 bios-backup upsmon[930]: [ID 702911 daemon.error] Poll UPS [eaton@localhost] failed - Server disconnected
Feb 10 20:24:08 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost lost
Feb 10 20:24:08 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 10 20:24:08 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 10 20:24:26 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 10 20:24:26 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 10 20:24:35 bios-backup upsmon[930]: [ID 702911 daemon.error] Set password on [eaton@localhost] failed: Server disconnected
Feb 10 20:24:45 bios-backup upsmon[930]: [ID 702911 daemon.error] Poll UPS [eaton@localhost] failed - Driver not connected
Feb 10 20:24:45 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 10 20:24:55 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 10 20:25:01 bios-backup upsmon[930]: [ID 702911 daemon.error] Set username on [eaton@localhost] failed: Server disconnected
Feb 10 20:25:01 bios-backup upsd[922]: [ID 702911 daemon.notice] write() failed for ::1: Broken pipe
Feb 10 20:25:08 bios-backup upsmon[930]: [ID 702911 daemon.error] Poll UPS [eaton@localhost] failed - Driver not connected
Feb 10 20:25:18 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 10 20:25:18 bios-backup upsmon[930]: [ID 702911 daemon.error] Poll UPS [eaton@localhost] failed - Data stale
Feb 10 20:25:31 bios-backup upsmon[930]: [ID 702911 daemon.error] Poll UPS [eaton@localhost] failed - Server disconnected
Feb 10 20:25:42 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 10 20:25:42 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost established
Feb 10 20:25:59 bios-backup upsmon[930]: [ID 702911 daemon.error] Poll UPS [eaton@localhost] failed - Server disconnected
Feb 10 20:25:59 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost lost
Feb 10 20:26:00 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 10 20:26:11 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 10 20:26:22 bios-backup upsmon[930]: [ID 702911 daemon.error] Set username on [eaton@localhost] failed: Server disconnected
Feb 10 20:26:22 bios-backup upsd[922]: [ID 702911 daemon.notice] write() failed for ::1: Broken pipe
Feb 10 20:26:29 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 10 20:26:29 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 10 20:26:34 bios-backup upsmon[930]: [ID 702911 daemon.error] Poll UPS [eaton@localhost] failed - Driver not connected
Feb 10 20:26:45 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 10 20:26:45 bios-backup upsmon[930]: [ID 702911 daemon.error] Poll UPS [eaton@localhost] failed - Data stale
Feb 10 20:26:57 bios-backup upsmon[930]: [ID 702911 daemon.error] Poll UPS [eaton@localhost] failed - Server disconnected
Feb 10 20:27:08 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 10 20:27:08 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost established
Feb 10 20:28:19 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 10 20:28:20 bios-backup upsmon[930]: [ID 702911 daemon.error] Poll UPS [eaton@localhost] failed - Data stale
Feb 10 20:28:20 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost lost
Feb 10 20:28:25 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 10 20:28:30 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost established
Feb 10 20:39:37 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 10 20:39:37 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 10 21:10:18 bios-backup upsmon[930]: [ID 702911 daemon.error] Poll UPS [eaton@localhost] failed - Server disconnected
Feb 10 21:10:18 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost lost
Feb 10 21:10:29 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost established
Feb 10 23:25:35 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 10 23:25:35 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 10 23:25:36 bios-backup upsmon[930]: [ID 702911 daemon.error] Poll UPS [eaton@localhost] failed - Data stale
Feb 10 23:25:36 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost lost
Feb 10 23:25:45 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost established
Feb 11 01:10:36 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 11 01:10:38 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 11 01:10:40 bios-backup upsmon[930]: [ID 702911 daemon.error] Poll UPS [eaton@localhost] failed - Data stale
Feb 11 01:10:40 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost lost
Feb 11 01:10:50 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost established
Feb 11 01:11:04 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 11 01:11:04 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 11 02:25:37 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 11 02:25:37 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 11 02:55:28 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 11 02:55:32 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 11 02:55:34 bios-backup upsmon[930]: [ID 702911 daemon.error] Poll UPS [eaton@localhost] failed - Data stale
Feb 11 02:55:34 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost lost
Feb 11 02:55:44 bios-backup upsmon[930]: [ID 702911 daemon.notice] Communications with UPS eaton@localhost established
Feb 11 03:47:40 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 11 03:47:40 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 11 04:46:58 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 11 04:46:58 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 11 06:35:46 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 11 06:35:46 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
Feb 11 07:18:59 bios-backup upsd[922]: [ID 702911 daemon.notice] Data for UPS [eaton] is stale - check driver
Feb 11 07:18:59 bios-backup upsd[922]: [ID 702911 daemon.notice] UPS [eaton] data is no longer stale
aquette commented 5 years ago

privet @jimklimov I'd like to make a setup to reproduce this, since the fix may not be that trivial. Questions:

admin-kwsoft commented 5 years ago

We are plagued by the same problem. In our case we would like to monitor a Eaton 93PS and tried netxml-ups and snmp-ups and get the stale/no longer stale cycle with both. I would be able to provide tcpdump data and we are able to test/compile patches on that machine if someone is interested in solving the problem.

admin-kwsoft commented 5 years ago

Might this be solved with https://github.com/networkupstools/nut/issues/542 ??

jimklimov commented 5 years ago

@aquette : for the device in that issue, it was a generic usbhid-ups:

[eaton]
    driver = usbhid-ups
    synchronous = yes
    port = /dev/usb/463.ffff/0
    desc = "EATON Protection Station AN2E49008"
#    pollonly = true

The MAXAGE is not customized, so should be 15 sec per comments in the config.

evoisard commented 5 years ago

I had the same issue here with the snmp-ups driver and a HP R/T3000 UPS with a network module.

The polling rate is 15s and I've tried to change MAXAGE up to 60s to no effect. The weird thing is that "is stale" and then "no longer stale" were occurring within the same second, which I couldn't explain, except by some race condition or something in upsd.

But I then sniffed the network interface of my NUT host for SNMP traffic. There was indeed no timeout issue, all the requests/replies occurring in time. But then I noticed that during its series of GETs, sometimes (but no always!) the driver was requesting an additional but non-existent (incomplete) SNMP OID. In that case the UPS was replying with a noSuchName error. There were as many is-stale/not-stale as there were GETs for such non-existent SNMP variables. See below:

tail -f /var/log/messages

Mar 22 10:38:42 myhost upsd: Data for UPS [rt3000] is stale - check driver
Mar 22 10:38:42 myhost upsd[22486]: UPS [rt3000] data is no longer stale
Mar 22 10:38:42 myhost upsd: UPS [rt3000] data is no longer stale
Mar 22 10:38:43 myhost upsd[22486]: Data for UPS [rt3000] is stale - check driver
Mar 22 10:38:43 myhost upsd: Data for UPS [rt3000] is stale - check driver
Mar 22 10:38:43 myhost upsd[22486]: UPS [rt3000] data is no longer stale
Mar 22 10:38:43 myhost upsd: UPS [rt3000] data is no longer stale
Mar 22 10:38:43 myhost upsd[22486]: Data for UPS [rt3000] is stale - check driver
Mar 22 10:38:43 myhost upsd: Data for UPS [rt3000] is stale - check driver
Mar 22 10:38:43 myhost upsd[22486]: UPS [rt3000] data is no longer stale
Mar 22 10:38:43 myhost upsd: UPS [rt3000] data is no longer stale
Mar 22 10:38:43 myhost upsd[22486]: Data for UPS [rt3000] is stale - check driver
Mar 22 10:38:43 myhost upsd: Data for UPS [rt3000] is stale - check driver
Mar 22 10:38:43 myhost upsd[22486]: UPS [rt3000] data is no longer stale
Mar 22 10:38:43 myhost upsd: UPS [rt3000] data is no longer stale

tcpdump -i ens192 'udp port 161'

10:38:42.952190 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.4.1.0
10:38:42.959678 IP ups.snmp > myhost.51497:  GetResponse(34)  E:232.165.3.4.1.0=10
10:38:42.959763 IP myhost.51497 > ups.snmp:  GetRequest(34)  E:232.165.3.4.4.1.4
10:38:42.967535 IP ups.snmp > myhost.51497:  GetResponse(34)  noSuchName@1 E:232.165.3.4.4.1.4=
10:38:42.967613 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.9.3.0
10:38:42.975343 IP ups.snmp > myhost.51497:  GetResponse(35)  E:232.165.3.9.3.0=3000
10:38:42.975449 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.4.5.0
10:38:42.982738 IP ups.snmp > myhost.51497:  GetResponse(34)  E:232.165.3.4.5.0=3
10:38:42.982809 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.2.5.0
10:38:42.989580 IP ups.snmp > myhost.51497:  GetResponse(34)  E:232.165.3.2.5.0=4
10:38:42.989646 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.4.5.0
10:38:42.996967 IP ups.snmp > myhost.51497:  GetResponse(34)  E:232.165.3.4.5.0=3
10:38:42.997034 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.7.2.0
10:38:43.004655 IP ups.snmp > myhost.51497:  GetResponse(34)  E:232.165.3.7.2.0=2
10:38:43.004725 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.8.1.0
10:38:43.011727 IP ups.snmp > myhost.51497:  GetResponse(34)  E:232.165.3.8.1.0=0
10:38:43.011780 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.8.2.0
10:38:43.019192 IP ups.snmp > myhost.51497:  GetResponse(34)  E:232.165.3.8.2.0=0
10:38:43.019262 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.6.1.0
10:38:43.026700 IP ups.snmp > myhost.51497:  GetResponse(34)  E:232.165.3.6.1.0=30
10:38:43.026771 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.6.2.0
10:38:43.034295 IP ups.snmp > myhost.51497:  GetResponse(34)  E:232.165.3.6.2.0=0
10:38:43.034364 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.6.3.0
10:38:43.041262 IP ups.snmp > myhost.51497:  GetResponse(34)  E:232.165.3.6.3.0=40
10:38:43.041335 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.2.4.0
10:38:43.048508 IP ups.snmp > myhost.51497:  GetResponse(34)  E:232.165.3.2.4.0=99
10:38:43.048574 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.2.1.0
10:38:43.055751 IP ups.snmp > myhost.51497:  GetResponse(35)  E:232.165.3.2.1.0=4221
10:38:43.055837 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.2.2.0
10:38:43.063191 IP ups.snmp > myhost.51497:  GetResponse(35)  E:232.165.3.2.2.0=130
10:38:43.063258 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.2.3.0
10:38:43.069987 IP ups.snmp > myhost.51497:  GetResponse(34)  E:232.165.3.2.3.0=0
10:38:43.070058 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.3.3.0
10:38:43.077273 IP ups.snmp > myhost.51497:  GetResponse(34)  E:232.165.3.3.3.0=1
10:38:43.077342 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.3.1.0
10:38:43.084524 IP ups.snmp > myhost.51497:  GetResponse(35)  E:232.165.3.3.1.0=498
10:38:43.084608 IP myhost.51497 > ups.snmp:  GetRequest(34)  E:232.165.3.3.4.1.2
10:38:43.091429 IP ups.snmp > myhost.51497:  GetResponse(34)  noSuchName@1 E:232.165.3.3.4.1.2=
10:38:43.091516 IP myhost.51497 > ups.snmp:  GetRequest(35)  E:232.165.3.3.4.1.2.1
10:38:43.099143 IP ups.snmp > myhost.51497:  GetResponse(37)  E:232.165.3.3.4.1.2.1=230
10:38:43.099243 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.9.2.0
10:38:43.106727 IP ups.snmp > myhost.51497:  GetResponse(35)  E:232.165.3.9.2.0=230
10:38:43.106795 IP myhost.51497 > ups.snmp:  GetRequest(34)  E:232.165.3.3.4.1.3
10:38:43.114091 IP ups.snmp > myhost.51497:  GetResponse(34)  noSuchName@1 E:232.165.3.3.4.1.3=
10:38:43.114166 IP myhost.51497 > ups.snmp:  GetRequest(35)  E:232.165.3.3.4.1.3.1
10:38:43.121087 IP ups.snmp > myhost.51497:  GetResponse(36)  E:232.165.3.3.4.1.3.1=3
10:38:43.121170 IP myhost.51497 > ups.snmp:  GetRequest(34)  E:232.165.3.3.4.1.4
10:38:43.128752 IP ups.snmp > myhost.51497:  GetResponse(34)  noSuchName@1 E:232.165.3.3.4.1.4=
10:38:43.128828 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.3.2.0
10:38:43.136010 IP ups.snmp > myhost.51497:  GetResponse(34)  E:232.165.3.3.2.0=1
10:38:43.136092 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.4.3.0
10:38:43.143319 IP ups.snmp > myhost.51497:  GetResponse(34)  E:232.165.3.4.3.0=1
10:38:43.143392 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.4.2.0
10:38:43.150226 IP ups.snmp > myhost.51497:  GetResponse(35)  E:232.165.3.4.2.0=498
10:38:43.150308 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.9.4.0
10:38:43.158029 IP ups.snmp > myhost.51497:  GetResponse(35)  E:232.165.3.9.4.0=500
10:38:43.158107 IP myhost.51497 > ups.snmp:  GetRequest(35)  E:232.165.3.4.4.1.2.1
10:38:43.165591 IP ups.snmp > myhost.51497:  GetResponse(37)  E:232.165.3.4.4.1.2.1=230
10:38:43.165671 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.9.1.0
10:38:43.173157 IP ups.snmp > myhost.51497:  GetResponse(35)  E:232.165.3.9.1.0=230
10:38:43.173233 IP myhost.51497 > ups.snmp:  GetRequest(35)  E:232.165.3.4.4.1.3.1
10:38:43.180234 IP ups.snmp > myhost.51497:  GetResponse(36)  E:232.165.3.4.4.1.3.1=1
10:38:43.180312 IP myhost.51497 > ups.snmp:  GetRequest(33)  E:232.165.3.10.1.0
10:38:43.188018 IP ups.snmp > myhost.51497:  GetResponse(34)  E:232.165.3.10.1.0=2
10:38:43.188100 IP myhost.51497 > ups.snmp:  GetRequest(35)  E:232.165.3.10.2.1.2.1
10:38:43.195808 IP ups.snmp > myhost.51497:  GetResponse(36)  E:232.165.3.10.2.1.2.1=1
10:38:43.195887 IP myhost.51497 > ups.snmp:  GetRequest(35)  E:232.165.3.10.2.1.2.2matter of los
10:38:43.203640 IP ups.snmp > myhost.51497:  GetResponse(36)  E:232.165.3.10.2.1.2.2=1

In cases where stale states were occurring during several consecutive polls (>DEADTIME), it was triggering false positive NOCOMM notifications which I was receiving by email. Very annoying...

My snmp-ups driver was configured for HP's specific SNMP MIB (cpqpower) but then I tried with standard IETF's MIB (which my UPS supports): the problem disappeared!...

So, in my case at least, is-stale/not-stale was not a matter of network connection, nor an update time issue. There should be something buggy in the snmp-ups driver that makes it sometimes (!) requesting wrong OIDs (at least with cpqpower MIB but maybe with other MIBs too), which then cause SNMP errors, which are reported as errors to upsd, which considers the UPS faulty (which indeed is incorrect).

I agree that UPSD is too eager to report Data stale...