networkupstools / nut

The Network UPS Tools repository. UPS management protocol Informational RFC 9271 published by IETF at https://www.rfc-editor.org/info/rfc9271 Please star NUT on GitHub, this helps with sponsorships!
https://networkupstools.org/
Other
1.93k stars 346 forks source link

No data after ~1 day #1960

Open hajkenek opened 1 year ago

hajkenek commented 1 year ago

Hi, i have strange problem. After about 1 day of running nut-upsd in docker there is no new data from ups. No logs - only:

2023-06-04T14:39:03.582115162Z Network UPS Tools - UPS driver controller 2.8.0
2023-06-04T14:39:03.583145442Z Network UPS Tools - Generic HID driver 0.47 (2.8.0)
2023-06-04T14:39:03.609229914Z Using subdriver: Phoenixtec/Liebert HID 0.41
2023-06-04T14:39:03.609278764Z USB communication driver (libusb 1.0) 0.43
2023-06-04T14:39:03.617037196Z Network UPS Tools upsd 2.8.0
2023-06-04T14:39:03.617104686Z Ignoring invalid pid number 0
2023-06-04T14:39:03.617393778Z listening on 0.0.0.0 port 3493
2023-06-04T14:39:03.617778073Z Connected to UPS [tatooine-ups]: usbhid-ups-tatooine-ups
2023-06-04T14:39:03.623763083Z Network UPS Tools upsmon 2.8.0
2023-06-04T14:39:03.623808699Z    0.000000  Ignoring invalid pid number 0
2023-06-04T14:39:03.624036619Z    0.000197  UPS: tatooine-ups@localhost (primary) (power value 1)
2023-06-04T14:39:03.624061594Z    0.000247  [D1] debug level is '1'
2023-06-04T14:39:03.624076525Z    0.000259  [D1] debug level is '1'
2023-06-04T14:39:03.624632712Z    0.000810  [D1] Saving PID 18 into /var/run/upsmon.pid
2023-06-04T14:39:03.625282309Z Init SSL without certificate database
2023-06-04T14:39:03.627546847Z    0.003691  [D1] Trying to connect to UPS [tatooine-ups@localhost]
2023-06-04T14:39:03.628238592Z    0.004423  [D1] Logged into UPS tatooine-ups@localhost

And everything works fine till ~1 day (the same after restarting container) image

I'm using pollonly=true in config

jimklimov commented 1 year ago

In an earlier post version I saw a graph from your switch going from some 5Mbit/s to zero and intermittent connectivity, I think - did it reflect the loss of networking in the container (or its host)? E.g. trying to point fingers that the problem is not in NUT itself? ;)

jimklimov commented 1 year ago

Generally, what do the NUT logs say around the time the data disappears?

I assume you have some single script running the driver, upsd and upsmon in the same container - so loss of connectivity between them over localhost link would be peculiar. And some graphing client separately, which loses the data stream (but this may be due to various causes)?

hajkenek commented 1 year ago

It was wrong screenshot accidently copied from my clipboard. Totally unrelated to main topic. I tried to delete this, but probably it's somewhere in assets. All logs that I have are pasted above. So looks like nothing change. After restarting nut container I see new data again for about 1 day.

jimklimov commented 1 year ago

Ok, understood about the graphs. Anything new about the logs? Do NUT daemons complain about anything when you no longer see data updates? Are there any limits for the container itself (lifetime, CPU usage time etc.)? Can you query it with e.g. a remote upsc client when graphing does not work?

Some straws to help separate where the issue is and is-not?.. :)

hajkenek commented 1 year ago

Nothing interesting for now. Disconnecting and connecting USB does nothing. There is no new logs. Container usage looks normal. Now I'm trying to retrive some data from nut server somehere else to be 100% sure if this is related to nut server not nut client on my homeassistant instance.

hajkenek commented 1 year ago

Okay some progress: Data on other machine in local network are stuck at the same values as homeassistant: image

Soo definitely nut server issue not client. Looks like the server is sending data, but data are outdated (stuck on last known values). As I said: reconnecting usb doesn't change anything, so this isn't physical connection issue. After rebooting container with nut server everything works for another day. Is there any better logging that I can turn on?

If I delete pullonly from config, after 1 day there are no data (instead last know value I see only blank values)

jimklimov commented 1 year ago

Ok, so mentioning Home Assistant is a useful clue :)

I do not have direct experience with it, but some references from NUT Wiki and earlier discussions might help...

As for logging, with NUT 2.8.0 and newer (if you try custom builds) the debug_min NUM setting should be supported in daemons (drivers, upsd, upsmon) config files to set their verbosity levels; practical values are up to 6. With earlier NUT versions, you are limited to providing many -D arguments to launching command line.

UPDATE: Added https://github.com/networkupstools/nut/wiki/Changing-NUT-daemon-debug-verbosity to document the nuances.

hajkenek commented 1 year ago

Okay, so with debug set up to 6, I have some new logs. After loosing data I'm getting this. It is look like some of usb issue for me, but idk. (I deleted repeated values, and try to left only importat logs, because there are so many lines :P, if you need full log, just tell me.

3470.741067 [D2] pollups: tatooine-ups@localhost <------ this is exact moment of loosing data
3470.741117 [D3] get_var: tatooine-ups@localhost / status
3470.741729 Poll UPS [tatooine-ups@localhost] failed - Data stale
3470.741773 [D2] do_notify: ntype 0x0005 (COMMBAD)
3470.741804 Communications with UPS tatooine-ups@localhost lost
3470.742350 [D3] Current power value: 1
3470.742403 [D3] Minimum power value: 1
sh: wall: not found
Warning: no shutdown command defined!
3475.742727 [D2] pollups: tatooine-ups@localhost
3475.742778 [D3] get_var: tatooine-ups@localhost / status
3475.743444 Poll UPS [tatooine-ups@localhost] failed - Data stale
3765.802266 [D2] pollups: tatooine-ups@localhost
3765.802315 [D3] get_var: tatooine-ups@localhost / status
3765.802897 Poll UPS [tatooine-ups@localhost] failed - Data stale
3765.802945 [D2] do_notify: ntype 0x0008 (NOCOMM)
3765.802966 UPS tatooine-ups@localhost is unavailable
5296.118412 [D2] pollups: tatooine-ups@localhost
5296.118465 [D3] get_var: tatooine-ups@localhost / status
5296.119133 Poll UPS [tatooine-ups@localhost] failed - Data stale
5296.119177 [D3] Current power value: 1
5296.119196 [D3] Minimum power value: 1
5301.119552 [D2] pollups: tatooine-ups@localhost
5301.119602 [D3] get_var: tatooine-ups@localhost / status
5301.120262 Poll UPS [tatooine-ups@localhost] failed - Driver not connected
5301.120301 [D3] Current power value: 1
5301.120320 [D3] Minimum power value: 1
5306.120665 [D2] pollups: tatooine-ups@localhost
5306.120732 [D3] get_var: tatooine-ups@localhost / status
5306.121336 [D2] parse_status: [WAIT]
5306.121427 [D2] do_notify: ntype 0x0004 (COMMOK)
5306.121456 Communications with UPS tatooine-ups@localhost established
5306.121989 [D3] parsing: [WAIT]
5306.122054 [D3] Current power value: 1
5306.122078 [D3] Minimum power value: 1
sh: wall: not found
Warning: no shutdown command defined!
5311.122250 [D2] pollups: tatooine-ups@localhost
5311.122307 [D3] get_var: tatooine-ups@localhost / status
5311.122889 [D2] parse_status: [WAIT]
5311.122932 [D3] parsing: [WAIT]
5311.122959 [D3] Current power value: 1
5311.122977 [D3] Minimum power value: 1
5316.123128 [D2] pollups: tatooine-ups@localhost
5316.123178 [D3] get_var: tatooine-ups@localhost / status
5316.123723 [D2] parse_status: [WAIT]
5316.123761 [D3] parsing: [WAIT]
5316.123790 [D3] Current power value: 1
5316.123816 [D3] Minimum power value: 1
5321.124166 [D2] pollups: tatooine-ups@localhost
5321.124226 [D3] get_var: tatooine-ups@localhost / status
5321.124856 Poll UPS [tatooine-ups@localhost] failed - Data stale
5321.124917 [D2] do_notify: ntype 0x0005 (COMMBAD)
5321.124948 Communications with UPS tatooine-ups@localhost lost
5321.125578 [D3] Current power value: 1
5321.125635 [D3] Minimum power value: 1
sh: wall: not found
Warning: no shutdown command defined!
5326.125850 [D2] pollups: tatooine-ups@localhost
5326.125910 [D3] get_var: tatooine-ups@localhost / status
5326.126575 Poll UPS [tatooine-ups@localhost] failed - Data stale
jimklimov commented 1 year ago

Thanks, that seems like upsmon log. Is there one from driver and/or upsd?

hajkenek commented 1 year ago

No more logs, I set up all three config files with this line: image

Maybe I am doing something wrong?

jimklimov commented 1 year ago

Well, by config it seems there should be more logs. Maybe in syslog sink or just anonther log/journal file?