Closed jimklimov closed 6 months ago
Easily reproduced by running a working setup, watching the logs, and stopping a driver. With DEBUG_MIN 6
in upsmon.conf
:
### Monitoring OK
Apr 06 19:23:01 pve nut-monitor[3127289]: 15.002685 [D4:3127289] ups_on_line: eco650 (no change)
Apr 06 19:23:01 pve nut-monitor[3127289]: 15.002687 [D3:3127289] Current power value: 1
Apr 06 19:23:01 pve nut-monitor[3127289]: 15.002689 [D3:3127289] Minimum power value: 1
### Stopping driver
Apr 06 19:23:06 pve nut-monitor[3127289]: 20.002761 [D2:3127289] pollups: eco650
Apr 06 19:23:06 pve nut-monitor[3127289]: 20.002782 [D3:3127289] get_var: eco650 / status
Apr 06 19:23:06 pve nut-monitor[3127289]: 20.002867 [D3:3127289] pollups: Poll UPS [eco650] after getvar(status) failed: upserror=39
Apr 06 19:23:06 pve nut-monitor[3127289]: 20.002873 Poll UPS [eco650] failed - Driver not connected
Apr 06 19:23:06 pve nut-monitor[3127289]: 20.002877 [D2:3127289] do_notify: ntype 0x0005 (COMMBAD)
Apr 06 19:23:06 pve nut-monitor[3127289]: 20.002887 [D6:3127289] notify: sending notification for [eco650]: type COMMBAD with flags 0x000e: Communications with UPS eco650 lost
Apr 06 19:23:06 pve nut-monitor[3127289]: 20.002890 [D6:3127289] notify: NOTIFY_SYSLOG (as LOG_NOTICE)
Apr 06 19:23:06 pve nut-monitor[3127289]: 20.002893 Communications with UPS eco650 lost
Apr 06 19:23:06 pve nut-monitor[3127289]: 20.003070 [D6:3127289] notify (parent): forked a child to notify via subprocesses
Apr 06 19:23:06 pve nut-monitor[3127289]: 20.003080 [D3:3127289] Current power value: 1
Apr 06 19:23:06 pve nut-monitor[3127289]: 20.003085 [D3:3127289] Minimum power value: 1
Apr 06 19:23:06 pve nut-monitor[3134816]: 20.003118 [D6:3134816] notify (child): forked to notify via subprocesses
Apr 06 19:23:06 pve nut-monitor[3134816]: 20.003137 [D6:3134816] notify (child): NOTIFY_WALL
Apr 06 19:23:06 pve nut-monitor[3134816]: 20.005198 [D6:3134816] notify (child): NOTIFY_EXEC: calling NOTIFYCMD as '/usr/local/ups/bin/notifyme "Communications with UPS eco650 lost"'
Apr 06 19:23:06 pve nut-monitor[3134816]: Network UPS Tools upsmon 2.8.2-22-g60f76bfa9
### Next loop
Apr 06 19:23:11 pve nut-monitor[3127289]: 25.003157 [D2:3127289] pollups: eco650
Apr 06 19:23:11 pve nut-monitor[3127289]: 25.003171 [D3:3127289] get_var: eco650 / status
Apr 06 19:23:11 pve nut-monitor[3127289]: 25.003246 [D3:3127289] pollups: Poll UPS [eco650] after getvar(status) failed: upserror=39
Apr 06 19:23:11 pve nut-monitor[3127289]: 25.003250 Poll UPS [eco650] failed - Driver not connected
Apr 06 19:23:11 pve nut-monitor[3127289]: 25.003252 [D3:3127289] Current power value: 1
Apr 06 19:23:11 pve nut-monitor[3127289]: 25.003254 [D3:3127289] Minimum power value: 1
NOTE: Further debugging aid via envvar:
# cat /etc/systemd/system/nut-monitor.service.d/debug.conf
[Service]
Environment="NUT_DEBUG_PID=yes"
After adding some more debug messages, it seems that the banner appears as part of exit()
handling for parent and child processes alike, although it may be not in upsmon.c
itself -- don't see any atexit()
registrations. Checking open_syslog()
etc...
Notification child:
:; systemctl stop nut-driver@eco650
...
Apr 06 19:58:33 pve nut-monitor[3914328]: 410.019965 [D6:3914328] notify (child): NOTIFY_EXEC: calling NOTIFYCMD as '/usr/local/ups/bin/notifyme "Communications with UPS eco650 lost"'
Apr 06 19:58:33 pve nut-monitor[3914328]: 410.022328 [D6:3914328] notify (child): exiting after notifications
Apr 06 19:58:33 pve nut-monitor[3914328]: Network UPS Tools upsmon 2.8.2-24-g962c15437
Daemon parent:
:; systemctl stop nut-monitor
...
Apr 06 20:00:30 pve nut-monitor[3839029]: 527.663796 Signal 15: exiting
Apr 06 20:00:30 pve nut-monitor[3839029]: 527.663814 [D6:3839029] upsnotify: notify about state 4 with libsystemd: use sd_notify()
Apr 06 20:00:30 pve nut-monitor[3839029]: 527.663819 [D6:3839029] upsnotify: posting sd_notify: STATUS=Signal 15: exiting
Apr 06 20:00:30 pve nut-monitor[3839029]: STOPPING=1
Apr 06 20:00:30 pve nut-monitor[3839029]: 527.663844 [D2:3839029] Dropping connection to UPS [eco650], last seen as fully online.
Apr 06 20:00:30 pve nut-monitor[3839029]: 527.663844 [D2:3839029] Finally exiting due to signal 15
Apr 06 20:00:30 pve nut-monitor[3839029]: Network UPS Tools upsmon 2.8.2-24-g962c15437
Apr 06 20:00:30 pve systemd[1]: Stopping Network UPS Tools - power device monitor and shutdown controller...
Apr 06 20:00:30 pve systemd[1]: nut-monitor.service: Succeeded.
Apr 06 20:00:30 pve systemd[1]: Stopped Network UPS Tools - power device monitor and shutdown controller.
And this seems to happen for other NUT daemon exits, e.g.:
:; systemctl stop nut-driver@eco650
...
Apr 06 20:08:32 pve systemd[1]: Stopping Network UPS Tools - device driver for NUT device 'eco650'...
Apr 06 20:08:32 pve nut-driver@eco650[4020967]: Signal 15: exiting
Apr 06 20:08:32 pve usbhid-ups[4020967]: Signal 15: exiting
Apr 06 20:08:33 pve nut-driver@eco650[4133136]: fopen /var/state/ups/usbhid-ups-eco650.pid: No such file or directory
Apr 06 20:08:33 pve nut-driver@eco650[4133136]: Network UPS Tools - UPS driver controller 2.8.2-24-g962c15437
Apr 06 20:08:33 pve systemd[1]: nut-driver@eco650.service: Succeeded.
Apr 06 20:08:33 pve systemd[1]: Stopped Network UPS Tools - device driver for NUT device 'eco650'.
Just beautiful! The message did come from main()
initialization of the daemons. It was also stuck in the buffer (at least for non-interactive logging) and was cloned into forked child processes, so every exit()
called a fflush()
of open file descriptors while closing them - and that got the buffered message actually printed.
One give-away was that the banner was not actually seen in systemd journals when the service started.
Example from systemd journal:
So it is every 5 minutes (corresponding to default NOCOMMWARNTIME) and
notify()
doesfork()
forNOTIFYCMD
andpopen()
forwall()
; however neither of these should print theNetwork UPS Tools upsmon 2.8.0-Windows-313-ga3fb36942
banner which I only see inmain()
:\Systemd unit state (uptime) remains uninterrupted.
In greater detail around the notification event: