Closed ullix closed 1 year ago
Ugh, did you read the docs on NUT architecture and setup? :) Some hints:
The upsmon error seems to imply that you do not currently have the upsmon daemon running locally. The -c option is a way for the newly launched binary to effectively find its copy already running (via PID file) on the "master"/"primary" system and send it a signal to trigger the shutdown (tell "secondary" systems to go down, then shut down itself, and if the late-shutdown integration works for your distro, tell the UPS to power off in the end). If the service is not running, then there's nobody to signal to :)
Gosh, I think I read until my eyes fell off. I must say I find the docs poorly written in many respects: they give no summary for a really interested user, and I see no tools to tell me what may be missing.
You are trying to continue this by providing "hints"! Hell, why can't you just tell me what commands I need for my browser? Please!
I know what the fsd flag is meant for. I just see no obvious way to test my installation for correctness, and no GUIDANCE on what to do about it.
I had nut running years ago, and as it ran flawlessly, I forgot about the details. Now I need to reestablish it on new systems. So, help is needed, not hints. Thank you.
Sorry about that, I mostly post answers while commuting so don't have good access to docs or to a system for modeling the setup, just the memory. Hence the "hints". For summaries there are many blogs that do a decent job. PRs are also welcome - uncomfortable docs are really something that passers-by notice better than people who see them for years and "get used" to them :\ Also the harder-core techie style apparently served well 15-20 years ago when most of this was written, and today more people want words minced on a platter.
And to "browse" you do need to set up a web-server, enable CGI support, and make it use NUT's *.cgi programs to serve them on the web. Most of this is non-NUT business and depends on the web server you use (apache, nginx, etc). Some of this has examples in https://github.com/networkupstools/nut/blob/master/data/html/README
The NUT-specific part is to configure the /etc/nut/upsd.users file, hosts.conf and upsset.conf as listed at e.g. https://networkupstools.org/docs/user-manual.chunked/ar01s02.html#_cgi_programs
"Testing for correctness" generally means setting up the NUT driver and data server on the same system (you've got that working, according to upsc responding in the original post), and setting up upsd.users and upsmon.conf to play together for access. Then you temporarily set SHUTDOWNCMD to touch some file (so you know upsmon tried to do its job), pull the plug, wait for low-battery status and check that the fake shutdown was triggered.
Then with fsd you can try a real one, to check your UPS accepts a command to cut the electricity. Depending on the OS you may need to hook into late shutdown scripts to avoid a "power race condition" (just reboot if wall power returned while you were shutting down, or stay up to drain the UPS and so powercycle it if it is not too manageable to power off when told to).
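A minimal sketch of that dry-run test, assuming a marker file under /tmp is acceptable. The function names and the marker path are made up for illustration; only the SHUTDOWNCMD "touch ..." form comes from this thread.

```shell
#!/bin/sh
# Hypothetical helpers for a dry-run shutdown test; names are illustrative.

# Print the upsmon.conf line that replaces the real shutdown with a marker file.
dummy_shutdowncmd() {
    marker=$1
    printf 'SHUTDOWNCMD "touch %s"\n' "$marker"
}

# Succeed if the dummy command has run, i.e. the marker file exists.
fake_shutdown_triggered() {
    [ -e "$1" ]
}
```

For example, paste the output of dummy_shutdowncmd /tmp/nut-fake-shutdown into upsmon.conf, restart the monitor, pull the plug, and later call fake_shutdown_triggered /tmp/nut-fake-shutdown to see whether upsmon got as far as running it.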
Good Lord, had it been that complicated using a web browser some years ago? I did use it, but don't recollect any need for so much action. I think I pass on it until the rest is solved.
The rest is this:
~$ sudo systemctl restart nut-server
~$ sudo systemctl restart nut-monitor
~$ sudo upsd -c reload
Network UPS Tools upsd 2.8.0
fopen /run/nut/upsd.pid: No such file or directory
~$ sudo upsmon -c reload
Network UPS Tools upsmon 2.8.0
fopen /run/nut/upsmon.pid: No such file or directory
~$ sudo upsmon -c fsd
Network UPS Tools upsmon 2.8.0
fopen /run/nut/upsmon.pid: No such file or directory
~$ upsc eaton@10.0.0.51 ups.model
Init SSL without certificate database
Ellipse PRO 650
This is the complete output of the commands given in that order. No error message of any kind after the systemctl calls.
But upsd and upsmon fail, yet upsc works correctly. What gives?
P.S. Thank you for the thorough answer!
Looking at recent discussions about Fedora's failed packaging, largely in the area of PID-file path setting, I wonder if your setup got similarly messed up (and notably there was a problem fixed in master branch just recently about configuring systemd-tmpfiles for similar issues) :\
Do you use a custom build of NUT or one packaged by distro?
Can you please check if:
- /run/nut/nut exists with proper permissions for the NUT run-time user?
- /run/nut exists at all, and is it owned by root or a NUT user?
- there are any PID files in /run (possibly at best a root-started upsmon might make one)?
- there are any in /var/state/ups (possibly those of upsd and/or drivers)?
My guess here is that either the programs trying to "command" their already running daemon instances are looking for a PID file in the wrong location (several reasons for "wrong" are possible), or that the daemons are trying to write into an absent directory (e.g. systemd-tmpfiles was never configured for NUT) or one they do not have rights to write into (may be related to that recently fixed issue).
It could help to add a few -D arguments for higher debug verbosity (up to 6) to track where the programs and daemons try to write their PID files, etc. Since NUT 2.8.0 you can also use a debug_min config-file option for drivers and upsd, to avoid changing init-scripts or systemd units; I am not sure OTOH if there was an equivalent made for upsmon.
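The directory checks asked about above could be gathered in one go with something like the sketch below. The function name is made up, and the list of directories to pass is just the set of candidates named in this thread, not a definitive list of what a given package uses.

```shell
#!/bin/sh
# Report ownership and visible PID files for each candidate run-time directory.
describe_rundirs() {
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            continue
        fi
        # Mode, owner and group of the directory itself
        ls -ld "$dir"
        found=0
        for f in "$dir"/*.pid; do
            # An unmatched glob stays literal, so skip names that do not exist
            [ -e "$f" ] || continue
            echo "pidfile: $f"
            found=1
        done
        [ "$found" -eq 1 ] || echo "no PID files in: $dir"
    done
}
```

A call such as describe_rundirs /run/nut /var/run/nut /var/state/ups would cover the usual suspects. Since such directories are often readable only by root and the NUT group, it is best run from a root shell, otherwise PID files may simply not be visible.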
I am using Linux Mint-Mate 21. All nut packages from its repositories.
The attached file should have all the answers to your questions.
Thanks.
the word "pip" is not present in any of these files by case insensitive search:
FWIW, "pid" not "pip" ;)
Seeing this:
/run/nut:
total 4
drwxrwx--- 2 root nut 80 Nov 28 13:46 .
drwxr-xr-x 42 root root 1340 Nov 30 08:48 ..
srw-rw---- 1 nut nut 0 Nov 28 13:46 usbhid-ups-eaton
-rw-r--r-- 1 nut nut 5 Nov 28 13:46 usbhid-ups-eaton.pid
...it seems the default STATEPATH and ALTPIDPATH built into the package is (/var?)/run/nut and owned correctly. So at least upsd (nut-server.service if systemd is involved) should have left its PID file here. So it may have failed to start. If you first started it before configuring ups.conf with device sections, it could have refused to run (there is an option to start without devices and wait for "reload" to slurp new configurations, but it is not default and destined for automated mass-monitoring systems per #766).
Good point raised in a different issue's discussion: lack of PID files for signaling may be due to "foregrounding" of the daemons under systemd units, to avoid "extra forking" and more difficult child-process tracking - newly with 2.8.0 release. In their logs:
Nov 30 19:06:53 mythtv.billgee.local nut-server[28918]: Running as foreground process, not saving a PID file
Nov 30 19:06:53 mythtv.billgee.local upsd[28918]: Running as foreground process, not saving a PID file
This might explain the successful unit startup vs. lack of PID files for upsd (nut-server). There is actually a upsd -FF (dual F) option support just for that, to save the PID file anyway. Not sure which one the systemd units in that packaging use.
As for upsmon however, it should always save the PID file (note: for the unprivileged child half, if it is running in a dual-process model -- splitting the root bit for shutdowns vs. unprivileged for most of its lifetime).
And looking up some more, the systemd unit integration actually says as much: https://github.com/networkupstools/nut/blob/master/scripts/systemd/nut-server.service.in#L23-L26
So systemctl reload nut-server should do the right thing (using $MAINPID as tracked by systemd) and would be idiomatically correct - managing a service completely by one framework, without confusing impacts outside its control.
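As a rough illustration of "one framework per service", a wrapper could choose the reload command by how the host is managed. This is only a sketch: the function name is hypothetical, and it assumes the usual convention that /run/systemd/system exists when systemd is the running init.

```shell
#!/bin/sh
# Print the reload command suited to how this host manages services.
# Assumption: the marker directory exists only when systemd is the running init.
pick_reload_cmd() {
    systemd_marker=${1:-/run/systemd/system}
    if [ -d "$systemd_marker" ]; then
        echo "systemctl reload nut-server"
    else
        # Legacy init-script setups rely on the PID file instead
        echo "upsd -c reload"
    fi
}
```

The optional argument only exists so the choice can be tried out against another path; in normal use pick_reload_cmd is called without arguments and its output is run with sudo.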
;-) sorry for confusing with the typo. I did in fact search for pid, and verified again it is not there. Pip is found in a few pipe*.
The /var/run/ exists, but is a link to /run/.
I am not fully following your arguments. In the nut.conf file is this section
#~ ALLOW_NO_DEVICE=true
#~ export ALLOW_NO_DEVICE
which I had un-commented at some point - as it sounded benign - but later re-commented. If this were the problem maker, could I recover by un-commenting again? Or what else would I have to do?
Tried it:
~$ sudo systemctl reload nut-server
~$ sudo upsmon -DDDD -c reload
Network UPS Tools upsmon 2.8.0
0.000000 fopen /run/nut/upsmon.pid: No such file or directory
~$ sudo upsd -DDDD -c reload
Network UPS Tools upsd 2.8.0
0.000000 fopen /run/nut/upsd.pid: No such file or directory
no change.
As long as you have the device configured in ups.conf (the eaton entry per earlier posts), this ALLOW_NO_DEVICE should not be needed (upsd may start because it already knows devices to represent), nor would it hurt. It just would not kick in :)
I was rather wondering if it could be a reason for nut-server to not start up initially (might be among reasons for why you had no PID file).
What did systemctl status nut-server or a possibly more detailed journalctl -lu nut-server show after systemctl reload?
Note: with that systemd unit remaining as is, the PID file won't appear and upsd -c reload verbatim would still not work. The unit would use upsd -c reload -P $PID arguments to tell that daemon process to reload. You can change to start upsd -FF in the nut-server.service definition and systemctl daemon-reload; systemctl restart nut-server to get that into effect and have it save a PID file too.
For that matter, do systemctl status nut-monitor or journalctl -lu nut-monitor expose any faults?
Uiiih, lots of failure reports, see attached file
I notice in line 14: user monuser not found. This user thing has confused me a lot. I expected monuser to be known only to nut, or is it a Linux user? And I think a Linux-user 'nut' was also used somewhere.
It also didn't help that this username is switched between monuser and upsmon, like in file upsmon.conf (line ~73):
# [upsmon]
# password = blah
# upsmon primary # (or secondary)
To me this also looks like [upsmon] defines the name, but in upsmon primary it defines something else?
Anyway, my settings are:
in upsmon.conf:
RUN_AS_USER monuser
MONITOR eaton@10.0.0.51 1 monuser mypass primary
in upsd.users:
[monuser]
password = mypass
upsmon primary
actions = SET
instcmds = ALL
That's on to something... So indeed, the RUN_AS_USER refers to the OS (Linux) account, e.g. nut who owns those directories for PID files. Apparently upsmon fails to become_user() for the unprivileged part and so fails to start => no PID file and no daemon.
The monuser matched in both the upsd.users section name AND in the MONITOR line of upsmon.conf is the NUT-defined account for monitoring with allowed role upsmon primary (so persistent connection and certain messages that may be exchanged).
I'll check if the docs suggest mismatched example names, that would be unfortunate.
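To catch this kind of mix-up early, one could check that the RUN_AS_USER value names a real OS account. The sketch below uses made-up helper names and a deliberately simple parse (the last uncommented RUN_AS_USER line wins), which is an assumption of this example rather than a statement about how upsmon reads its config.

```shell
#!/bin/sh
# Print the RUN_AS_USER value from an upsmon.conf-style file.
# Simplification for this sketch: the last uncommented occurrence wins.
run_as_user_of() {
    awk '$1 == "RUN_AS_USER" { user = $2 } END { if (user != "") print user }' "$1"
}

# Succeed if the name is known to the OS as an account
# (the check that would have failed for "monuser").
os_account_exists() {
    id -u "$1" >/dev/null 2>&1
}
```

For instance: user=$(run_as_user_of /etc/nut/upsmon.conf); os_account_exists "$user" || echo "RUN_AS_USER '$user' is not an OS account". The name used in the MONITOR line is a different matter: that one only has to match a section in upsd.users.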
Looking at the status log above, for the past day the nut-monitor was not running (because "monuser not found"), and for some attempts before that (since Nov 26 18:23:06). Previously it did see the changes in UPS state, so the data-server and driver also ran. And the FSD test at Nov 26 17:55:42 apparently succeeded, with a real shutdown (of the OS at least).
Snailing forward:
~$ sudo systemctl restart nut-server
~$ sudo systemctl restart nut-monitor
~$ sudo upsd -DDDD -c reload
Network UPS Tools upsd 2.8.0
0.000000 fopen /run/nut/upsd.pid: No such file or directory
~$ systemctl status nut-monitor
● nut-monitor.service - Network UPS Tools - power device monitor and shutdown controller
Loaded: loaded (/lib/systemd/system/nut-monitor.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2022-12-01 11:47:56 CET; 5min ago
Main PID: 76225 (upsmon)
Tasks: 2 (limit: 8742)
Memory: 2.5M
CPU: 28ms
CGroup: /system.slice/nut-monitor.service
├─76225 /lib/nut/upsmon -F
└─76226 /lib/nut/upsmon -F
Dec 01 11:47:56 ullix-ProLiant-MicroServer-Gen10 systemd[1]: Started Network UPS Tools - power device monitor and shutdown controller.
Dec 01 11:47:56 ullix-ProLiant-MicroServer-Gen10 nut-monitor[76225]: fopen /run/nut/upsmon.pid: No such file or directory
Dec 01 11:47:56 ullix-ProLiant-MicroServer-Gen10 nut-monitor[76225]: Could not find PID file to see if previous upsmon instance is already running!
Dec 01 11:47:56 ullix-ProLiant-MicroServer-Gen10 nut-monitor[76225]: UPS: eaton@10.0.0.51 (primary) (power value 1)
Dec 01 11:47:56 ullix-ProLiant-MicroServer-Gen10 nut-monitor[76225]: Using power down flag file /etc/killpower
Dec 01 11:47:56 ullix-ProLiant-MicroServer-Gen10 nut-monitor[76226]: Init SSL without certificate database
~$ journalctl -lu nut-monitor
...
Dec 01 11:47:56 ullix-ProLiant-MicroServer-Gen10 systemd[1]: Started Network UPS Tools - power device monitor and shutdown controller.
Dec 01 11:47:56 ullix-ProLiant-MicroServer-Gen10 nut-monitor[76225]: fopen /run/nut/upsmon.pid: No such file or directory
Dec 01 11:47:56 ullix-ProLiant-MicroServer-Gen10 nut-monitor[76225]: Could not find PID file to see if previous upsmon instance is already running!
Dec 01 11:47:56 ullix-ProLiant-MicroServer-Gen10 nut-monitor[76225]: UPS: eaton@10.0.0.51 (primary) (power value 1)
Dec 01 11:47:56 ullix-ProLiant-MicroServer-Gen10 nut-monitor[76225]: Using power down flag file /etc/killpower
Dec 01 11:47:56 ullix-ProLiant-MicroServer-Gen10 nut-monitor[76226]: Init SSL without certificate database
More hope: upsmon did find the pid file, but upsd did not?
~$ sudo upsmon -DDDD -c reload
Network UPS Tools upsmon 2.8.0
~$ sudo upsd -DDDD -c reload
Network UPS Tools upsd 2.8.0
0.000000 fopen /run/nut/upsd.pid: No such file or directory
Each daemon has its own PID file.
Now that upsmon started successfully (RUN_AS_USER fixed back?) it wrote the file and so can receive commands at the PID recorded there.
The upsd would not record the PID file unless you change nut-server.service to ExecStart=.../upsd -FF -- and you might never need to change that at all, if you would systemctl reload nut-server instead. Not that you should do that often...
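Whether a given PID file exists and still points at a live process can be checked without sending a real signal. A small sketch follows, with a hypothetical function name; kill -0 only tests that the process exists and that the caller may signal it.

```shell
#!/bin/sh
# Succeed if the PID file exists and names a process we could signal.
pidfile_alive() {
    pidfile=$1
    [ -r "$pidfile" ] || { echo "no readable PID file: $pidfile" >&2; return 1; }
    # Keep only the digits of the first line
    pid=$(head -n 1 "$pidfile" | tr -cd '0-9')
    [ -n "$pid" ] || { echo "no PID in $pidfile" >&2; return 1; }
    # Signal 0 delivers nothing; it fails if the PID is gone or not ours to signal
    kill -0 "$pid" 2>/dev/null
}
```

Called as pidfile_alive /run/nut/upsmon.pid from a root shell, this tells apart "no file", "stale file" and "daemon alive". As an ordinary user it would also fail for a live daemon owned by another account.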
If the new nut-driver-enumerator.(path|service) is enabled on your system, edits of ups.conf would be picked up to dynamically (re-)define nut-driver@eaton.service ("eaton" being the section name), and (re-)start/reload needed services.
I've also posted a PR #1724 where I first thought to "fix" the systemd units, but ended up rather documenting why the current situation makes sense (for daemons managed by OS service framework), and hopefully made it easier to debug in the field - with more actionable error messages.
Yes, fixed back; I have now:
in upsmon.conf:
RUN_AS_USER nut
but still failure:
$ systemctl reload nut-server
~$ sudo upsd -DDDD -c reload
Network UPS Tools upsd 2.8.0
0.000000 fopen /run/nut/upsd.pid: No such file or directory
~$ sudo upsmon -DDDD -c reload
Network UPS Tools upsmon 2.8.0
I've posted probable reasons. I don't have info on whether your nut-server service currently runs the daemon as upsd -FF and so did (or tried to) save a PID file which we now fail to use, or never tried to save it (as upsd -F by systemd unit default).
The lines logged in systemctl status nut-server may show if the preceding systemctl reload nut-server reported any errors (using the $MAINPID number directly, without the file).
I also don't know if you need to reload these daemons this way all day long in practice (vs. experimentation), so maybe the remaining point is moot? ;)
I have no idea what you mean. With this upsd -FF the results are:
~$ sudo upsd -FF
Network UPS Tools upsd 2.8.0
fopen /run/nut/upsd.pid: No such file or directory
Could not find PID file '/run/nut/upsd.pid' to see if previous upsd instance is already running!
not listening on 10.0.0.51 port 3493
no listening interface available
You said you start it as nut-server systemd unit. That in turn starts upsd with some arguments (I suppose -F so no PID file).
Now above you started another copy of the daemon, which failed to listen on the already-occupied port 3493 I guess.
Ok, I get this. But if the -FF option is important to write a pid file then how can I start upsd with double F?
Or, the reverse: when you programmers decided to make single F the default start, which does NOT write a pid file, then isn't it reasonable to assume that this pid file is not needed at all? But then why did you make upsd keep complaining about something missing that was intentionally left out?
How ... ?
That is a non-NUT question but rather one of managing your OS of choice and its ways of starting services ;P
Technically, for a quick fix you can look up the systemd unit configuration file used in your system and edit it directly (*):
$ systemctl status nut-server
...
Loaded: loaded (/lib/systemd/system/nut-server.service; enabled; vendor preset: enabled)
...
$ sudo vi /lib/systemd/system/nut-server.service
> ExecStart=...
$ sudo systemctl daemon-reload
$ sudo systemctl restart nut-server
(*) Note that this quick-edit is practical, but prone to overwrites by packaging when your OS is upgraded and new NUT build lands. Systemd supports several locations to customize "drop-in" amendments to unit definitions (by packaging, site-local, run-time, etc.) so you could use that:
:; sudo mkdir -p /etc/systemd/system/nut-server.service.d
:; sudo vi /etc/systemd/system/nut-server.service.d/use-pid.conf
[Service]
# Empty line clears the packaged ExecStart (needed unless Type=oneshot)
ExecStart=
# Copy path from your packaged definition
ExecStart=.../upsd -FF
:; sudo systemctl daemon-reload
:; sudo systemctl restart nut-server
Why ... ?
May be attributed to human error :) and lack of that use-case in one's practice. Conversely, may be deemed intentional, as now strongly implied by PR #1724 - if you (or your distro packagers) set up the system entities to be managed by an official framework like systemd, then people should not poke sticks into the wheels by directly sending signals and looking at PID files. It may work for indefinite future, but is plain untidy to fully support two potentially opposing approaches at the same time. At least, now the error messages would be indicative of that - if you use "systemd" for the services, do so throughout:
$ ./server/upsd -c reload
Network UPS Tools upsd 2.8.0-175-g3d5537378
fopen /var/state/ups/upsd.pid: No such file or directory
Try 'systemctl reload nut-server.service' or add '-P $PID' argument
Making upsd itself call systemctl reload upsd in such a situation, and/or query it for the MAINPID value, does not seem like the right fix. The core problem is (lack of) consistent OS setup for the goals, so should be addressed as such.
Note there are still systems without systemd or SMF which would use the decades-old approaches with init-scripts and PID files. Also on the deeper technical background side, upsd -c something means that your new upsd program instance signals the running daemon instance - so one way or another, it should find it (currently by PID file or explicit -P NUMBER argument). Hence the complaints when it can not fulfill your request.
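For completeness, here is a sketch of the explicit -P route on a systemd host. The helper name is made up. The -P argument is taken from the newer error message quoted above, so it may not exist in the packaged 2.8.0 release, and the sketch relies on systemd reporting MAINPID as 0 when a unit has no running main process.

```shell
#!/bin/sh
# Print an "upsd -c reload -P PID" command for a given main PID, or fail.
reload_cmd_for_pid() {
    case "$1" in
        # Reject empty, non-numeric and zero values (0 means "not running")
        ''|*[!0-9]*|0)
            echo "no running main PID" >&2
            return 1
            ;;
    esac
    echo "upsd -c reload -P $1"
}
```

A possible use would be reload_cmd_for_pid "$(systemctl show -p MAINPID --value nut-server)", then running the printed command with sudo. Plain systemctl reload nut-server remains the simpler way to get the same effect.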
All this hassle is really needed in practice only if you expect to edit config files which impact upsd (ups.conf with new/deleted device entries, upsd.conf, upsd.users) and to reload it "live" as opposed to just restarting the service with a couple of seconds of downtime. In case of driver definition addition/removal, different ways are available including nut-driver-enumerator (service and script) to define nut-driver instances and reload upsd, or to manually run systemctl reload nut-server instead of direct upsd -c reload.
Also note that the original problem was "missing upsmon.pid" and eventually that one got solved, and you can send direct upsmon -c fsd signals which do have a practical use :)
I really have no inclination of fiddling with the start-up files of any package, though it is great to get some insight into the complexity of it :-).
But what all this means is that the upsd error message about the missing pid should just be ignored, because the lack of this pid is intentional? Please, have some mercy on willing users, who need as little confusion as possible, and remove the message when it is not needed.
Back to upsmon. Why has it lost the pid on the second call?
~$ sudo systemctl restart nut-monitor.service
~$ sudo upsmon -c reload
Network UPS Tools upsmon 2.8.0
~$ sudo upsmon -c reload
Network UPS Tools upsmon 2.8.0
fopen /run/nut/upsmon.pid: No such file or directory
~$ sudo upsmon -c reload
Network UPS Tools upsmon 2.8.0
fopen /run/nut/upsmon.pid: No such file or directory
~$ sudo systemctl restart nut-monitor.service
~$ sudo upsmon -c reload
Network UPS Tools upsmon 2.8.0
~$ sudo upsmon -c reload
Network UPS Tools upsmon 2.8.0
fopen /run/nut/upsmon.pid: No such file or directory
Well, digging deep was useful for you and for the project, uncovering more rough edges buried in "accustomedness" :)
Partial "problem" is that utility commands try to do what they were asked about or log inability to do that, without a context that inability could be expected. In some cases this is addressed by suffixing another message:
Generally the user may know (or dig to discover) the problematic situation - e.g. if you end up with two upsd instances running and conflicting (like you had with inability to start due to no free port to listen on). There may be myriad bad scenarios, so different bread-crumbs add up to a diagnosis.
Ran some experiments detailed in https://github.com/networkupstools/nut/issues/1728#issuecomment-1335371798 and as much as I tried, I could not reproduce the problem in plain command line, with upsmon (mis?-)behavior alone.
So far tending to blame systemd for the mishaps... Could you please try to reproduce the issue again and post the systemd unit's journal?
:; sudo journalctl -flu nut-monitor & PIDJ=$! ; sleep 1
:; (set -x; sudo systemctl restart nut-monitor.service ; sleep 5 ; sudo upsmon -c reload ; sleep 5; sudo upsmon -c reload ; sleep 5 )
:; kill $PIDJ
I wonder if systemd detects an incoming signal to monitored process and somehow toxically processes it (e.g. deletes the PID file it claims to have problem monitoring)?..
Hm, also just noticed that in your tests you were doing systemctl restart nut-monitor.service (not reload) - did you check if the daemon was actually running after that?
Maybe it failed soon but not instantly after start, or did not even exit yet, so handled the signal first time if delay was short (although it does not remove the PID file on its own anyway)?.. Just guessing and grasping at straws :)
So, the first request:
~$ sudo journalctl -flu nut-monitor & PIDJ=$! ; sleep 1
[1] 98550
[1]+ Stopped sudo journalctl -flu nut-monitor
~$ (set -x; sudo systemctl restart nut-monitor.service ; sleep 5 ; sudo upsmon -c reload ; sleep 5; sudo upsmon -c reload ; sleep 5 )
+ sudo systemctl restart nut-monitor.service
[sudo] password for ullix:
+ sleep 5
+ sudo upsmon -c reload
Network UPS Tools upsmon 2.8.0
+ sleep 5
+ sudo upsmon -c reload
Network UPS Tools upsmon 2.8.0
fopen /run/nut/upsmon.pid: No such file or directory
+ sleep 5
~$ kill $PIDJ
~$
and this is: journalctl -lu nut-monitor for today only:
Dec 03 09:55:10 ullix-ProLiant-MicroServer-Gen10 systemd[1]: Started Network UPS Tools - power device monitor and shutdown controller.
Dec 03 09:55:10 ullix-ProLiant-MicroServer-Gen10 nut-monitor[98557]: fopen /run/nut/upsmon.pid: No such file or directory
Dec 03 09:55:10 ullix-ProLiant-MicroServer-Gen10 nut-monitor[98557]: Could not find PID file to see if previous upsmon instance is already running!
Dec 03 09:55:10 ullix-ProLiant-MicroServer-Gen10 nut-monitor[98557]: UPS: eaton@10.0.0.51 (primary) (power value 1)
Dec 03 09:55:10 ullix-ProLiant-MicroServer-Gen10 nut-monitor[98557]: Using power down flag file /etc/killpower
Dec 03 09:55:10 ullix-ProLiant-MicroServer-Gen10 nut-monitor[98559]: Init SSL without certificate database
Dec 03 09:55:10 ullix-ProLiant-MicroServer-Gen10 nut-monitor[98559]: UPS eaton@10.0.0.51: forced shutdown in progress
Dec 03 09:55:10 ullix-ProLiant-MicroServer-Gen10 nut-monitor[98559]: Executing automatic power-fail shutdown
Dec 03 09:55:10 ullix-ProLiant-MicroServer-Gen10 nut-monitor[98564]: wall: /dev/pts/4: No such file or directory
Dec 03 09:55:10 ullix-ProLiant-MicroServer-Gen10 nut-monitor[98561]: Network UPS Tools upsmon 2.8.0
Dec 03 09:55:10 ullix-ProLiant-MicroServer-Gen10 nut-monitor[98565]: wall: /dev/pts/4: No such file or directory
Dec 03 09:55:10 ullix-ProLiant-MicroServer-Gen10 nut-monitor[98559]: Auto logout and shutdown proceeding
Dec 03 09:55:10 ullix-ProLiant-MicroServer-Gen10 nut-monitor[98568]: wall: /dev/pts/4: No such file or directory
Dec 03 09:55:10 ullix-ProLiant-MicroServer-Gen10 nut-monitor[98566]: Network UPS Tools upsmon 2.8.0
Dec 03 09:55:15 ullix-ProLiant-MicroServer-Gen10 nut-monitor[98559]: Network UPS Tools upsmon 2.8.0
Dec 03 09:55:15 ullix-ProLiant-MicroServer-Gen10 nut-monitor[98557]: Network UPS Tools upsmon 2.8.0
Dec 03 09:55:15 ullix-ProLiant-MicroServer-Gen10 systemd[1]: nut-monitor.service: Deactivated successfully.
... did you check if the daemon was actually running after that?
much of this stuff is beyond my pay-grade ;-) . Have mercy and be specific: how do I check for the daemon running ?
That is what I thought you might have meant:
~$ systemctl restart nut-monitor.service
~$ upsc eaton@10.0.0.51
Init SSL without certificate database
battery.charge: 100
battery.charge.low: 20
...
This gets weirder by the minute. I did (full output, no other commands in between):
~$ sudo systemctl restart nut-monitor.service
~$ upsmon -c reload
Network UPS Tools upsmon 2.8.0
kill: Operation not permitted
~$ upsmon -c reload
Network UPS Tools upsmon 2.8.0
fopen /run/nut/upsmon.pid: No such file or directory
~$
Where did the kill come from? And not seen in this terminal, but in one on another computer, ssh connected to the nut server computer:
ullix@ullix-ProLiant-MicroServer-Gen10:~$
Broadcast message from nut@ullix-ProLiant-MicroServer-Gen10 (somewhere) (Sat De
UPS eaton@10.0.0.51: forced shutdown in progress
Broadcast message from nut@ullix-ProLiant-MicroServer-Gen10 (somewhere) (Sat De
Executing automatic power-fail shutdown
Broadcast message from nut@ullix-ProLiant-MicroServer-Gen10 (somewhere) (Sat De
Auto logout and shutdown proceeding
Unfortunately, everything beyond 80 chars is cut-off, so no date and time visible (is this intended by nut, or do I have a setting which makes the cut-off?)
To be clear: I have NOT given a command with the -c fsd option, and my ups is running fine at 100% charge, and full line power. In upsmon.conf I have:
SHUTDOWNCMD "touch /home/ullix/SHUTDOWNCMD.msg"
POWERDOWNFLAG /etc/killpower
# NOTIFYCMD /bin/notifyme
NOTIFYCMD /usr/bin/notify-send
The cut-off would be something about OS console settings, NUT should not trim any messages it emits.
So far from those log messages it seems that your earlier experiments with upsmon -c fsd caused it to raise the forced-shutdown in-memory flag in the nut-server (upsd) data entry for that UPS. This sort of flag never goes down - server farm shutdowns can not be generally safely aborted at a random point mid-flight, so once they start they must complete and systems be rebooted to reinitialize cleanly. I think you can systemctl restart nut-server to clear the flag, however.
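A lingering flag of this kind could be spotted before restarting anything. The sketch below assumes that the data server exposes the flag as an FSD token in the ups.status value printed by upsc. That is how I remember it behaving, but treat it as an assumption. The function name is made up.

```shell
#!/bin/sh
# Succeed if a ups.status string carries the FSD token.
# Assumption: the forced-shutdown flag shows up as a separate "FSD" word.
status_has_fsd() {
    # Unquoted on purpose: split the status string into its tokens
    for token in $1; do
        [ "$token" = "FSD" ] && return 0
    done
    return 1
}
```

For example: status_has_fsd "$(upsc eaton@10.0.0.51 ups.status 2>/dev/null)" && echo "FSD is latched, restart nut-server to clear it".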
As for the /etc/killpower flag file: I believe it is just informative for late-shutdown hooks to know they should try to tell the UPS to go off, but should not intrude otherwise. This file remains until reboot, where legacy init scripts usually removed it before starting NUT daemons - assuming this is a new lifetime after the brownout passed. Not sure right now whether systemd services do similar, or if they should only touch this file in a tmpfs that would be forgotten after reboot.
My guess now would be that upsmon starts (and again, you restarted and not reloaded it in recent posts too); then soon afterwards it detects that FSD must happen, so triggers SHUTDOWNCMD and exits itself. Not sure who (systemd?) deletes upsmon.pid - but for the second attempt it is not there. Notably, in this particular situation systemctl reload nut-monitor would not succeed because there is nobody to signal :)
Also:
SHUTDOWNCMD "touch /home/ullix/SHUTDOWNCMD.msg"
Does nut have rights to write into /home/ullix? Maybe world-writable /tmp, /dev/shm or /var/tmp would fare better for the tests.
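Picking a usable location for the marker file can be scripted as well. This is a sketch with a made-up function name. It checks writability for whoever runs it, so it would need to be run as the account that ends up executing SHUTDOWNCMD to be meaningful.

```shell
#!/bin/sh
# Print the first argument that is an existing directory writable by the caller.
first_writable_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}
```

Saved as a script that ends with a call like first_writable_dir /home/ullix /tmp /dev/shm /var/tmp, it can be run under another account with sudo -u.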
... did you check if the daemon was actually running after that?
much of this stuff is beyond my pay-grade ;-)
Mine too, I do this for free, for fun and experience :)
Have mercy and be specific: how do I check for the daemon running ?
That is what I thought you might have meant:
~$ systemctl restart nut-monitor.service
~$ upsc eaton@10.0.0.51
Init SSL without certificate database
battery.charge: 100
battery.charge.low: 20
...
I meant restarting and checking the same daemon -- something like systemctl restart nut-monitor.service ; sleep 10; systemctl status nut-monitor.service and whether the latter would say the service is active, running, with such and such PID number - or if it had exited/died shortly after starting (by the time your second attempt usually does not find the PID file).
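The same check can be wrapped so that it returns a plain yes or no. In this sketch the function name and the SYSTEMCTL override variable are made up; the override is only there so the logic can be tried on a machine without systemd.

```shell
#!/bin/sh
# Command used to query systemd; overridable for experiments.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Wait a while, then succeed only if the unit is still reported as "active".
unit_still_active() {
    unit=$1
    delay=${2:-10}
    sleep "$delay"
    [ "$($SYSTEMCTL is-active "$unit" 2>/dev/null)" = "active" ]
}
```

Usage after a restart might look like: sudo systemctl restart nut-monitor.service; unit_still_active nut-monitor.service || journalctl -lu nut-monitor -n 30. That shows the last log lines only when the daemon did not survive the wait.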
On a side note, upsc responding means that two other daemons (NUT driver for this device, and NUT server to represent it on the network) do currently run and communicate.
~$ sudo systemctl restart nut-monitor.service
~$ upsmon -c reload
Network UPS Tools upsmon 2.8.0
kill: Operation not permitted
Where did the kill come from?
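That attempt probably succeeded in the split second that the upsmon daemon was recently alive and its PID file existed, so upsmon -c ... found a process number to signal. The signalling itself then failed. Note that this time the command ran without sudo: "Operation not permitted" is what the kill() call reports when the caller is not allowed to signal the target process (a process that was already gone would rather give "No such process"). Unix signals are sent with kill() in libc (the actual "kill" being just one of many signals), hence the message.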
Does nut have rights to write into /home/ullix? Maybe world-writable /tmp, /dev/shm or /var/tmp would fare better for the tests.
Advice taken, thanks.
So the whole caboodle was due to a left-over fsd flag? If so, then a prominent comment in the docs would be worth while!
In the meantime I have rebooted the server (and renamed the too lengthy hostname, in case you wonder). But it does not seem to have improved; pid file is missing:
ullix@Gen10:~$ systemctl restart nut-monitor.service ; sleep 10; systemctl status nut-monitor.service
● nut-monitor.service - Network UPS Tools - power device monitor and shutdown controller
Loaded: loaded (/lib/systemd/system/nut-monitor.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2022-12-03 11:41:23 CET; 10s ago
Main PID: 5880 (upsmon)
Tasks: 2 (limit: 8742)
Memory: 832.0K
CPU: 9ms
CGroup: /system.slice/nut-monitor.service
├─5880 /lib/nut/upsmon -F
└─5882 /lib/nut/upsmon -F
Dec 03 11:41:23 Gen10 systemd[1]: Started Network UPS Tools - power device monitor and shutdown controller.
Dec 03 11:41:23 Gen10 nut-monitor[5880]: fopen /run/nut/upsmon.pid: No such file or directory
Dec 03 11:41:23 Gen10 nut-monitor[5880]: Could not find PID file to see if previous upsmon instance is already running!
Dec 03 11:41:23 Gen10 nut-monitor[5880]: UPS: eaton@10.0.0.51 (primary) (power value 1)
Dec 03 11:41:23 Gen10 nut-monitor[5880]: Using power down flag file /etc/killpower
Dec 03 11:41:23 Gen10 nut-monitor[5882]: Init SSL without certificate database
Strange, because:
ullix@Gen10:~$ sudo systemctl restart nut-monitor.service
ullix@Gen10:~$ sudo upsmon -c reload
Network UPS Tools upsmon 2.8.0
ullix@Gen10:~$ sudo upsmon -c reload
Network UPS Tools upsmon 2.8.0
ullix@Gen10:~$ sudo upsmon -c reload
Network UPS Tools upsmon 2.8.0
ullix@Gen10:~$ sudo upsmon -c reload
Network UPS Tools upsmon 2.8.0
ullix@Gen10:~$ sudo upsmon -c reload
Network UPS Tools upsmon 2.8.0
;-))
Well, the first time it starts, there is no PID file. It says as much - "can't tell if I am alone or would conflict with a sibling" the best way computers can tell.
As for docs... https://networkupstools.org/docs/user-manual.chunked/ar01s06.html "6.3. Configuring automatic shutdowns for low battery events" does say that:
Warning
By design, since we require power-cycling the load and don’t want some systems to be powered off while others remain running if the "wall power" returns at the wrong moment as usual, the "FSD" flag can not be removed from the data server unless its daemon is restarted. If we do take the first step in critical mode, then we intend to go all the way — shut down all the servers gracefully, and power down the UPS.
Keep in mind that some UPS devices and corresponding drivers would latch the "FSD" again even if "wall power" is available, but the remaining battery charge is below a threshold configured as "safe" in the device (usually if you manually power on the UPS after a long power outage). This is by design of respective UPS vendors, since in such situation they can not guarantee that if a new power outage happens, their UPS would safely shut down your systems again. So it is deemed better and safer to stay dark until batteries become sufficiently charged.
I guess it might be expanded in https://networkupstools.org/docs/man/upsmon.html though (PRs welcome)
Too many docs touch on different aspects of the same concepts, and there are tons of nuances to make people aware of... somewhere... so writers get lost in the maze too.
Now that I know I understand (partially). But this points to the problem: you are explaining the background to the people "in the know", and are not explaining to the (new) user!
Re this part man upsmon (as pic to keep format):
You see visually that the paragraph "Forced Shutdown" is separated by two new paragraphs from "SIMULATING POWER FAILURES" while these two really belong together. And after mentioning the "dummy SHUTDOWNCMD setting" - I cannot imagine that anyone building a new nut system would not use a dummy command - you fail to EXPLICITLY say what command to issue to finalize and get rid of the fsd flag.
That command, in my understanding now, is sudo systemctl restart nut-server (correct?).
Otherwise I would have to reboot my server, kind of defeating the purpose of a dummy command to avoid putting the server into a reboot.
Yes, probably. As far as NUT goes, one needs to "restart data server". Nuances on doing that differ between modern Linux with systemd, legacy linux before systemd, *BSD, MacOS, Solaris, Windows and wherever else NUT can run, and are really OS management detail, not NUT detail. You picked the OS you run, you learn to use it :)
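For the systemd case, a rough sketch of "check for FSD, then restart the data server" could look like the following. Assumptions: the unit is called nut-server.service (Debian-style packaging, matching the nut-monitor.service seen earlier in this thread), the device is eaton@10.0.0.51 as in the log, and has_fsd is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a space-separated ups.status string
# (e.g. "FSD OL" or "OB LB") contains the FSD token.
has_fsd() {
    case " $1 " in
        *" FSD "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (needs a reachable upsd; unit name is an assumption, adjust per distro):
#   status="$(upsc eaton@10.0.0.51 ups.status)"
#   if has_fsd "$status"; then
#       sudo systemctl restart nut-server.service
#   fi
```

On other init systems only the restart line changes; the status check through upsc stays the same.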
I understand this is not the comfy answer one might like, and that many FOSS projects are torpedoed by their docs - but as long as there is nearly a one-man show budgeting a few hours a month, and at that - hours of someone well-versed in the ecosystem and inclined to solve the infinite backlog of technical challenges, it is the end-users who can notice at all if docs are insufficient, and give suggestions or better yet PRs to make them better.
Good point on shuffling paragraphs closer together so they make better sense with content available today. Probably someone sometime ago edited this to add a few lines more to a short chapter here and there, and the islands of related text drifted further and further apart.
On a side note, though probably part of the problem, the way developers see docs they write is this:
Styles and colors and block-panels are something that appears in a renderer for HTML, PDF, etc. and not something we look at every year. Too many too heavy tools are needed to generate that at all, and with development often happening via terminal to some Linux/Unix system (not on local desktop) too much work is needed to get those pretty renditions seen in a graphical system.
Amen.
But still, I need more help!
"Help me help you" - what was the question? What NUT technical problem remains un-solved? :)
(A daemon starting for the first time in current OS uptime and telling you there was no previous PID file is not a problem, it is a clue for troubleshooting when you try to start same daemon 10 times and fail because a sibling does run and block resources, but no PID file existed for whatever reason your forensic quest would discover)
As for browsing, it is mostly about setup of your web-server of choice. If you use Apache, this doc still can help: https://github.com/networkupstools/nut/blob/master/data/html/README
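Note that the web pages come from the web server, not from upsd: port 3493 is the NUT data server's own protocol and does not speak HTTP. A small sketch of the two kinds of access, where the /cgi-bin/nut/ prefix is an assumption (a Debian-style CGI layout; your ScriptAlias may differ) and nut_cgi_url is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: build the URL under which the web server would
# expose a NUT CGI program. The /cgi-bin/nut/ prefix is an assumption.
nut_cgi_url() {
    host="$1"
    prog="${2:-upsstats.cgi}"
    printf 'http://%s/cgi-bin/nut/%s\n' "$host" "$prog"
}

# Usage:
#   upsc -l 10.0.0.51                          # NUT client, talks to upsd (port 3493)
#   curl -sS -m 10 "$(nut_cgi_url 10.0.0.51)"  # HTTP client, talks to the web server
```

If the upsc line works but the curl line does not, the problem is in the web server or CGI setup rather than in NUT itself.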
Particular tools are documented at
Sorry, that was not intended as a rough comment, as you seem to have taken it. There was a cumulative effect of several misunderstandings, based on the problem that the docs are written more for those "in the know" than for newcomers like me. Now that the first problem got solved, I need to attach other computers into the shutdown sequence.
I am wondering what problems I may be stumbling over. I'll try and see what comes up.
Thank you for your support.
So far I am failing to install nut 2.8.0 on my "Linux Mint 21" computer. This may be one of the issues, when trying a test-shutdown:
and there is no shutdown. The content of /run/nut is this:
Something else is working, though, so there is hope:
Trying access with a webbrowser has completely failed. Apache is installed, and when using the url:
http://10.0.0.51:3493/
, it takes 1 ... 2 minutes(!) before I get back:
Same outcome when I try e.g.
10.0.0.51:3493/upsstats.cgi
or variations of it.