networkupstools / nut

The Network UPS Tools repository. UPS management protocol Informational RFC 9271 published by IETF at https://www.rfc-editor.org/info/rfc9271 Please star NUT on GitHub, this helps with sponsorships!
https://networkupstools.org/

Add a '-n no-conf' option to upsdrvctl / upsd / upsmon to avoid failure when not yet configured #156

Open aquette opened 9 years ago

aquette commented 9 years ago

nut.conf used to tell the SysV init script whether NUT should be started, and which component(s). With newer init systems such as systemd, nut.conf is useless and every installed component is started. This results in failures, at least at initial installation. By adding a '-f force-no-conf' option, we avoid returning EXIT_FAILURE if no device is configured in ups.conf or in upsmon.conf. We may also consider broadening the scope to all configuration, i.e., if no configuration file is found, don't EXIT_FAILURE. However, note that other errors are not covered. @bigon, @clepple: any comment?


clepple commented 9 years ago

Since you mention systemd, do we want to have another standard option for "don't background"? (This would also be used for launchd on OS X).

At the moment, we can use "-D" but that causes unnecessary logging that will only be discarded.

I mention this because "-f" makes sense for a "foreground" option. We could also use "-F". I'm not partial to either one.

However, "-f force-no-conf" seems odd: sort of a hybrid --long-option but without the opportunity to use the shortest possible prefix. Not suggesting that we move to getopt_long(), just that this could probably be done with a single flag character.

aquette commented 9 years ago

You've got a point, @clepple ;) I was leaning towards '-n no-conf' at first. That leaves you room for '-f', which makes sense in lower-case. So what exactly would '-f' mean, besides not going into the background (i.e. what about fd and init)?

clepple commented 9 years ago

> So what exactly would '-f' mean, besides not going into the background (i.e. what about fd and init)?

fd: do you mean closing stdin/stdout/stderr? Not sure what is considered best practices for systemd. launchd will let you log if needed, so it doesn't matter there.

Not sure what you mean by "init".

aquette commented 9 years ago

@clepple: I meant attaching to init, pid 1, systemd, launchd, ...

clepple commented 9 years ago

If a process doesn't fork and background, then it is still attached to the parent that started it.

This might be veering off into another thread, but do we have a list of things that we need to change to make NUT behave better with systemd? If not, is there a checklist from the systemd folks? I admit that I have not looked at this much.

aquette commented 9 years ago

@clepple: just to clarify, since you've probably updated the comment: this new option is really a single-character one, i.e. '-n'. The long keyword attached is just a disambiguation, and doesn't appear anywhere else for users.

aquette commented 9 years ago

Question on upsd behavior when no device is configured (@clepple esp.): do we want to exit(EXIT_SUCCESS), or to continue and serve no device? Commit c0c1262 implements the second behavior, but the first one may be more suitable...

clepple commented 9 years ago

Again, I don't have much direct experience with systemd, but applying what I know about launchd: if upsd stays running, it can listen for a reconfigure signal. If it exits, you have to specify additional configuration options to keep the init replacement from trying to continuously respawn the daemon.
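The "stay running and wait for a reconfigure signal" behavior can be sketched in shell. This is only a stand-in for the idea, not how upsd is implemented: the `CONF` path, the `load_config` and `service_tick` helpers, and the use of SIGHUP as the reload signal are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: a stand-in for a daemon that starts with no devices and
# re-reads its configuration on SIGHUP instead of exiting.

CONF="${CONF:-./ups.conf}"   # hypothetical path, for illustration
reload_requested=0
device_count=0

# Count [section] headers, treating a missing file as zero devices.
load_config() {
    if [ -r "$CONF" ]; then
        # grep -c prints 0 but exits non-zero when nothing matches.
        device_count=$(grep -Ec '^[[:space:]]*\[[^]]+\]' "$CONF" || true)
    else
        device_count=0
    fi
    echo "serving $device_count device(s)"
}

# The handler only sets a flag; the main loop does the actual reload.
trap 'reload_requested=1' HUP

# One iteration of the main loop: reload if asked, otherwise do nothing.
service_tick() {
    if [ "$reload_requested" -eq 1 ]; then
        reload_requested=0
        load_config
    fi
}
```

A real main loop would call `load_config` once and then run something like `while :; do service_tick; sleep 2; done`. Sending `kill -HUP` to that process after editing the configuration picks up the new devices without a restart, so the service manager never sees an exit it might try to respawn.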

aquette commented 9 years ago

It works almost fine: upsdrvctl and upsd are good. However, for upsmon, the PIDFile directive in nut-monitor.service makes it fail.

With PIDFile:

systemctl status nut-monitor.service

nut-monitor.service - Network UPS Tools - power device monitor and shutdown controller
   Loaded: loaded (/lib/systemd/system/nut-monitor.service; enabled)
   Active: failed (Result: resources) since dim. 2014-09-28 19:07:18 CEST; 1s ago
  Process: 17095 ExecStart=/sbin/upsmon -n (code=exited, status=0/SUCCESS)
 Main PID: 677 (code=exited, status=0/SUCCESS)

sept. 28 19:07:18 arno-zbook15 upsmon[17095]: Using power down flag file /etc/killpower
sept. 28 19:07:18 arno-zbook15 upsmon[17095]: Network UPS Tools upsmon 2.7.2-signed-104-gc0c1262
sept. 28 19:07:18 arno-zbook15 upsmon[17095]: Warning: no MONITOR line defined!
sept. 28 19:07:18 arno-zbook15 upsmon[17095]: Warning: insufficient power configured!
sept. 28 19:07:18 arno-zbook15 upsmon[17095]: Sum of power values........: 0
sept. 28 19:07:18 arno-zbook15 upsmon[17095]: Minimum value (MINSUPPLIES): 1
sept. 28 19:07:18 arno-zbook15 upsmon[17095]: Edit your upsmon.conf and change the values.
sept. 28 19:07:18 arno-zbook15 systemd[1]: PID file /var/run/nut/upsmon.pid not readable (yet?) after start.
sept. 28 19:07:18 arno-zbook15 systemd[1]: Failed to start Network UPS Tools - power device monitor and shutdown controller.
sept. 28 19:07:18 arno-zbook15 systemd[1]: Unit nut-monitor.service entered failed state.

Without PIDFile (after calling 'systemctl start nut-monitor.service'):

systemctl status nut-monitor.service

nut-monitor.service - Network UPS Tools - power device monitor and shutdown controller
   Loaded: loaded (/lib/systemd/system/nut-monitor.service; enabled)
   Active: inactive (dead) since dim. 2014-09-28 19:08:13 CEST; 3s ago
  Process: 17112 ExecStart=/sbin/upsmon -n (code=exited, status=0/SUCCESS)
 Main PID: 677 (code=exited, status=0/SUCCESS)

sept. 28 19:08:13 arno-zbook15 upsmon[17112]: fopen /var/run/nut/upsmon.pid: No such file or directory
sept. 28 19:08:13 arno-zbook15 upsmon[17112]: Using power down flag file /etc/killpower
sept. 28 19:08:13 arno-zbook15 upsmon[17112]: Network UPS Tools upsmon 2.7.2-signed-104-gc0c1262
sept. 28 19:08:13 arno-zbook15 upsmon[17112]: Warning: no MONITOR line defined!
sept. 28 19:08:13 arno-zbook15 upsmon[17112]: Warning: insufficient power configured!
sept. 28 19:08:13 arno-zbook15 upsmon[17112]: Sum of power values........: 0
sept. 28 19:08:13 arno-zbook15 upsmon[17112]: Minimum value (MINSUPPLIES): 1
sept. 28 19:08:13 arno-zbook15 upsmon[17112]: Edit your upsmon.conf and change the values.
sept. 28 19:08:13 arno-zbook15 systemd[1]: Started Network UPS Tools - power device monitor and shutdown controller.

@bigon: is PIDFile really useful? I know it tracks the PID of the upsmon instance that actually does the shutdown. But beyond 'eye candy', is there any other impact?

aquette commented 9 years ago

@bigon: any comment? I also have to check the available Condition* directives, as pointed out by Martin (Pitt).

bigon commented 9 years ago

Oh hey,

The preferred way to start daemons with systemd is not to fork them, but to keep them in the foreground. In that case no PIDFile is needed.

bigon commented 9 years ago

It should also fix:

[   26.955431] systemd[1]: nut-monitor.service: Supervising process 2060 which is not our child. We'll most likely not notice when it exits.

Edit: bug #123

Edit Edit: Don't we have an issue when reloading the unprivileged process then, as mentioned in #123?

clepple commented 9 years ago

> The preferred way to start daemons with systemd is not to fork them, but to keep them in the foreground. In that case no PIDFile is needed.

upsmon still needs to fork in order to accomplish the privilege separation between the network component and the privileged component (parent) that invokes the shutdown. This is necessary even if the parent stays in the foreground using -D (as I mentioned in #123).

I suspect that systemd has a way for a foreground, unprivileged upsmon network listener to request a shutdown. Not sure how we would protect that from other users, though.

bigon commented 9 years ago

Hey,

Is there any fix for the initial bug discussed in this report (failure if there is no UPS configured)?

This needs to be fixed if we want to be part of the next Debian release (jessie).

clepple commented 9 years ago

Going back to the original bug 747863, it seems like we are trying to solve a problem in NUT which could be completely avoided by adjusting the .deb scripts.

Highlighting Martin Pitt's second recommendation:

  • In the postinst, only enable the unit if the service is configured, otherwise leave it as disabled; and add instructions how to enable it (with update-rc.d?) to nut.conf.
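A minimal sketch of that "only enable when configured" check, in POSIX shell. The helper names (`has_ups_defined`, `has_monitor_defined`, `enable_decision`) are hypothetical and are not taken from the Debian packaging. The sketch assumes that "configured" means at least one `[section]` in ups.conf, or at least one uncommented MONITOR line in upsmon.conf. It only prints the decision, leaving the actual enable command to the packager.

```shell
#!/bin/sh
# Sketch of a postinst-style check; file locations are passed in as
# arguments because the real paths are a packaging decision.

# Succeed when ups.conf defines at least one [section], i.e. one device.
has_ups_defined() {
    [ -r "$1" ] && grep -Eq '^[[:space:]]*\[[^]]+\]' "$1"
}

# Succeed when upsmon.conf has at least one uncommented MONITOR line.
has_monitor_defined() {
    [ -r "$1" ] && grep -Eq '^[[:space:]]*MONITOR[[:space:]]' "$1"
}

# Print "enable UNIT" or "leave-disabled UNIT", given the check to run.
enable_decision() {
    unit="$1"; shift
    if "$@"; then
        echo "enable $unit"
    else
        echo "leave-disabled $unit"
    fi
}
```

For example, `enable_decision nut-monitor.service has_monitor_defined /etc/nut/upsmon.conf` (the path is an assumption) prints `leave-disabled nut-monitor.service` on a freshly installed system with no MONITOR line. The postinst could then skip enabling the unit and point the administrator at the instructions instead.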

According to the systemd.unit documentation, I don't think any of the Condition* fields apply (except maybe ConditionDirectoryNotEmpty on the pipe directory for upsmon).

I am not sure we need to have NUT daemons hanging around without configuration - the case of no UPSes configured could easily be answered with a "connection refused", rather than connecting to upsd and seeing an empty UPS list.

Is there an SCM repository where the current Debian postinst script is kept? Otherwise, I can look at the .diffs for the jessie package, but I'm also interested in how they have changed over time.

bigon commented 9 years ago

@clepple all the files are stored at http://anonscm.debian.org/cgit/collab-maint/nut.git/tree/debian

clepple commented 9 years ago

@bigon I looked at the jessie release schedule, and I don't think I will have time to set up a development environment and debug the systemd issues before nut gets auto-removed from testing. (I am in the middle of migrating an unrelated server from wheezy to jessie, and that is taking longer than I anticipated.) There is also some merit in not introducing unnecessary differences between the Debian and Ubuntu packages, so can we fall back to the well-tested init.d scripts (per Martin's third point in bug 747863)?

The files in collab-maint seem to be expanded with debhelper, and from simple inspection, I am not sure what is requiring the service to be enabled by default at startup.

The systemd unit files are effectively forcing a policy decision on how to deal with an unconfigured service, and the init.d scripts already solve that with nut.conf. I think we need some more time to experiment with systemd to figure out the right way to handle this.

bigon commented 9 years ago

@clepple Yeah, I'll try to see if I've the time to fix this :/

There is some extra work needed to disable systemd services (remove the symlink) after removing the .service file, and we need to take special care to stop the service before removing it.

bigon commented 9 years ago

Another option could be to have a wrapper script that exits 0 if MODE=none.
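A rough sketch of such a wrapper, assuming nut.conf holds plain `key=value` lines such as `MODE=none`. The function names `nut_mode` and `run_if_configured` are made up for illustration, and treating a missing file or missing MODE as "none" is an assumption rather than established NUT behavior.

```shell
#!/bin/sh
# Hypothetical wrapper sketch: skip starting a NUT component when
# nut.conf says MODE=none (or does not set MODE at all).

# Print the MODE value from the given nut.conf, or "none" when the file
# is missing or does not set MODE.
nut_mode() {
    conf="$1"
    mode=""
    if [ -r "$conf" ]; then
        # Take the last uncommented MODE= line, dropping optional quotes.
        mode=$(sed -n 's/^[[:space:]]*MODE[[:space:]]*=[[:space:]]*"\{0,1\}\([A-Za-z]*\)"\{0,1\}.*$/\1/p' "$conf" | tail -n 1)
    fi
    printf '%s\n' "${mode:-none}"
}

# Run the given command unless MODE is none; when it is none, report
# success so the service manager does not record a failure.
run_if_configured() {
    conf="$1"; shift
    if [ "$(nut_mode "$conf")" = "none" ]; then
        echo "NUT MODE is none in $conf, not starting: $*" >&2
        return 0
    fi
    "$@"
}
```

A unit's start command could then call something like `run_if_configured /etc/nut/nut.conf /sbin/upsmon`, with both paths depending on the distribution. The file is parsed with sed rather than sourced, so a malformed nut.conf cannot execute arbitrary commands in the wrapper.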

clepple commented 9 years ago

@bigon I don't have a strong opinion as to whether it is a NUT-specific wrapper that checks for MODE=none, or if the Ubuntu init.d compatibility scripts check for it. But I suspect that the latter would be less confusing. You have a better understanding about what will prevent NUT from being removed from Debian testing, I think.

jimklimov commented 2 years ago

Historic notes (as of Jul 2024, nearing NUT v2.8.3 release):

Partially the same area is addressed for upsd by #766, which allows upsd to start when no devices are defined in ups.conf and to wait for a reload signal to re-read it (without interrupting the daemon uptime) once a device is configured. The nut-driver-enumerator (NDE) takes care of automating that workflow in the systemd/SMF worlds.

PR #683 (part of the NUT v2.8.0 release) introduced the explicit -F/-B modes to NUT daemons, as well as "debug_min" in configs (and later also a NUT_DEBUG_LEVEL envvar, both without the forced foregrounding effect). So for service frameworks where we want to avoid forking, we are no longer obliged to run with verbose debug. Further development capitalized on this for the reference systemd unit and SMF manifest definitions.

Our reference systemd unit definitions and SMF manifest scripts (as well as older init scripts) do try to source the nut.conf file assuming it is a collection of verbatim key=value definitions, so settings conveyed there can still be taken into account (not sure they all are, but some certainly - tweaks for logger and such). The file is also consulted by nutshutdown hook/script, Windows nut.exe wrapper, etc.

Whether distro packaging uses files derived from NUT sources or some older custom code is a bit unbeknownst to us and frankly up to them - many recipes did have various solutions for OS integration before NUT got to suggest some.

Development like #1590 and #1777 allowed systemd notification integrations, so daemons can tell systemd when they are actually ready to serve (lengthy init completed), that they are intentionally stopping, etc. Maybe this can be used for this issue, to say "I am not configured, do not resuscitate, good night!"

This issue as posed is still not resolved however:

Currently upsmon has a minimum required configuration (now documented in sample upsmon.conf -- the MONITOR and MINSUPPLIES lines at least) and upsdrvctl needs the device/driver definition sections to process.

They all still require their config files to at least exist, or the daemon start-up fails otherwise; 3 files in case of upsd (even if only touch'ed and empty) e.g.:

:; ls -la /tmp/nnn0
total 24
drwxr-xr-x   2 jim  jim   4096 Jul 30 11:49 .
drwxrwxrwt 153 root root 20480 Jul 30 11:47 ..
-rw-r--r--   1 jim  jim      0 Jul 30 11:48 ups.conf
-rw-r--r--   1 jim  jim      0 Jul 30 11:47 upsd.conf
-rw-r--r--   1 jim  jim      0 Jul 30 11:49 upsd.users

:; ALLOW_NO_DEVICE=true NUT_STATEPATH=/tmp/nnn0 NUT_CONFPATH=/tmp/nnn0 ./server/upsd -DDDD
Network UPS Tools upsd 2.8.2.768-768-g4503f234f
   0.000000     fopen /tmp/nnn0/upsd.pid: No such file or directory
   0.000035     Could not find PID file '/tmp/nnn0/upsd.pid' to see if previous upsd instance is already running or not!
   0.000160     WARNING: /tmp/nnn0/upsd.conf is world readable (hope you don't have passwords there)
   0.000241     [D1] debug level is '4'
   0.000279     [D1] server_load: No LISTEN configuration provided, will try IPv6 localhost
   0.000288     [D3] listen_add: added ::1:3493
   0.000319     [D1] server_load: No LISTEN configuration provided, will try IPv4 localhost
   0.000367     [D3] listen_add: added 127.0.0.1:3493
   0.000403     [D3] setuptcp: try to bind to ::1 port 3493
   0.001602     listening on ::1 port 3493
   0.001688     [D3] setuptcp: try to bind to 127.0.0.1 port 3493
   0.001733     listening on 127.0.0.1 port 3493
   0.001826     [D1] server_load: tried to set up 2 listening sockets, succeeded with 2
   0.001867     [D3] server_load: ...of those related to localhost: overall: 2 tried, 2 succeeded; by name: 0T/0S; by name(6): 0T/0S; by IPv4 addr: 1T/1S; by IPv6 addr: 1T/1S
   0.001910     [D1] Can not become_user(nobody): not root initially, remaining UID=1000 GID=1000
   0.001951     [D1] chdired into statepath /tmp/nnn0 for driver sockets
   0.001966     WARNING: /tmp/nnn0 is world readable (hope you don't have passwords there)
   0.002029     Warning: no UPS definitions in ups.conf
   0.002074     Normally at least one UPS must be defined in ups.conf, currently there are none (please configure the file and reload the service)
   0.002142     /usr/local/ups/share/cmdvartab not found - disabling descriptions
   0.002184     WARNING: /tmp/nnn0/upsd.users is world readable (hope you don't have passwords there)
   0.002252     Running as foreground process, not saving a PID file
   0.002299     upsnotify: notify about state 2 with libsystemd: was requested, but not running as a service unit now, will not spam more about it
   0.002355     upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
   0.002371     upsnotify: logged the systemd watchdog situation once, will not spam more about it
   0.002408     [D2] mainloop: polling 2 filedescriptors

   2.004512     [D2] mainloop: no data available
   2.004621     [D2] mainloop: polling 2 filedescriptors
^C   2.662213   mainloop: Interrupted system call
   2.662297     Signal 2: exiting
   2.662345     [D1] upsd_cleanup: starting the end-game
   2.662403     [D1] upsd_cleanup: finished