systemd / systemd

The systemd System and Service Manager
https://systemd.io
GNU General Public License v2.0
13.19k stars 3.77k forks source link

Support configcheck before stop/restart of service #2175

Open KlavsKlavsen opened 8 years ago

KlavsKlavsen commented 8 years ago

Currently we have a lot of centos6, and other servers, using good old SystemV init scripts. Most services there (named, varnish, icinga, nagios, httpd) do a config check before reload/restart - to ensure that a faulty config does not kill a currently running service - but just "fails to update running config".

This is very nice.. especially when using configuration management, and having to deal with humans who make mistakes (like faulty configs :) - its much nicer to just have an error about new config not being put in production - than getting "service down" for all servers and having to scramble to STOP the rollout :(

To make system support that - I can do: ExecReload=/sbin/apachectl configtest && /usr/sbin/httpd $OPTIONS -k graceful

But problem is that reload != restart - especially not with apache. I've seen graceful hang "forever" with apache before.. and since the real world for systemadministrators, includes faulty software, it would be nice to also be able to do config check on the restart edition..

I was hoping an ExecRestart option could be added (I'll gladly do a PR.. I've never written any systemd code before.. its about time :) - or perhaps you have a better approach? I discussed it on #systemd on freenode and there were several approaches suggested.. one could perhaps do a ConfigCheck option.. so systemd could simply call ConfigCheck before calling ExecStop - and then decide "what to do, or not to do"..

It would definetely be nicer if a ConfigCheck option was there.. so the error would be much more precise (and from systemd - instead of some random configcheck hacked into ExecRestart/ExecReload..

What do you guys think?

KlavsKlavsen commented 8 years ago

dooh.. the ExecReload? with && does not work - as it's not a shell command being run.. so && is invalid.. my testing was faulty - since I found that httpd actually handles faulty config itself on graceful.. (not on restart though). hoping for a good resolution of this ticket - for a resolution, for all those services "out and about" which doesn't handle this nicely (which even apache doesn't.. for the restart case)

mischief commented 8 years ago

if you need the shell, you need to call it explicitly. you can't use a shell construct (&&) directly in a systemd unit.

KlavsKlavsen commented 8 years ago

@mischief yes - I gathered that. could do bash -c .. but that is not a very pretty solution - and I still hope you agree that Systemd should have a ConfigCheck option, so daemon "authors" can provide a way for systemd to verify configuration - before trying to restart a service (since most would prefer running on an older configuration, than the service stopping completely, because the new config is invalid :) (especially with automatic rollouts - like done with puppet and other configuration management systems.. its very nasty :)

amarao commented 8 years ago

I propose to add "ExecReloadPre=" to available options for systemd.service, like "ExecStart", etc.

Hypothetical quote from the manual:

       Additional commands that are executed before the command
       in ExecReload=. Syntax is the same as for ExecStart=,
       except that multiple command lines are allowed and the commands are
       executed one after the other, serially.

       If any of those commands (not prefixed with "-") fail, the rest are
       not executed and the unit is not reloaded.
Civil commented 8 years ago

It's not only about reload, but also about restart. For reload you can do ugly code, but for restart you can't.

KlavsKlavsen commented 8 years ago

I'm fine with whatever name - but it needs to be run for both reload and restart. @Civil why can you do "ugly code" for reload, but not for restart?

Civil commented 8 years ago

@KlavsKlavsen well, maybe I'm just doing something wrong or because there is just no way to perform custom checks before stopping. I've tried following stuff, but still no luck in getting it running:

  1. ExecStop executes ok, but no matter what exit code is, KillSignal will be sent to the application after some time.
  2. There is no way to specify "KillSingal=none".
  3. With KillMode=none systemd will try to do strange thing - it'll try to execute ExecStop and no matter what it'll just say "Ok, it's stopped". Though process will be still running, and if you'll try to start it again (or if you've tried to restart it) systemd will try to spawn the process again. In my case starting it back won't fail, but I'll end up with two copies of application and one of them won't work correctly, but from systemd's point of view everything will be restarted.
johannbg commented 8 years ago

I dont think there is something to do here for systemd which is not already covered via existing implementation ( Exec* lines ) for those daemon and services that support sanity check their own configuration file before startup ( which are rather few I might add, bind httpd etc less than 1% out of what I went through in Fedora ) in as good as it can be manner it can be done.

Civil commented 8 years ago

@johannbg this ticket is not about pre-checking before reload but about pre-checking before restart.

johannbg commented 8 years ago

@Civil irrelevant since in either case this cannot be handled better than it is done already with Exec* lines

Fact is some daemon/services stop and start and some daemon/service support reload ( re-reading configuration ), very few daemons/service support syntax checking specifically ( before reload/restart ) and those that do dont always fail ( exit correctly ) when they encounter error or all errors. In many cases they just spew something in the log and try to continue anyway ( which makes what you are trying to achieve here fail ).

I went through this discussion with those few upstream daemon/service components that supported this like me and Joe Orton upstream/Fedora/RHEL maintainer of httpd discussed this when we migrated httpd legacy sysv initscript to native systemd in F15/F16 and the end result was not to try to implement this in type systemd unit for the httpd which for example triggers a warning when it cant determine the server's fully qualified domain name while the configuration syntax remains OK. Would you not want to reload the apache server if it cant do a reverse lookup on itself? etc. At the same time what needed/could be implemented in systemd and was discussed in one of the hackfests and the missing bits implemented shortly there after.

KlavsKlavsen commented 8 years ago

@johannbg some of the more noteworthy on servers atleast, is named, varnish, icinga, nagios, httpd - which all have a check (external to the daemon) which one can run, to test configuration is valid. on Centos/RedHat 6 - it is greatly appreciated and has saved our services many times, when someone commits an invalid config into our systemconfiguration - resulting in a misconfigured service on X servers.

AFAIK I can only hack it in on ExecReload (by using bash -c as command) - but that does not cover the 'restart' command.

Surely the fact that many daemons appearently does NOT have a config checker you can trust, should not be a reason to disable the possibility for those who do have a proper config checker?

johannbg commented 8 years ago

I'm very much aware of that since I went through this with most of the upstream that had external checks ( like I have already mentioned ) when I was migrating all the legacy sysv initscript in Fedora and systemd already supports this via ExecReload= and ExecStop= entries however those daemon/services need to tell ( signal ) systemd to abort it's reload/shutdown when applicable on relevant errors and none of those currently do that and upstream ( at that time ) did not care enough or want to add that to their component

Civil commented 8 years ago

@johannbg can be if you'll give me a way to abort "stop" on result of some command, cause I'm unable to find any way to do that in case I can't modify a software for any reason.

Fact is that I have a daemon that supports reload, this daemon also supports syntax checking. For do config-check on reload I've redefined ExecReload and it works, but I need to prevent restart with broken config, and systemd 219 kills it with SIGKILL even if ExecStop returns non-zero exit code.

The problem is that sometimes you just have some software which you can't modify or modifying of which is hard (for various reasons). Well, actually doing a python script (or small C program) that'll notify systemd to abort shutdown is also a way, but I'd prefer not to do this.

KlavsKlavsen commented 8 years ago

@johannbg So we agree you can't actually prevent a restart with systemd today in any nice way? and if systemd actually supported a 'configcheck' option - I'd bet upstreams would start receiving PR's for their projects to support this in systemd. The daemons I mentioned actually HAS a working do not restart/reload if broken check in their init scripts on centos/redhat 6, and I personally talked with the Varnish team who was happy I tried to take this up with you guys, since they'd like their el7 rpms to actually support this as well..

tierpod commented 8 years ago

We have this problem.

KlavsKlavsen commented 8 years ago

still hoping systemd can get this feature - so it can deliver the equivalent of what CentOS6 does (by virtue of init.d scripts).. its the reason I haven't moved many of my servers to centos 7.. one needs some "stopgaps measures" to avoid automated rollout to screw your entire environment, if someone does something stupid.

sponkydive commented 8 years ago

+1 really miss option to prevent stopping a service with restart, if it's clear that it can not come up anymore due to missconfiguration

johannbg commented 8 years ago

@sponkydive This is not a systemd feature so there is no point in "+" this but an relevant upstream/daemon one where they parse their own configuration files before shutting down and or starting the daemon or do what ever other health check they deem needed before shutdown/startup/restart of their daemon and or service.

Civil commented 8 years ago

@johannbg it's a missing feature in systemd and have nothing to do with upstream daemons. Because if daemon will refuse to be stopped by systemd it'll be killed with -9 and there is no way to disable that properly (if you'll disable killing, then daemon will be orphaned - still running but systemd will think that it's stopped).

johannbg commented 8 years ago

@Civil really name me the daemon/services which notify systemd in some manner that the prestop/preflight check failed so they should not be (re)started or stopped?

Civil commented 8 years ago

See above, I've already said that I've tried to implement it on my own, but there is just no way to do that with current version of systemd.

johannbg commented 8 years ago

@Civil where is the code in the daemon that you implemented for that?

Civil commented 8 years ago

@johannbg in my private repos in a branch with this support.

johannbg commented 8 years ago

@Civil just paste that code sample or make it public.

johannbg commented 8 years ago

@Civil or better yet make a pull request with those changes to systemd you made so that it can be reviewed

Civil commented 8 years ago

@johannbg I'll make a PR when I'll have some free time to code it and test it (if it won't be fixed before).

As about code - actually the same - when I'll have time to make a test case I'll publish it. So that won't happen soon.

piotr1212 commented 7 years ago

@KlavsKlavsen If you are using configmgmt, let the tool check the config before overwriting the file. For example this can be done in Ansible with the "validate" option of the template module: http://docs.ansible.com/ansible/template_module.html .

Besides that - I would appreciate a ConfigCheck option in systemd.

KlavsKlavsen commented 7 years ago

@piotr1212 Thats not a bad idea - I'm using puppet and I can do the same there. That does not help those just using f.ex. varnish'es rpm package though.

keszybz commented 7 years ago

I have to say that I rather like the proposed approach in https://github.com/systemd/systemd/issues/2175#issuecomment-184308734, with ExecRestartPre= and ExecReloadPre=. This should cover the requested use case nicely, while being usable for other things too.

Civil commented 7 years ago

Most important thing that it should not actually stop or kill service if this "Exec*Pre" fails.

cjwatson commented 7 years ago

ExecReloadPre seems unnecessary, since you can just provide multiple ExecReload directives instead; the documentation says that if one of those fails then subsequent commands are not run, and my testing agrees with this. (I agree that this does not help with the restart case.)

Civil commented 7 years ago

It is necessary, because it will simplify reload clause. Especially I. Case when reload with broken config can cause troubles

cjwatson commented 7 years ago

I don't see how, for example:

ExecReloadPre=/usr/sbin/sshd -t
ExecReload=/bin/kill -HUP $MAINPID

... is any simpler than:

ExecReload=/usr/sbin/sshd -t
ExecReload=/bin/kill -HUP $MAINPID

(Yes, I know I shouldn't be using kill -HUP here. Let's ignore that for now ...)

runejuhl commented 7 years ago

I can see a problem with using multiple ExecReload lines; how do you override it? As I understand it there's no way to keep the existing ExecReload command (so you don't have to redefine it) while adding another ExecReload (for the purpose of e.g. config checks) in an override file. If it was possible you'd have to provide some extra information about how to order similarly named ExecReload clauses...

You could use

ExecReload=
ExecReload=/usr/sbin/sshd -t
ExecReload=/bin/kill -HUP $MAINPID

But that still means you end up including the original ExecReload in a new file instead of leaving it to the system-installed service unit which is suboptimal and error-prone.

torgiren-alior commented 7 years ago

Workaround:

cat haproxy.service
[Service]
ExecReload=/usr/bin/touch /tmp/zzxx.txt

.include /usr/lib/systemd/system/haproxy.service

[Service]
ExecReload=/usr/bin/touch /tmp/ppp.txt
systemctl show haproxy|grep Reload
ExecReload={ path=/usr/bin/touch ...
ExecReload={ path=/bin/kill ; argv[]=/bin/kill -USR2 $MAINPID ;...
ExecReload={ path=/usr/bin/touch ...
Civil commented 7 years ago

That's a workaround for Reload, but not for Restart.

sts commented 7 years ago

@johannbg I don't understand your argument --

Even though there are users requiring this feature and it has been a common pattern to safeguard service outages with sysvinit scripts in the past, you are saying that there are no daemons supporting it and as long as there are none it won't get implemented?

At the same time there doesn't exist any guidelines for upstreams on how systemd expects services to behave for this use-case and won't be able to add support. Just because there is no support yet, doesn't mean it won't get implemented. Also I don't see what should be implemented beyond a return code >0 from a command running the syntax check.

I also hope that ExecStartPre wasn't added for the purpose of config checking, as at this stage its probably the most useless.

Also I don't understand why a WARNING triggered by a daemon during a config check (the httpd example you brought up), would render the config check as invalid, as it still exits with a zero status code and restarts succeeds even with this configuration.

# apache2ctl -t
sudo: unable to resolve host test01
AH00557: apache2: apr_sockaddr_info_get() failed for test01
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1. Set the 'ServerName' directive globally to suppress this message
Syntax OK
# echo $?
0
# systemctl restart apache2
[ ok ] Restarting apache2 (via systemctl): apache2.service.

Also there are multiple hints that this has been considered as a feature in the past, so please lets think about it again.

From the todo file:

https://github.com/systemd/systemd/blob/f30574144d8a59d32c8bbddf5a54425e68d20a3c/TODO#L816

https://github.com/systemd/systemd/blob/f30574144d8a59d32c8bbddf5a54425e68d20a3c/TODO#L423

The commit which removed SERVICE_EXEC_RELOAD_PRE and SERVICE_EXEC_STOP_PRE: https://github.com/systemd/systemd/commit/034c6ed7da5e44bfdde5a5d0da75f7b7a59953b8#diff-699a8e442df3a587e1780348cc32f793

thresheek commented 6 years ago

Hey. nginx.org packages maintainer here.

I've just ran into the same issue with nginx here - we've been using the following snippet in the restart portion of the initscript for the sysv-based distros:

restart)
    configtest -q || exit $RETVAL
    stop
    start
    ;;

where configtest is just basically nginx -t to check the configuration.

Unfortunately we can't achieve the same behaviour and be nice to our users in the systemd-based distros. Hopefully this [+1] will have some effect on moving the TODO entry up a little bit.

tmccombs commented 6 years ago

systemd already supports this via ExecReload= and ExecStop= entries however those daemon/services need to tell ( signal ) systemd to abort it's reload/shutdown when applicable on relevant errors and none of those currently do that and upstream

@johannbg How would one signal systemd to abort it's reload/shutdown without leaving an orphan process?

I'm trying to implement config checking on restart for a daemon I work on, and haven't been able to find a way to do that in the documentation.

Also, in my case (and I imagine others as well), I have the following requirements:

  1. Bad config should prevent a restart, but shouldn't prevent a stop that is not part of a restart
  2. systemctl restart my-service should return a non-zero exit status if the config check fails so that a script running it knows if the restart was unsuccessful
tmccombs commented 6 years ago

Any of the following would work:

maltris commented 5 years ago

ExecReloadPre= might be useful where it shrinks this override/dropin from:

[Service]
ExecReload=
ExecReload=/bin/bash -c '/usr/bin/socat /var/run/haproxy.sock - <<< "show servers state" > /etc/haproxy/state/global'
ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS
ExecReload=/bin/kill -USR2 $MAINPID

into this:

[Service]
ExecReloadPre=/bin/bash -c '/usr/bin/socat /var/run/haproxy.sock - <<< "show servers state" > /etc/haproxy/state/global'

while the original unit looks like this:

# /lib/systemd/system/haproxy.service
[Unit]
Description=HAProxy Load Balancer
Documentation=man:haproxy(1)
Documentation=file:/usr/share/doc/haproxy/configuration.txt.gz
After=network.target syslog.service
Wants=syslog.service

[Service]
Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid"
EnvironmentFile=-/etc/default/haproxy
ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS
ExecStart=/usr/sbin/haproxy-systemd-wrapper -f $CONFIG -p $PIDFILE $EXTRAOPTS
ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS
ExecReload=/bin/kill -USR2 $MAINPID
KillMode=mixed
Restart=always

[Install]
WantedBy=multi-user.target
taladar commented 4 years ago

I am also trying to implement something so our daemons do not die when we have syntax errors in the config. I like the idea of having an explicit ConfigTest option that is called by systemd before reloads and restarts (and possibly also start) for more readable error messages.

I really don't understand @johannbg 's arguments in this thread that claim no daemons signal systemd about their config test results when, as far as I can tell, there is not even a way for them to do so provided by systemd. Does sd-notify contain something for this purpose? Even if it does, admins can not use this to achieve the same thing without changing the daemon source code and recompiling.

I also do think that added ExecReloadPre and ExecRestartPre would be useful as well, for the same reason people have mentioned above (especially the override drop-ins without duplication of upstream lines).

I don't quite see why this ticket is approaching 5 years by now even though it seems a relatively simple and well-explained request that lots of other people have agreed with.

gregorg commented 4 years ago

Almost 5 years we are claiming this feature ...

tmccombs commented 4 years ago

And two years since I asked @johannbg to elaborate on how daemons can accomplish this with existing systemd, with no answer.

runejuhl commented 4 years ago

Almost 5 years we are claiming this feature ...

And two years since I asked @johannbg to elaborate on how daemons can accomplish this with existing systemd, with no answer.

...and neither of you have contributed a design document or a taken a stab at implementing the feature. I'd really like to see this feature as well, but please refrain from posting "me too" or complaining that it has not been fixed. We all know that, that's why we're here!

By all means, please do work on it in any way you can. From what I've seen of the systemd code base you don't have to be a C guru (and I ain't one either) to understand enough to be able to contribute. Or you could work on a proper specification for the feature and a draft of documentation. I'd be more than happy to review PRs and help out that way.

tmccombs commented 4 years ago

Any of the following would work:

* The proposed CheckConfig option

* An ExecStopPre directive that can cause a stop or restart to be aborted

* A non-zero exit status for an ExecStop command aborts the stop (not backwards compatible)

* A way for a process started by ExecStop to signal to systemd that the stop should be aborted (ex something like `systemd-notify AbortStop=1`

So, which of these options is most likely to be accepted? I'll make a PR if one of these options would be acceptable.

KlavsKlavsen commented 4 years ago

I also proposed to do a PR - 5 years ago.. But doing a PR, before you atleast agree on a good way do it, with those who have commit rights to master - is easily a waste of time :(

OlafvdSpek commented 4 years ago

Ideally there'd also be an option, for services that can do it, to start the new instance before the old one is stopped. The new instance would read and check config, finish all initialization and then take over from the old instance.

KlavsKlavsen commented 4 years ago

@OlafvdSpek Wouldn't it be enough to mark such a service for "NeverRestartOnlyReload" ? since they would probably do it that way every time anyways - and reload/restart would essentially be the same (so systemd should not send a stop at all)

OlafvdSpek commented 4 years ago

NeverRestartOnlyReload doesn't exist, does it?

New code / binary requires a new instance / process to be started, but AFAIK systemd lacks support for that. Reloading is usually just sending a signal to the existing instance.