openSUSE / health-checker

Systemd service to check whether the system comes up correctly after an update
GNU General Public License v2.0

etcd test complains if etcd isn't installed #3

Open fcrozat opened 5 years ago

fcrozat commented 5 years ago

When installing health-checker on a transactional server (Leap 15.1), I get errors in logs:

juin 11 14:38:51 kimsufi health-checker[1762]: Failed to get unit file state for etcd.service: No such file or directory

Before checking whether etcd.service is enabled, the check should ensure that the service is present (or at least not complain about it being missing).

laenion commented 4 years ago

I'm sorry for the late reaction, I didn't have a watch set for this project.

The problem is that we don't have any health-checker plugins for the Transactional Server role yet. On Leap 15.1 the only provider of health-checker-plugins is health-checker-plugins-caasp, which will obviously assume that etcd is installed.

We have multiple options to solve this:

1) Add a generic health-checker-read-only plugins package, and possibly add more package logic to select the correct package for the distribution (e.g. by conflicting with the release package of other variants).
2) Get rid of all those distribution variant packages, split all the checks into separate packages and use Boolean Dependencies (https://rpm.org/user_doc/boolean_dependencies.html) to automatically install a check if both the health-checker package and the package the plugin is supposed to test are installed. This would allow mixing and matching all the packages freely and is the most dynamic variant.
3) Change the plugins to check the preconditions first (e.g. by checking if the package is installed). (kubelet.sh and rebootmgr.sh should also be changed then.)
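To make the third option more concrete, here is a minimal sketch of a plugin that checks its precondition first. It is not the code from the existing plugins: the function names package_installed and run_guarded_check are hypothetical, and the "enabled but failed" logic is only an assumption about what a check might do.

```shell
#!/bin/sh
# Sketch only: package_installed and run_guarded_check are hypothetical
# names, not functions taken from the health-checker plugins.

# Succeeds if the given RPM package is installed
# (rpm -q exits non-zero when the package is not installed).
package_installed() {
    rpm -q "$1" >/dev/null 2>&1
}

# Skip the check when the package is absent; otherwise report a failure
# only if the unit is enabled and currently in the failed state.
run_guarded_check() {
    pkg="$1"
    unit="$2"
    if ! package_installed "$pkg"; then
        echo "skipping $unit check: package $pkg is not installed"
        return 0
    fi
    if systemctl is-enabled --quiet "$unit"; then
        if systemctl is-failed --quiet "$unit"; then
            return 1
        fi
    fi
    return 0
}
```

A plugin could then call something like `run_guarded_check etcd etcd.service` and exit with the returned status, so a system without etcd would no longer trigger the unit file lookup at all.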

The assumption that the packages would always be installed is a leftover from the time when we only had SUSE CaaS Platform and openSUSE Kubic using this package. Even with openSUSE MicroOS this assumption is not true any more.

The check still worked, however, because systemctl is-enabled <service> also returns 1 if the service doesn't exist at all; the only difference is that the additional error message is printed.
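If only the log noise is the concern, one possible stopgap is to keep relying on the exit status and discard the error output. The sketch below assumes that approach; service_enabled is a hypothetical helper name, not something the plugins currently define.

```shell
#!/bin/sh
# Sketch only: service_enabled is a hypothetical helper name.

# Exit status 0 means the unit is enabled. Any non-zero status is treated
# the same way, whether the unit is disabled or its unit file is missing.
# stderr is discarded so that the "Failed to get unit file state" message
# is not written to the log.
service_enabled() {
    systemctl is-enabled --quiet "$1" 2>/dev/null
}
```

The drawback is that a missing service and a disabled service become indistinguishable to the caller, which is exactly the assumption described above.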

laenion commented 4 years ago

Just a thought: Instead of skipping the test if the package is not installed, we may also want to fail the test intentionally; otherwise we wouldn't notice if the package was accidentally uninstalled. That would make the system less flexible again, though. Then again, checking for the presence of required packages could also be moved into a separate test.
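The trade-off between skipping and failing could be made explicit with a switch. The following is only a sketch of that idea: the REQUIRE_PACKAGES variable, the check_package_present function and the return code convention (0 = present, 1 = fail, 2 = skip) are all assumptions made for illustration and do not exist in health-checker.

```shell
#!/bin/sh
# Sketch only: REQUIRE_PACKAGES and check_package_present are hypothetical.
# Return codes used here: 0 = package present, 1 = fail, 2 = skip.

check_package_present() {
    pkg="$1"
    if rpm -q "$pkg" >/dev/null 2>&1; then
        return 0    # installed, the real check can run
    fi
    if [ "${REQUIRE_PACKAGES:-0}" = "1" ]; then
        # Strict mode: a missing package counts as a failed health check.
        echo "required package $pkg is not installed" >&2
        return 1
    fi
    # Lenient mode: tell the caller to skip this check.
    return 2
}
```

A dedicated "required packages" test could run this in strict mode for the packages a given variant is expected to ship, while the individual service checks use the lenient mode.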