Closed imre-sidn closed 6 months ago
It looks like the Naemon service is restarted before naemon-livestatus has been updated, resulting in an API incompatibility during startup. I think this can be solved by delaying the Naemon service restart until after the naemon-livestatus package update has completed.
That sounds quite likely. Any idea if this is possible with RPMs?
I don't know how the current packages are built, but the only package that should restart Naemon is naemon-core. At the moment there are probably several packages that restart Naemon?
With deb this is pretty straightforward using an activate trigger. All other packages, such as naemon-livestatus, can then trigger naemon-core to do the restart. If multiple packages trigger, only one restart will be executed.
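As a rough sketch of the deb trigger idea above (not the actual Naemon packaging): naemon-livestatus would activate a named trigger, naemon-core would declare interest in it, and the naemon-core postinst would handle the resulting call. The trigger name naemon-restart, the function name and the SYSTEMCTL override are hypothetical and only there for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a naemon-core postinst that handles a dpkg trigger.
#
# Assumed control files (the trigger name "naemon-restart" is made up):
#   naemon-livestatus: debian/triggers contains "activate-noawait naemon-restart"
#   naemon-core:       debian/triggers contains "interest-noawait naemon-restart"
#
# dpkg calls the interested package's postinst as: postinst triggered "<names>"

# Allow the restart command to be overridden (e.g. for a dry run).
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

naemon_core_postinst() {
    case "$1" in
        triggered)
            # $2 is a space-separated list of activated trigger names.
            for trigger in $2; do
                if [ "$trigger" = "naemon-restart" ]; then
                    # try-restart only restarts a unit that is already running.
                    "$SYSTEMCTL" try-restart naemon.service || true
                    break
                fi
            done
            ;;
        configure)
            # Normal install/upgrade path of naemon-core itself.
            "$SYSTEMCTL" try-restart naemon.service || true
            ;;
    esac
}

# In a real postinst the last line would be: naemon_core_postinst "$@"
```

Because dpkg collects trigger activations and processes them once, the restart would happen a single time even if several add-on packages are upgraded in the same run.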
With rpm nothing is as easy as it should be. My solution to this is to use post-transaction-actions:
https://github.com/rpm-software-management/dnf-plugins-core/blob/master/doc/post-transaction-actions.rst
The livestatus package does a condrestart in its %post script:
https://github.com/naemon/naemon-livestatus/blob/master/naemon-livestatus.spec#L53
But that's not the issue here, because at that point naemon is stopped already.
The core package does a condrestart in the %post script:
https://github.com/naemon/naemon-core/blob/master/naemon-core.spec#L196
But it seems like a good idea to move them to a later stage, e.g. %posttrans.
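A minimal sketch of what moving the restart to a later stage could look like, assuming the restart is done with systemctl directly. This is a hypothetical spec fragment, not the change that was actually made to naemon-core.spec:

```spec
# Hypothetical fragment for naemon-core.spec; not the project's actual change.
%posttrans
# This scriptlet runs once at the end of the whole RPM transaction, after the
# post-install scriptlets of every package in it (including naemon-livestatus).
# try-restart only restarts the unit if it is currently running.
/usr/bin/systemctl try-restart naemon.service >/dev/null 2>&1 || :
```

The point of the later stage is ordering: by the time this scriptlet runs, both the new core binary and the new livestatus module are on disk, so the restart no longer sees a mixed set of versions.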
I think condrestart is not helping in this case: Naemon is running, but the new Naemon binary cannot restart because the old broker module version is still in place 🤔
Hi, thanks for your input on this. I'm looking at building a container image for easy reproducibility of this bug. Do you guys know where I can find the previous (1.4.1) RPMs? I can also build the RPMs myself if an archive is not available.
I've created a container image (Containerfile) based on Rocky Linux 8 with libnaemon, naemon-core and naemon-livestatus 1.4.1 installed, and 1.4.2 RPMs on the filesystem. Steps to reproduce this issue with Docker or Podman:
podman run --rm --detach --name naemon-livestatus ghcr.io/imre-sidn/naemon-livestatus:1.4.1
podman exec --interactive --tty naemon-livestatus bash
dnf install -y ./libnaemon-1.4.2.rhel8.x86_64.rpm ./naemon-core-1.4.2.rhel8.x86_64.rpm ./naemon-livestatus-1.4.2.rhel8.x86_64.rpm
tail -n 20 /var/log/naemon/naemon.log
should be fine then with the next release. Thanks for fixing this.
We have a dnf-automatic timer that periodically updates our RPM packages. Last night a number of packages, including Naemon, got updated. Unfortunately the naemon.service systemd unit got restarted as part of this update, and did not come up correctly.
Relevant log excerpts:
/var/log/dnf/dnf.log
/var/log/naemon/naemon.log
Starting the failed service again later today resolved the issue. It looks like the Naemon service is restarted before naemon-livestatus has been updated, resulting in an API incompatibility during startup. I think this can be solved by delaying the Naemon service restart until after the naemon-livestatus package update has completed.