The agent currently polls the ActiveState of its managed services every 10 seconds and reacts only if a service reports anything other than active at that moment.
This approach can miss failures: if a service fails and restarts within the 10-second window, it is back "in order" by the time the agent checks again.
I believe systemd exposes signals about failed jobs on D-Bus, which the agent should subscribe to in order to keep track of unit health.
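
To make the proposal concrete, here is a rough sketch of the bookkeeping the agent could do once it receives signals instead of polling. It assumes the agent has called Subscribe() on org.freedesktop.systemd1.Manager and forwards the Manager's JobRemoved signal (id, job path, unit name, result) and each unit's PropertiesChanged signal to two callbacks. The class name `UnitHealthTracker`, both callback names, and the unit names are hypothetical, and the D-Bus wiring itself is left out because it depends on the binding the agent uses. The sketch also assumes the caller has already mapped the unit's object path back to its name, and that a crashing service is observed passing through ActiveState "failed"; both would need to be verified against real systemd behavior.

```python
from collections import defaultdict

# JobRemoved result strings treated as failures here (an assumption of this
# sketch; "done" is the success result and is ignored).
FAILED_JOB_RESULTS = {"failed", "timeout", "dependency"}


class UnitHealthTracker:
    """Hypothetical helper that records failures reported via signals."""

    def __init__(self, managed_units):
        self.managed = set(managed_units)
        # unit name -> list of (source, detail) tuples, in arrival order
        self.failures = defaultdict(list)

    def on_job_removed(self, job_id, job_path, unit, result):
        # Argument order mirrors the JobRemoved signal: id, job, unit, result.
        if unit in self.managed and result in FAILED_JOB_RESULTS:
            self.failures[unit].append(("job", result))

    def on_properties_changed(self, unit, changed):
        # `changed` is the dict of changed properties from PropertiesChanged;
        # `unit` is assumed to be resolved from the object path by the caller.
        state = changed.get("ActiveState")
        if unit in self.managed and state == "failed":
            self.failures[unit].append(("state", state))


# Simulated crash-and-restart that finishes well inside one polling interval.
tracker = UnitHealthTracker(["web.service"])
tracker.on_properties_changed("web.service", {"ActiveState": "failed"})
tracker.on_job_removed(7, "/org/freedesktop/systemd1/job/7", "web.service", "done")
tracker.on_properties_changed("web.service", {"ActiveState": "active"})
# Units the agent does not manage are ignored.
tracker.on_properties_changed("db.service", {"ActiveState": "failed"})

print(dict(tracker.failures))  # {'web.service': [('state', 'failed')]}
```

A poll taken after the last event would see web.service as active again, while the tracker still holds the transient failure for the agent to act on.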