Closed sol1-matt closed 1 year ago
A larger question here is what happens to the subscribed event streams on reload, restart, and stop/start.
Instances where we need to recheck everything are
A possible solution to both would be to look at the icinga2 status for changes in program start.
/v1/status/IcingaApplication
returns the dict shown below. The value of results[0].status.icingaapplication.app.program_start
changes when icinga2 reloads or restarts. A scheduled meerkat check every 30 seconds would detect most problems and could trigger a re-initialization of the event streams and mark elements as stale, needing API updates.
{
    "results": [
        {
            "name": "IcingaApplication",
            "perfdata": [],
            "status": {
                "icingaapplication": {
                    "app": {
                        "enable_event_handlers": true,
                        "enable_flapping": true,
                        "enable_host_checks": true,
                        "enable_notifications": true,
                        "enable_perfdata": true,
                        "enable_service_checks": true,
                        "environment": "",
                        "node_name": "example.com",
                        "pid": 11111,
                        "program_start": 1685555555.319866,
                        "version": "r2.13.7-20"
                    }
                }
            }
        }
    ]
}
The big advantage here would be that this is an active check from meerkat to icinga to detect when things need to be poked with a stick.
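A rough sketch of the program_start comparison, in Python for illustration only. It assumes the response body has already been fetched from /v1/status/IcingaApplication (the HTTP call and credentials are left out), and the names program_start() and RestartDetector are hypothetical, not existing meerkat code.

```python
import json
from typing import Optional


def program_start(status_body: str) -> float:
    """Pull program_start out of a /v1/status/IcingaApplication response body."""
    doc = json.loads(status_body)
    return doc["results"][0]["status"]["icingaapplication"]["app"]["program_start"]


class RestartDetector:
    """Hypothetical helper: remembers the last program_start seen by the poller."""

    def __init__(self) -> None:
        self.last_start: Optional[float] = None

    def observe(self, start: float) -> bool:
        """Return True when program_start differs from the previous poll.

        The first poll only records a baseline and returns False.
        """
        changed = self.last_start is not None and start != self.last_start
        self.last_start = start
        return changed


# Trimmed-down sample body using the value from the response shown earlier.
sample = json.dumps(
    {"results": [{"status": {"icingaapplication": {"app": {"program_start": 1685555555.319866}}}}]}
)

detector = RestartDetector()
detector.observe(program_start(sample))  # first poll: baseline only
if detector.observe(program_start(sample)):
    # This is where event streams would be re-initialized and elements marked stale.
    pass
```

Calling observe() from a 30 second timer would cover the scheduled check described above; what "re-initialize" and "mark stale" actually do is up to meerkat.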
The <dashboard>/update functionality can be used for system restart/reload.
The current implementation of element updates is based on matching events from icinga (individual services and hosts) against the results of an element's previous API call; a match triggers the element to redo its API call.
If the previous API call returned no results, then events will never match, as there is nothing to match against.
This means that any element which returns zero results needs to recheck on a reasonably quick schedule, e.g. every 1 minute.
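As a sketch of that schedule, in Python for illustration only: an element whose last API call returned nothing is rechecked once the interval has passed, since no event can ever trigger it. The Element fields, the function name and the 60 second constant are assumptions made for this example, not meerkat's real data model.

```python
from dataclasses import dataclass

# Assumed interval, taken from the "1 minute" suggestion above.
RECHECK_INTERVAL_SECONDS = 60.0


@dataclass
class Element:
    """Hypothetical stand-in for a dashboard element."""

    name: str
    matched_objects: frozenset  # object names returned by the previous API call
    last_checked: float  # unix timestamp of the previous API call


def needs_scheduled_recheck(
    element: Element, now: float, interval: float = RECHECK_INTERVAL_SECONDS
) -> bool:
    """True for elements with zero results whose last check is at least `interval` old."""
    has_no_results = len(element.matched_objects) == 0
    return has_no_results and (now - element.last_checked) >= interval
```

Elements that did return results are left to the event stream here; only the empty ones fall back to polling.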
Status-based elements could also fail if they cover a group of services (i.e. not an individual service or host) and that group changes in icinga and the added member's state changes.
e.g. Element 1 is Service A: OK, Service B: OK
Icinga changes and adds Service C: Critical, which would match Element 1, but meerkat doesn't know until Service A or B changes.
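The blind spot can be shown with a simplified model of the matching, in Python for illustration only. It assumes an element just keeps the set of object names from its previous API call; event_triggers_refresh() is a hypothetical name and not meerkat's actual matching code.

```python
def event_triggers_refresh(previous_results: set, event_object: str) -> bool:
    """An event only refreshes an element if its object was in the previous results."""
    return event_object in previous_results


# Element 1 as it looked after its last API call.
element_1 = {"Service A", "Service B"}

# Service C was added to the group afterwards, so its events are ignored.
missed = not event_triggers_refresh(element_1, "Service C")

# Only an event for a known member makes the element redo its API call,
# at which point Service C would finally be picked up.
refreshed = event_triggers_refresh(element_1, "Service A")

# The zero-results case is the same problem taken to the extreme.
never_matches = not event_triggers_refresh(set(), "Service A")
```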
Another solution, instead of a schedule, would be to redo all these API calls after Icinga deploys new configuration.