tedge-watchdog has to be updated to:
Send health-check requests on the cmd/health/check channels of the watched services.
Expect health status on the status/health channels of the watched services.
Be extended to run on a child device, where it watches the daemons running on that device.
Note that it makes no sense for tedge-watchdog to watch daemons running on a different device, as the point is to restart them if they stop responding.
In practice:
tedge-watchdog has to read the tedge config to get mqtt.topic_root and its own mqtt.device_topic_id. Only the services running on the device identified by mqtt.device_topic_id have to be watched.
As a first step, tedge-watchdog can assume that the topic identifiers of the watched services are built using the default MQTT scheme (device/main// -> device/main/service/tedge-agent). However, the goal is for tedge-watchdog to use an EntityStore to get the names of the services running on its device.
This refactoring must be synchronized with the refactoring of the thin-edge daemons: as long as a service (say tedge-agent) still expects health-check requests on the former topic (i.e. tedge/health-check/tedge-agent), tedge-watchdog has to keep using that topic for it.
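
The topic derivation described above can be sketched as follows. This is only an illustration, not the tedge-watchdog implementation: the function names and the `legacy` flag are hypothetical, the topic root value "te" is assumed for the example (the real value comes from mqtt.topic_root in the tedge config), and the full topic is assumed to be the topic root, the service topic identifier and the channel joined with "/".

```python
def service_topic_id(device_topic_id: str, service_name: str) -> str:
    """Derive a service topic id from a device topic id (default MQTT scheme only)."""
    parts = device_topic_id.split("/")
    # Default scheme: device/<device>/service/<service>.
    # A device topic id leaves the last two segments empty, e.g. "device/main//".
    if len(parts) != 4 or parts[0] != "device" or not parts[1] or parts[2] or parts[3]:
        raise ValueError(f"not a default-scheme device topic id: {device_topic_id!r}")
    return f"device/{parts[1]}/service/{service_name}"


def health_check_topic(topic_root: str, device_topic_id: str,
                       service_name: str, legacy: bool = False) -> str:
    """Topic on which the watchdog sends a health-check request to a service."""
    if legacy:
        # Service not migrated yet: keep using the former topic.
        return f"tedge/health-check/{service_name}"
    service = service_topic_id(device_topic_id, service_name)
    return f"{topic_root}/{service}/cmd/health/check"


def health_status_topic(topic_root: str, device_topic_id: str, service_name: str) -> str:
    """Topic on which the watchdog expects the health status of a service."""
    service = service_topic_id(device_topic_id, service_name)
    return f"{topic_root}/{service}/status/health"


if __name__ == "__main__":
    # "te" is an assumed topic root used only for this example.
    print(health_check_topic("te", "device/main//", "tedge-agent"))
    print(health_status_topic("te", "device/main//", "tedge-agent"))
    print(health_check_topic("te", "device/main//", "tedge-agent", legacy=True))
```

The `legacy` flag stands in for whatever mechanism ends up tracking which daemons have already been migrated. Once an EntityStore is available, `service_topic_id` would be replaced by a lookup instead of string manipulation.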