kyma-project / infrastructure-manager

Dependency health checking (e.g. Gardener) #138

Open ebensom opened 5 months ago

ebensom commented 5 months ago

Description

Implement periodic health checking of the Gardener cluster API dependency: query the version or health non-resource endpoint through the Gardener kubeclient in a separate goroutine and keep the latest check result up to date. Expose the current health-check result on the Prometheus metrics endpoint via series like:

{app}_{subsys}_gardener_health{url="..", status="healthy"} 1
{app}_{subsys}_gardener_health{url="..", status="error"} 0
{app}_{subsys}_gardener_health{url="..", status="unknown"} 0

Reasons

Ability to cross-correlate infrastructure-manager errors with Gardener API (dependency) errors.

tobiscr commented 4 months ago

@ebensom: can we please quickly sync on this? We have open questions.

tobiscr commented 2 months ago

@ebensom: I will set up a call to clarify the purpose. We want to avoid increasing the load on Gardener through redundant health checks from us, additional monitoring, etc.

tobiscr commented 2 months ago

Istio offers HTTP request metrics, but they are only available when traffic goes over plain HTTP. That is not possible for Gardener, which enforces HTTPS communication.

Implementing the check via a Prometheus client in KCP would be possible, but it would not show whether KIM itself is really able to talk to Gardener (it would not reflect the truth).

tobiscr commented 2 months ago

@ebensom: we will only implement it if you send each of us a "Thank you" award ;)