Open webvictim opened 1 year ago
The ability to log in JSON would also be welcome 🙏
Going to rename this ticket to just refer to healthz and readyz since we have metrics now.
Coming soon https://github.com/gravitational/teleport/pull/30755
When running tbot --diag-addr=0.0.0.0:3000, tbot should provide /healthz and /readyz endpoints for use in configuring liveness and readiness probes in Kubernetes Deployments or StatefulSets.
The health endpoint(s) should reflect whether the bot has successfully been able to provide the credentials it is supposed to. That way Kubernetes can restart the pod to recover from a failed state.
Currently, running tbot --diag-addr=0.0.0.0:3000 only sets up the /metrics and /pprof endpoints. It does not provide /healthz or /readyz endpoints, which are necessary for effectively managing the health and readiness of the containerized tbot process within a Kubernetes cluster.
Run tbot with the diagnostic address flag: --diag-addr=0.0.0.0:3000.
/healthz and /readyz are not present.
See: https://github.com/gravitational/teleport/blob/v14.1.5/lib/tbot/tbot.go#L168-L188
related: https://github.com/gravitational/teleport/issues/42436 https://github.com/gravitational/teleport/issues/29048
We'll implement this as two endpoints:
- one that reports that tbot starts
- one that reports that tbot has successfully joined the cluster and can start outputting outputs and offering services. This should return an error when tbot is shutting down.

At this time, I'd rather avoid using the concept of "healthiness" since this doesn't seem to actually tie to a tangible state of tbot. This may change in future if we have the concept of outputs which run separately.
What would you like tbot to do?
Expose /healthz and /readyz endpoints like Teleport does (https://goteleport.com/docs/reference/metrics/).
What problem does this solve?
Monitoring usability and performance of tbot.
If a workaround exists, please include it.
Manually scrape logs and check process via system tools.