gravitational / teleport

The easiest, and most secure way to access and protect all of your infrastructure.
https://goteleport.com
GNU Affero General Public License v3.0

Machine ID: Add healthz/readyz endpoints to `tbot` #19412

Open webvictim opened 1 year ago

webvictim commented 1 year ago

What would you like tbot to do?

Expose /healthz and /readyz endpoints like Teleport does (https://goteleport.com/docs/reference/metrics/)

What problem does this solve?

Monitoring usability and performance of tbot

If a workaround exists, please include it.

Manually scrape logs and check process via system tools.

jBouyoud commented 1 year ago

As also with availability to log in json 🙏

strideynet commented 1 year ago

Going to rename this ticket to just refer to healthz and readyz since we have metrics now.

strideynet commented 1 year ago

> As also with availability to log in json 🙏

Coming soon https://github.com/gravitational/teleport/pull/30755

strideynet commented 11 months ago

Ticket raised by @programmerq with additional details

Expected behavior:

When running tbot --diag-addr=0.0.0.0:3000, tbot should provide /healthz and /readyz endpoints for use in configuring liveness and readiness probes in Kubernetes deployments or StatefulSets.

The health endpoint(s) should reflect whether the bot has successfully provided the credentials it is supposed to. That way Kubernetes can restart the pod when the bot is stuck in a failing state.
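To make the request concrete, here is a minimal Go sketch of what such a diagnostics handler might look like. It is not tbot's implementation: the `readiness` type, `SetReady`, `newDiagMux`, and `probe` are hypothetical names, and the split between a plain liveness check on /healthz and a credential-based check on /readyz is an assumption about one reasonable design.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// readiness records whether the bot has produced its credentials.
// Hypothetical type for illustration; it is not part of tbot.
type readiness struct {
	ready atomic.Bool
}

// SetReady would be called by the credential renewal loop after each attempt.
func (r *readiness) SetReady(ok bool) { r.ready.Store(ok) }

// statusCode maps the current state to the HTTP status a probe should see.
func (r *readiness) statusCode() int {
	if r.ready.Load() {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

// newDiagMux builds the handler that would sit behind the diagnostics address.
func newDiagMux(r *readiness) *http.ServeMux {
	mux := http.NewServeMux()
	// Liveness (assumed semantics): the process is up and answering HTTP.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// Readiness (assumed semantics): credentials were written successfully.
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(r.statusCode())
	})
	return mux
}

// probe issues an in-memory GET against the handler and returns the status.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	state := &readiness{}
	mux := newDiagMux(state)
	// A real process would instead call http.ListenAndServe("0.0.0.0:3000", mux).
	fmt.Println(probe(mux, "/healthz"), probe(mux, "/readyz")) // prints "200 503"
	state.SetReady(true)
	fmt.Println(probe(mux, "/readyz")) // prints "200"
}
```

With handlers like these, a Kubernetes readiness probe pointed at /readyz would keep the pod out of rotation until credentials exist, and a liveness probe on /healthz would only trigger a restart if the process stopped answering altogether.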

Current behavior:

Currently, running tbot --diag-addr=0.0.0.0:3000 only sets up the /metrics and /pprof endpoints. It does not provide /healthz or /readyz endpoints, which are necessary for effectively managing the health and readiness of the containerized tbot process within a Kubernetes cluster.

Bug details:

See: https://github.com/gravitational/teleport/blob/v14.1.5/lib/tbot/tbot.go#L168-L188

related: https://github.com/gravitational/teleport/issues/42436 https://github.com/gravitational/teleport/issues/29048

strideynet commented 11 months ago

We'll implement this as two endpoints:

At this time, I'd rather avoid using the concept of "healthiness" since this doesn't seem to actually tie to a tangible state of tbot. This may change in future if we have the concept of outputs which run separately.