gravitational / teleport


Allow diagnostic port in teleport-kube-agent chart to be changed #43740

Open webvictim opened 3 months ago

webvictim commented 3 months ago

What would you like Teleport to do?

The teleport-kube-agent Helm chart currently hardcodes Teleport's --diag-addr to be 0.0.0.0:3000:

https://github.com/gravitational/teleport/blob/72bdd781d05d79f4fce55ee0ab49b3cae4a9379d/examples/chart/teleport-kube-agent/templates/deployment.yaml#L155-L156

This cannot be overridden by setting teleportConfig.teleport.diag_addr to a different host/port, as command-line arguments always take precedence over those defined in the config file.
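For illustration, this is roughly what the ineffective override looks like in a values file. It is a minimal sketch: the `teleportConfig.teleport.diag_addr` key comes from the description above, while port 9100 is a hypothetical value chosen for the example.

```yaml
# values.yaml (sketch; 9100 is a hypothetical port, not a chart default)
teleportConfig:
  teleport:
    # This lands in the generated Teleport config file, but has no effect:
    # the chart's deployment template also passes --diag-addr=0.0.0.0:3000
    # on the command line, and command-line arguments take precedence.
    diag_addr: "0.0.0.0:9100"
```

With these values the process would still be expected to listen on 0.0.0.0:3000, which is the behaviour this issue asks to make configurable.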

A customer has a limitation on the range of ports that can be scraped by their external metric collector, so would like us to expose a way to change the Teleport process's diagnostic port to a different value.

What problem does this solve?

Allows monitoring of the health/readiness/metrics of the Teleport process providing connectivity to other services.

If a workaround exists, please include it.

Create a Kubernetes Service outside the chart which accepts external traffic on one of the whitelisted ports and redirects it internally to port 3000. This has drawbacks: the Service would have to be of type LoadBalancer or NodePort to be externally accessible, and it cannot reliably address individual replicas when there is more than one.
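A rough sketch of such a Service is shown here. The Service name, namespace, selector label and external port 9100 are all assumptions made for the example; the selector must be adjusted to match the labels actually set on the agent pods.

```yaml
# Sketch of the workaround; every name and port below except 3000 is hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: teleport-agent-diag        # hypothetical name
  namespace: teleport              # assumption: namespace the agent runs in
spec:
  type: LoadBalancer               # or NodePort; needed for external access
  selector:
    app: teleport-agent            # assumption: must match the agent pod labels
  ports:
    - name: diag
      protocol: TCP
      port: 9100                   # hypothetical port the collector is allowed to scrape
      targetPort: 3000             # diagnostic port hardcoded by the chart
```

Because a Service load-balances across all matching pods, each scrape reaches an arbitrary replica, which is why this approach cannot reliably collect metrics from every instance.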

hugoShaka commented 3 months ago

> A customer has a limitation on the range of ports that can be scraped by their external metric collector

Agents are typically deployed with 2 replicas to avoid losing access to the cluster because of a failed rollout or a broken node. The diagnostic listener only serves diagnostics for the current Teleport instance. To get the agents' metrics and validate their health, you need to scrape every Teleport instance.

The metric collector must discover and individually dial every Teleport agent instance. In most cases, this means the collector is inside the Kubernetes cluster, running as a pod, and can dial any port that the network policy does not explicitly block.

From what I understand, making the port configurable would:

Unless we have more details about how they are scraping from inside the cluster while still needing to change the diagnostic port to accommodate a firewall, I would suggest not implementing this.