grafana / beyla

eBPF-based autoinstrumentation of web applications and network metrics
https://grafana.com/oss/beyla-ebpf/
Apache License 2.0

Limit impact on k8s apiserver in large clusters #824

Open · dashpole opened this issue 6 months ago

dashpole commented 6 months ago

What I would like to be able to do

I mentioned this briefly at the community meeting earlier today.

As a general best practice, DaemonSets should avoid watching resources cluster-wide, such as watching all pods, all replicasets, all services, etc. Because every node runs a copy of the DaemonSet, cluster-wide watches make kube-apiserver load grow with the number of nodes, which can limit the maximum possible number of nodes in a cluster. It is acceptable to watch the pods assigned to the same node as the DaemonSet pod. That actually generates less load on the kube-apiserver than a deployment with multiple replicas watching all pods, since traffic for the deployment is roughly O(pods * replicas). Ideally, I would like to be able to run Beyla with the following architecture:

To do that, it would be nice to have more control over which k8s resources Beyla watches. This would typically be done with field selectors, similar to the selectors config in the Prometheus server's kubernetes_sd_configs.
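For illustration, here is a minimal client-go sketch of what node-scoped watching could look like. This is not Beyla's current code; the NODE_NAME environment variable and the overall structure are assumptions. The field selector means the apiserver only sends pods scheduled on the local node.

```go
// Minimal sketch (not Beyla's actual implementation) of watching only the
// pods scheduled on the local node, assuming the node name is injected via
// the NODE_NAME environment variable (e.g. through the Downward API).
package main

import (
	"os"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Restrict the watch to this node: the apiserver only sends pods whose
	// spec.nodeName matches, so the DaemonSet's load stays proportional to
	// the pods on its own node rather than to the whole cluster.
	node := os.Getenv("NODE_NAME")
	factory := informers.NewSharedInformerFactoryWithOptions(
		client, 30*time.Minute,
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.FieldSelector = "spec.nodeName=" + node
		}),
	)

	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod := obj.(*corev1.Pod)
			_ = pod // decorate telemetry with this pod's metadata here
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	<-stop
}
```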

Alternatives considered

The above will work well for single-application metrics, like HTTP golden-signal metrics for a pod, since all of the relevant metadata is about pods running on the same node. However, the approach won't work if I want to build a service graph, because I would also need metadata for pods running on other nodes, which defeats the purpose of the improvement. I had considered doing all of the IP -> Pod mapping in a separate deployment to enable that use case.
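As a rough sketch of that idea (hypothetical, not something Beyla implements; the package, type, and function names are made up), a single deployment could hold the one cluster-wide pod watch and index pods by status.podIP, so node agents could resolve peer IPs without each holding their own cluster-wide watch:

```go
// Hypothetical central IP -> Pod index, kept by a single Deployment that
// owns the cluster-wide pod watch.
package ipindex

import (
	"sync"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// PodIndex maps pod IPs to pod metadata.
type PodIndex struct {
	mu   sync.RWMutex
	byIP map[string]*corev1.Pod
}

// NewPodIndex starts a cluster-wide pod informer and keeps the IP index
// up to date until stop is closed.
func NewPodIndex(client kubernetes.Interface, stop <-chan struct{}) *PodIndex {
	idx := &PodIndex{byIP: map[string]*corev1.Pod{}}
	factory := informers.NewSharedInformerFactory(client, 0)
	inf := factory.Core().V1().Pods().Informer()
	inf.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { idx.set(obj.(*corev1.Pod)) },
		UpdateFunc: func(_, obj interface{}) { idx.set(obj.(*corev1.Pod)) },
		DeleteFunc: func(obj interface{}) {
			if pod, ok := obj.(*corev1.Pod); ok {
				idx.remove(pod)
			}
		},
	})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	return idx
}

func (i *PodIndex) set(pod *corev1.Pod) {
	i.mu.Lock()
	defer i.mu.Unlock()
	if pod.Status.PodIP != "" {
		i.byIP[pod.Status.PodIP] = pod
	}
}

func (i *PodIndex) remove(pod *corev1.Pod) {
	i.mu.Lock()
	defer i.mu.Unlock()
	delete(i.byIP, pod.Status.PodIP)
}

// Lookup returns the pod that owns ip, if any.
func (i *PodIndex) Lookup(ip string) (*corev1.Pod, bool) {
	i.mu.RLock()
	defer i.mu.RUnlock()
	pod, ok := i.byIP[ip]
	return pod, ok
}
```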

The issue I ran into is filtering. At least on GKE, there is a bunch of traffic to things I don't really care about (e.g. kubelet health checks). I would like to be able to filter out things that aren't a pod, and only collect telemetry for pods, but I couldn't figure out how to do that (and couldn't think of a good way to implement it, either).
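The kind of filtering I have in mind would look roughly like this (again hypothetical; the Flow type and function names are made up): drop any flow whose endpoints don't resolve to pod IPs via an index like the one sketched above, which would exclude things such as kubelet probes arriving from the node IP.

```go
package filter

// Flow is a stand-in for a captured network flow record.
type Flow struct {
	SrcIP, DstIP string
}

// KeepPodTraffic keeps only flows where both endpoints resolve to pod IPs.
// isPodIP would be backed by an IP -> Pod index such as the one above.
func KeepPodTraffic(flows []Flow, isPodIP func(ip string) bool) []Flow {
	kept := make([]Flow, 0, len(flows))
	for _, f := range flows {
		// Kubelet health checks and other node-level traffic originate from
		// the node IP rather than a pod IP, so they fail this check.
		if isPodIP(f.SrcIP) && isPodIP(f.DstIP) {
			kept = append(kept, f)
		}
	}
	return kept
}
```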

dimunech commented 5 months ago

To expand on this: the Kubernetes metadata decorator adds considerable load to the Kubernetes API servers. Here's a graph of master-node memory usage before and after disabling the decorator (the yellow annotation on the graph marks the change).

[Screenshot, 2024-05-31: master-node memory usage before/after disabling the decorator]

marctc commented 2 months ago

I believe this issue was fixed by #997. Please reopen this issue if that's not the case, thanks!

dashpole commented 2 months ago

Awesome, thank you!