grafana / agent

Vendor-neutral programmable observability pipelines.
https://grafana.com/docs/agent/
Apache License 2.0

Add a component that uses the local kubelet API instead of the Kubernetes API #4233

Closed mattdurham closed 1 year ago

mattdurham commented 1 year ago

Background

Hitting the API server constantly can put significant strain on the Kubernetes API, even though the local kubelet instance already has much of the data needed.

Proposal

I propose we add a component that mirrors the Kubernetes API and allows grabbing that data from the local kubelet. There are some concerns over the stability of the kubelet API, but we can likely limit or put guardrails around what it works with and what it doesn't.
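
As a rough illustration (not part of the proposal itself), reading pod data from the node-local kubelet rather than the API server could look something like the Go sketch below. The port, endpoint path, and token location are assumed kubelet defaults, not an agreed design:

```go
// Hypothetical sketch: list pods from the node-local kubelet instead of the
// Kubernetes API server. Assumes the default authenticated kubelet port
// (10250) and the in-cluster service account token; not part of any agent
// component yet.
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	token, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/token")
	if err != nil {
		panic(err)
	}

	client := &http.Client{
		// The kubelet's serving certificate is often not signed by a CA the
		// pod trusts, so this sketch skips verification. A real component
		// would verify against the cluster CA instead.
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}

	req, err := http.NewRequest("GET", "https://localhost:10250/pods", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+string(token))

	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // a v1.PodList encoded as JSON
}
```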

rfratto commented 1 year ago

I'm not against this, but I'm not sure if it's possible, so it needs investigation.

Last I checked, kubelets have an undocumented API, which implies (to me) that there are no guarantees that endpoints will keep working between releases, or even keep returning the same data.

mattdurham commented 1 year ago

It is undocumented but looks stable-ish, though there are real concerns there. https://github.com/cyberark/kubeletctl seems to have wrappers around the API and can likely be reviewed to see how unstable it actually is.

rfratto commented 1 year ago

> I propose we add a component that mirrors the Kubernetes API and allows grabbing that data from the local kubelet.

What data do we want to grab from Kubelets? I think the only thing they'll have available on them is a list of running pods.

mattdurham commented 1 year ago

Pods are the most interesting data and the best first case, though https://github.com/cyberark/kubeletctl/blob/master/API_TABLE.md lists other things that we might find value in.

rfratto commented 1 year ago

Thanks, that table is helpful.

I did some investigation and found the following:

Things I still don't know:

Setting aside the open questions above, it looks like this component is theoretically possible.

mikemykhaylov commented 1 year ago

datadog-agent implements similar functionality for querying the kubelet API in pkg/util/kubernetes/kubelet/kubelet_client.go. Perhaps something can be learned from this use case.

tpaschalis commented 1 year ago

Not directly related to this issue, but another potentially interesting way to avoid incurring extra load on the K8s API is based on clustering and the component scheduling idea outlined in #3151.

A subset of the cluster nodes would schedule and run a 'distributor' component that is responsible for a) running Prometheus service discovery by reaching out to the K8s API, and b) partitioning the discovered information per node. The rest of the peers could look up which node runs that 'distributor' component and retrieve the information about resources running on the same node as them.
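
A minimal Go sketch of the partitioning half of that idea, assuming targets carry the __meta_kubernetes_pod_node_name label that Prometheus Kubernetes SD attaches (the names here are illustrative, not an agreed design):

```go
// Hypothetical sketch of per-node partitioning for the 'distributor' idea:
// group targets discovered from the API server by the node they run on, so
// each cluster peer only retrieves its own node's slice.
package main

import "fmt"

// Target mimics a discovered scrape target as a flat label set, e.g. with
// the __meta_kubernetes_pod_node_name label set by Prometheus Kubernetes SD.
type Target map[string]string

func partitionByNode(targets []Target) map[string][]Target {
	byNode := make(map[string][]Target)
	for _, t := range targets {
		node := t["__meta_kubernetes_pod_node_name"]
		byNode[node] = append(byNode[node], t)
	}
	return byNode
}

func main() {
	targets := []Target{
		{"__address__": "10.0.0.5:8080", "__meta_kubernetes_pod_node_name": "node-a"},
		{"__address__": "10.0.0.9:8080", "__meta_kubernetes_pod_node_name": "node-b"},
	}
	for node, ts := range partitionByNode(targets) {
		fmt.Println(node, "->", ts)
	}
}
```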

gcampbell12 commented 1 year ago

Hey, this is something we'd find very useful! In our environments we would prefer a model where the grafana-agent DaemonSet scrapes pods running on the local host; using the existing Kubernetes discovery method with selectors on the DaemonSet would put significant load on the API server. To get the ball rolling on this and spark discussion, I opened a draft implementation of a new discovery.kubelet feature, largely inspired by the Pod discoverer in Prometheus, in https://github.com/grafana/agent/pull/4255.
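
For context, a high-level Go sketch of what such a discoverer does: turning a kubelet pod list into per-container scrape targets. This is an illustration of the approach, not the code in #4255, and the label names are only the common Prometheus SD ones:

```go
// Illustrative sketch (not the #4255 implementation): build scrape targets
// from a kubelet pod list, roughly mirroring what the Prometheus Pod
// discoverer does against the API server.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// Target is a flat label set, like Prometheus service discovery produces.
type Target map[string]string

func podTargets(pods *corev1.PodList) []Target {
	var targets []Target
	for _, pod := range pods.Items {
		for _, c := range pod.Spec.Containers {
			for _, port := range c.Ports {
				targets = append(targets, Target{
					"__address__":                          fmt.Sprintf("%s:%d", pod.Status.PodIP, port.ContainerPort),
					"__meta_kubernetes_namespace":          pod.Namespace,
					"__meta_kubernetes_pod_name":           pod.Name,
					"__meta_kubernetes_pod_container_name": c.Name,
				})
			}
		}
	}
	return targets
}

func main() {
	// In the real component the PodList would come from the local kubelet's
	// /pods endpoint rather than from the API server.
	fmt.Println(podTargets(&corev1.PodList{}))
}
```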

rfratto commented 1 year ago

Since #4255 is merged, I'm going to close this as completed :)