winmillwill opened this issue 7 months ago
I'm going to explore this.
The reason we've used a small number of discovery.___ components and rely on the discovery.relabel components to do the filtering is the Note attached to using the selectors field here:
https://grafana.com/docs/agent/latest/flow/reference/components/discovery.kubernetes/#selectors-block
In short, it's a balance between increased memory and CPU for the Agent pods and the load on the API server.
That being said, perhaps there's an argument to be made to make something like this tunable, though.
Can you tell me which agent pods are using too much memory? The Daemonset should be Grafana Agent Logs, which should already pre-filter to just the pods on the same node.
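For context, the pattern being described looks roughly like this in River. This is a sketch, not the chart's exact config; the component names and the annotation-based keep rule are illustrative:

```river
// One broad discovery shared by all scrape jobs. Filtering happens
// agent-side in discovery.relabel components rather than with API-server
// selectors, trading agent memory/CPU for lower API server load.
discovery.kubernetes "pods" {
  role = "pod"
}

// Keep only pods annotated for scraping; the annotation is illustrative.
discovery.relabel "annotated_pods" {
  targets = discovery.kubernetes.pods.targets

  rule {
    source_labels = ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"]
    regex         = "true"
    action        = "keep"
  }
}
```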
Hi, thanks for taking a look.
I'm using a daemonset to collect metrics because otherwise there's no sane way (afaik) to partition the discovery so that every pod doesn't try to fetch all the k8s metadata. To be clear, the problematic cluster has more than 1k nodes with many pods and endpoints on each one, which would require an absurd number of pods with a high memory request on each, all just to duplicate the discovery data by roughly n^2 - n for n nodes, and still have fairly brittle and brutal capacity issues.
I plan to still use a deployment to handle things that benefit from clustering and will have targets unevenly distributed across nodes: ksm and pod monitors, for example. That just requires discovering services and prom operator objects and then divvying up the resulting targets with clustering, which is far less than every single pod, endpoints, and node.
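A sketch of that clustered approach in River, for concreteness. The `prometheus.remote_write.default` component and other names here are assumptions, not actual config from this thread:

```river
discovery.kubernetes "endpoints" {
  role = "endpoints"
}

prometheus.scrape "metrics" {
  targets    = discovery.kubernetes.endpoints.targets
  forward_to = [prometheus.remote_write.default.receiver]

  // With agent clustering enabled, discovered targets are distributed
  // across the clustered instances, so each replica scrapes a subset
  // rather than every instance scraping everything.
  clustering {
    enabled = true
  }
}
```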
On Mon, Feb 12, 2024, 5:19 PM Pete Wall wrote:

> Can you tell me which agent pods are using too much memory? The Daemonset should be Grafana Agent Logs, which should already pre-filter to just the pods on the same node.
So, you've likely noticed that this chart deploys 3 copies of the Grafana Agent:
- `<release>-grafana-agent`, a statefulset by default, scrapes metrics and acts as the receiver for traces. It can be scaled up as you need, and it is clustered, which means metric scraping work will be distributed among the instances.
- `<release>-grafana-agent-logs`, a daemonset by default, gathers logs from pods running on the same node using volume mounts. Some clusters don't work well with volume mounts or with daemonsets; in those cases, you can turn it into a deployment and tell it to gather logs via the K8s API, which also enables clustering mode, so log gathering and processing will be distributed among the instances.
- `<release>-grafana-agent-events`, a deployment by default, listens for cluster events and sends them as logs to Loki. This is also clustered.

So, you can see that where possible, we always use clustering mode.
Rather than using a daemonset for the metrics agent, I would maybe look into enabling the HorizontalPodAutoscaler https://github.com/grafana/agent/blob/main/operations/helm/charts/grafana-agent/values.yaml#L197-L207
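As a hedged example of what that could look like in this chart's values (the exact keys and nesting may differ by chart version, so check the linked values.yaml; the replica numbers are placeholders):

```yaml
# Hypothetical values.yaml override enabling autoscaling for the metrics agent.
grafana-agent:
  controller:
    type: statefulset
    autoscaling:
      enabled: true
      minReplicas: 2
      maxReplicas: 10
      targetCPUUtilizationPercentage: 80
```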
That being said, I'll still be investigating if we can optimize the discovery portion, I just didn't want you to be blocked waiting for a PR.
Actually, I'd love to learn more about your cluster and the memory usage of the agent. Have you created a fork with the changes you talked about? I'd like to see it if you have it handy.
Sorry for the internal slack link, but we did some experimentation:
https://raintank-corp.slack.com/archives/CSN5HV0CQ/p1707957609350309?thread_ts=1707949636.306849&cid=CSN5HV0CQ
Turns out having a separate discovery.kubernetes component for each metric source actually is better for network and CPU. With a generic discovery.kubernetes "pods" component, any time a pod is updated, all discovery.relabel components have to recalculate. With specific ones, only the relabel components associated with the affected pods are updated.
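To illustrate the difference, a sketch of the two shapes (the label selector value is just an example, not the chart's actual selector):

```river
// Generic: every pod update in the cluster fans out to all downstream
// discovery.relabel components, forcing them all to recalculate.
discovery.kubernetes "pods" {
  role = "pod"
}

// Specific: the API server only sends updates for matching pods, so only
// the components watching this source see churn.
discovery.kubernetes "kube_state_metrics" {
  role = "pod"

  selectors {
    role  = "pod"
    label = "app.kubernetes.io/name=kube-state-metrics"
  }
}
```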
We previously used a daemonset-based metrics collection process. We were actually excited to move back to a clustered, centralized metrics collection process, as it allowed us to avoid the cost of another DaemonSet and the aggregate memory/API server load that brings. In addition, many of our nodes are very expensive, so it's much better to pay for cheap/commodity machines to run the Grafana Agent instead of running it on GPU nodes.
In clusters of non-trivial size the memory requirements are far too high to have the agent naively pull the metadata for every single node, pod, and service. Especially when the controller is a daemonset, we would ideally have a very easy knob for "just the node I'm on and just the pods on that node, please". If there's interest in supporting this, I can try to maintain my own fork of this chart to meet my needs and contribute patches.
If this use case is too narrow to support in this helm chart, is there another place to track the Grafana Cloud K8s Integration requirements and other best practices for deploying/configuring the Grafana Agent in K8s?