grafana / k8s-monitoring-helm


allow setting discovery.kubernetes selectors block #372

Open winmillwill opened 7 months ago

winmillwill commented 7 months ago

In clusters of non-trivial size, the memory requirements are far too high when the agent naively pulls the metadata for every single node, pod, and service. Especially when the controller is a daemonset, we would ideally have a very easy knob for "just the node I'm on and just the pods on that node, please".

If there's interest in supporting this I can try to maintain my own fork of this chart to meet my needs and contribute patches.

If this use case is too narrow to support in this helm chart, is there another place to track the Grafana Cloud K8s Integration requirements and whatever other best practices for deploying/configuring the grafana agent in k8s?
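
For illustration, the kind of per-node scoping I'm after maps to the documented selectors block; a minimal sketch, assuming a NODE_NAME variable injected via the downward API and a placeholder remote_write endpoint:

```river
discovery.kubernetes "local_pods" {
  role = "pod"

  // Ask the API server for only the pods scheduled on this node,
  // instead of pulling metadata for every pod in the cluster.
  selectors {
    role  = "pod"
    field = "spec.nodeName=" + env("NODE_NAME")
  }
}

prometheus.scrape "local_pods" {
  targets    = discovery.kubernetes.local_pods.targets
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write "default" {
  endpoint {
    url = "https://prometheus.example.com/api/prom/push"
  }
}
```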

petewall commented 7 months ago

I'm going to explore this. The reason we've used a small number of discovery.___ components and relied on the discovery.relabel components to do the filtering is the Note attached to using the selectors field here: https://grafana.com/docs/agent/latest/flow/reference/components/discovery.kubernetes/#selectors-block

In short, it's a balance between increased memory and CPU for the Agent pods and the load on the API server.

That being said, perhaps there's an argument to be made for making something like this tunable.
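
To make the tradeoff concrete, the current pattern is roughly one broad discovery.kubernetes feeding discovery.relabel filters; a simplified sketch, not the chart's literal generated config, and the keep rule is just an example:

```river
// Broad discovery: the agent watches every pod in the cluster...
discovery.kubernetes "pods" {
  role = "pod"
}

// ...and filtering happens inside the agent, which keeps API-server load
// low but costs the agent memory and CPU for the full metadata set.
discovery.relabel "annotated_pods" {
  targets = discovery.kubernetes.pods.targets

  rule {
    source_labels = ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"]
    regex         = "true"
    action        = "keep"
  }
}
```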

petewall commented 7 months ago

Can you tell me which agent pods are using too much memory? The Daemonset should be Grafana Agent Logs, which should already pre-filter to just the pods on the same node.

winmillwill commented 7 months ago

Hi, thanks for taking a look.

I'm using a daemonset to collect metrics because otherwise there's no sane way (afaik) to partition the discovery so that every pod doesn't try to fetch all the k8s metadata. To be clear, the problematic cluster has more than 1k nodes with many pods and endpoints on each one, which would require an absurd number of agent pods with a high memory request on each, all just to duplicate the discovery data roughly n^2 - n times for n nodes, and still leave fairly brittle and brutal capacity issues.

I plan to still use a deployment to handle things that benefit from clustering and will have targets unevenly distributed across nodes: ksm and pod monitors, for example. That just requires discovering services and prom operator objects and then divvying up the resulting targets with clustering, which is far less than every single pod, endpoints, and node.
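
Roughly, that clustered deployment would be built from components like the following, with clustering spreading the discovered targets across replicas (a sketch only; the remote_write endpoint is a placeholder):

```river
// PodMonitor-style discovery for the clustered deployment.
// With clustering enabled, each replica owns a share of the targets.
prometheus.operator.podmonitors "pods" {
  forward_to = [prometheus.remote_write.default.receiver]

  clustering {
    enabled = true
  }
}

prometheus.remote_write "default" {
  endpoint {
    url = "https://prometheus.example.com/api/prom/push"
  }
}
```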

petewall commented 7 months ago

So, you've likely noticed that this chart deploys 3 copies of the Grafana Agent:

  1. <release>-grafana-agent, a statefulset by default, scrapes metrics and acts as the receiver for traces. This can be scaled up as you need. It is clustered, which means metric scraping work will be distributed among the instances.

  2. <release>-grafana-agent-logs, a daemonset by default, gathers logs from pods running on the same node. It does this with volume mounts. Some clusters don't work well with volume mounts or with daemonsets. In those cases, you can turn it into a deployment, tell it to gather logs via the K8s API, and that will also enable clustering mode, so log gathering and processing will be distributed among the instances.

  3. <release>-grafana-agent-events, a deployment by default, listens for cluster events and sends them as logs to Loki. This is also clustered.

So, you can see that where possible, we always use clustering mode.
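
As a rough illustration, clustering on the metrics side comes from the clustering block on the scrape components, so each instance takes a share of the targets (a simplified sketch; the names and endpoint are placeholders, not the chart's exact generated config):

```river
discovery.kubernetes "services" {
  role = "service"
}

// With clustering enabled, each agent instance scrapes only the subset of
// targets assigned to it, so scaling up the StatefulSet spreads the load.
prometheus.scrape "services" {
  targets    = discovery.kubernetes.services.targets
  forward_to = [prometheus.remote_write.default.receiver]

  clustering {
    enabled = true
  }
}

prometheus.remote_write "default" {
  endpoint {
    url = "https://prometheus.example.com/api/prom/push"
  }
}
```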

petewall commented 7 months ago

Rather than using a daemonset for the metrics agent, I would maybe look into enabling the HorizontalPodAutoscaler https://github.com/grafana/agent/blob/main/operations/helm/charts/grafana-agent/values.yaml#L197-L207

petewall commented 7 months ago

That being said, I'll still be investigating if we can optimize the discovery portion, I just didn't want you to be blocked waiting for a PR.

petewall commented 7 months ago

Actually, I'd love to learn more about your cluster and the memory usage of the agent. Have you created a fork with the changes that you talked about? I'd like to see that if you have it handy.

petewall commented 5 months ago

Sorry for the internal Slack link, but we did some experimentation: https://raintank-corp.slack.com/archives/CSN5HV0CQ/p1707957609350309?thread_ts=1707949636.306849&cid=CSN5HV0CQ

It turns out that having a separate discovery.kubernetes component for each metric source actually is better for network and CPU. When using one generic discovery.kubernetes "pods" component, any time a pod is updated, all discovery.relabel components have to recalculate. With source-specific ones, only the relabel components associated with the affected pods get updated.
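
Concretely, that means a dedicated, narrowly-scoped discovery.kubernetes per metric source instead of one shared "pods" discovery; a rough sketch (the label selectors below are illustrative, not the chart's exact values):

```river
// Dedicated discovery for kube-state-metrics: a pod update elsewhere in the
// cluster no longer fans out to this source's relabel/scrape chain.
discovery.kubernetes "kube_state_metrics" {
  role = "pod"

  selectors {
    role  = "pod"
    label = "app.kubernetes.io/name=kube-state-metrics"
  }
}

// Separate, equally narrow discovery for node-exporter pods.
discovery.kubernetes "node_exporter" {
  role = "pod"

  selectors {
    role  = "pod"
    label = "app.kubernetes.io/name=node-exporter"
  }
}
```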

sidewinder12s commented 2 days ago

We previously used a daemonset-based metrics collection process. We were actually excited to be moving back to a clustered, centralized metrics collection process, as it allowed us to avoid the cost of another DaemonSet and the aggregate memory/API-server load that brings. In addition, many of our nodes are very expensive, so it's much better to pay for cheap/commodity machines to run Grafana instead of GPU nodes.