grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

Add `otelcol.processor.groupbyattrs` component #224

Open ryanartecona opened 5 months ago

ryanartecona commented 5 months ago

Request

Add the OTel `groupbyattrsprocessor` module.

Use case

I have an agent scraping prometheus metrics from a kubernetes kubelet, whose metrics I also pass through an otelcol.processor.k8sattributes step (and an otelcol.processor.transform to turn otel resource attributes into datapoint attributes, as mentioned in the otelcol.processor.k8sattributes docs). What I want is basically what's stated in that doc section: to enrich the kubelet prometheus metrics (which carry labels like pod="myapp-59fb8bb44d-6qfgf",namespace="default") with metadata that becomes extra labels (like k8s_deployment_name="myapp"). The pipeline looks like this:

prometheus.scrape 
  -> otelcol.receiver.prometheus 
    -> otelcol.processor.k8sattributes 
      -> otelcol.processor.transform 
        -> otelcol.exporter.prometheus
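
For concreteness, a rough sketch of how that pipeline is wired up in Alloy config. Component labels, the remote_write endpoint, and the kubelet scrape settings (auth, scheme, metrics_path) are placeholders here, not my real config:

```
// Rough wiring sketch only; labels and endpoints are placeholders.
discovery.kubernetes "nodes" {
  role = "node"
}

prometheus.scrape "kubelet" {
  targets    = discovery.kubernetes.nodes.targets
  forward_to = [otelcol.receiver.prometheus.kubelet.receiver]
}

otelcol.receiver.prometheus "kubelet" {
  output {
    metrics = [otelcol.processor.k8sattributes.kubelet.input]
  }
}

otelcol.processor.k8sattributes "kubelet" {
  output {
    metrics = [otelcol.processor.transform.kubelet.input]
  }
}

otelcol.processor.transform "kubelet" {
  // statements that copy otel resource attributes onto datapoints go here
  output {
    metrics = [otelcol.exporter.prometheus.kubelet.input]
  }
}

otelcol.exporter.prometheus "kubelet" {
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write "default" {
  endpoint {
    url = "https://prometheus.example/api/v1/write" // placeholder
  }
}
```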

However, with that pipeline, I get attributes from the k8sattributes processor that don't match up with the labels reported by kubelet. For example, I'll get a datapoint with namespace="default",k8s_namespace_name="kube-system" (where namespace is from kubelet and k8s_namespace_name is from k8sattributes). Also, most datapoints seem "sticky" to particular label values, i.e. many pods across many namespaces will all get a k8s_namespace_name="kube-system" label.

What I believe is happening is that in the otel components, all the datapoints are associated with a single Resource for the single prometheus.scrape job. So, the k8sattributes step will talk to the Kubernetes API and update the otel Resource, but because the resource is shared, there are conflicts and eventually the last (or the first) write wins and all the other datapoints get those labels.

If that's what's happening, adding the groupbyattrsprocessor module should at least let me fix this pipeline so that the k8sattributes enrichment works as expected.
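
For reference, the upstream groupbyattrs processor regroups telemetry under new Resources keyed on a list of attributes, moving those attributes from the record level up to the resource level. A hypothetical Alloy wrapper (this component doesn't exist yet, which is this request, and I'm assuming it would mirror the upstream `keys` option) could then sit in front of the k8sattributes step, e.g.:

```
// Hypothetical sketch: otelcol.processor.groupbyattrs is the component being
// requested, assumed to mirror the upstream processor's `keys` option.
// Datapoints sharing the same values for these attributes would be regrouped
// under their own Resource (with the keys promoted to resource attributes),
// so k8sattributes could associate each group with the right pod.
otelcol.processor.groupbyattrs "kubelet" {
  keys = ["pod", "namespace", "container"]

  output {
    metrics = [otelcol.processor.k8sattributes.kubelet_metrics_enriched.input]
  }
}
```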

ptodev commented 5 months ago

Hi, thank you for raising an issue. I think the use case makes sense, but I'm curious how you have configured the association rules in k8sattributes? I suppose there is a resource attribute with the IP address of the scrape target?

The only workaround I can think of is to have a different prometheus.scrape + otelcol.receiver.prometheus pair for each scrape job, which will hopefully place each scrape job in its own resource. Then the pod association rule would be based on the address of the scrape target, which would be a resource attribute added by otelcol.processor.transform.
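
Something roughly like this (everything below is illustrative; the target addresses and the downstream transform component are placeholders):

```
// Illustrative only: one prometheus.scrape + otelcol.receiver.prometheus
// pair per scrape job, so each job's samples hopefully end up in their own
// Resource, which a transform can then tag with the scrape target's address.
prometheus.scrape "job_a" {
  targets    = [{"__address__" = "job-a.example:8080"}] // placeholder target
  forward_to = [otelcol.receiver.prometheus.job_a.receiver]
}

otelcol.receiver.prometheus "job_a" {
  output {
    metrics = [otelcol.processor.transform.add_target_address.input]
  }
}

prometheus.scrape "job_b" {
  targets    = [{"__address__" = "job-b.example:8080"}] // placeholder target
  forward_to = [otelcol.receiver.prometheus.job_b.receiver]
}

otelcol.receiver.prometheus "job_b" {
  output {
    metrics = [otelcol.processor.transform.add_target_address.input]
  }
}
```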

A completely different way of doing this (and the way most users of the Agent probably do it today) is to use discovery.kubernetes instead of otelcol.processor.k8sattributes, but then of course the labels won't follow OTel semantic conventions.
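
A rough sketch of that more traditional approach, attaching Kubernetes metadata as labels at scrape time via relabel rules (the rules and label names here are only examples, and a prometheus.remote_write "default" component is assumed to exist):

```
// Illustrative sketch: enrich targets with Kubernetes metadata at discovery
// time instead of using otelcol.processor.k8sattributes. The resulting label
// names are whatever the relabel rules pick, not OTel semantic conventions.
discovery.kubernetes "pods" {
  role = "pod"
}

discovery.relabel "pods" {
  targets = discovery.kubernetes.pods.targets

  rule {
    source_labels = ["__meta_kubernetes_namespace"]
    target_label  = "namespace"
  }

  rule {
    source_labels = ["__meta_kubernetes_pod_name"]
    target_label  = "pod"
  }
}

prometheus.scrape "pods" {
  targets    = discovery.relabel.pods.output
  forward_to = [prometheus.remote_write.default.receiver]
}
```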

ryanartecona commented 5 months ago

> but I'm curious how you have configured the association rules in k8sattributes? I suppose there is a resource attribute with the IP address of the scrape target?

Indeed. I tried a bunch of combinations of association rules, but the ones that got closest to working (as in, adding any new attributes at all) were the rules using resource_attribute sources (not connection). Details below.

k8sattrs.river

```
otelcol.processor.transform "kubelet_metrics_preprocess" {
  error_mode = "ignore"

  metric_statements {
    context = "datapoint"
    statements = [
      "set(resource.attributes[\"k8s.pod.name\"], attributes[\"pod\"])",
      "set(resource.attributes[\"k8s.pod.uid\"], attributes[\"pod_uid\"])",
      "set(resource.attributes[\"k8s.namespace.name\"], attributes[\"namespace\"])",
      "set(resource.attributes[\"k8s.container.name\"], attributes[\"container\"])",
      "delete_key(resource.attributes, \"net.host.name\")",
      "delete_key(resource.attributes, \"net.host.port\")",
    ]
  }

  output {
    metrics = [otelcol.processor.k8sattributes.kubelet_metrics_enriched.input]
  }
}

otelcol.processor.k8sattributes "kubelet_metrics_enriched" {
  pod_association {
    source {
      from = "resource_attribute"
      name = "k8s.pod.name"
    }
    source {
      from = "resource_attribute"
      name = "k8s.namespace.name"
    }
  }

  pod_association {
    source {
      from = "resource_attribute"
      name = "k8s.pod.uid"
    }
  }

  output {
    metrics = [...]
  }
}
```

> The only workaround I can think of is to have a different prometheus.scrape + otelcol.receiver.prometheus pair for each scrape job, which will hopefully place each scrape job in its own resource. Then the pod association rule would be based on the address of the scrape target, which would be a resource attribute added by otelcol.processor.transform.

I thought the same, but it didn't seem to work. There's a single scrape job per kubelet (1 kubelet = 1 k8s node; that's what a discovery.kubernetes nodes role points its targets at), but each of those kubelets returns cadvisor metrics with labels like pod and container for everything running on that node. So I can't split it into more granular scrape jobs; it's already 1 scrape per target, but the target (the kubelet's /metrics/cadvisor endpoint) contains metrics for what I want to become multiple otel Resources.

I also took a look around to see if there's any other project that has a setup with cadvisor metrics running through otel k8sattributes, but I didn't find any. Even the grafana k8s-monitoring-helm chart doesn't do that, and I think it would run into this exact problem if it did.

kehindesalaam commented 1 month ago

Hello @ptodev , could you please assign the issue to me?