grafana / beyla

eBPF-based autoinstrumentation of web applications and network metrics
https://grafana.com/oss/beyla-ebpf/
Apache License 2.0

Can't disable Kubernetes Decorators #579

Closed: misterek closed this issue 4 months ago

misterek commented 4 months ago

Right now I'm using the prometheus metrics endpoint instead of otel.

Using the config:

attributes:
  kubernetes:
    enable: true

Each of the metrics is tacking on labels like "k8s_node_name", "k8s_pod_start_time", "k8s_pod_uid", "target_instance", etc. This leads to a lot of metrics: if I were monitoring a website with, say, 500 pods, that ends up being 500 pods * the number of routes worth of new series each time there is a deploy.

I'd like to disable the k8s decorators, as mentioned here: https://grafana.com/docs/beyla/latest/configure/options/#kubernetes-decorator

However, if I change to:

attributes:
  kubernetes:
    enable: false

I get no metrics whatsoever at the /metrics endpoint.

Is there something I'm doing wrong, or is there a bug in here somewhere?

mariomac commented 4 months ago

@misterek good catch. Can you show me the contents of your YAML config file? Specifically, the contents of the "discovery" section.

misterek commented 4 months ago

A couple of things are redacted. Both the instance_id section and the bucket changes were attempts to reduce the number of metrics being produced.

  beyla-config.yml: |
    attributes:
      kubernetes:
        enable: true
    instance_id:
      dns: false
      override_hostname: "beyla_hostname"
      override_instance_id: "beyla_instance"
    routes:
      unmatched: heuristic
      ignored_patterns:
        - /healthz
    prometheus_export:
      buckets:
        request_size_histogram: [0, 128, 512, 2048, 8192]
        duration_histogram: [0, 0.005, 0.01, 0.025, 0.1, 0.5, 1, 5, 10]
    discovery:
      services:
        - k8s_deployment_name: "^application.*$"

I now realize that you are probably going to tell me that disabling the decorators also means the discovery section no longer has the Kubernetes metadata it needs to discover the services :)

mariomac commented 4 months ago

@misterek yes, you guessed right! It shouldn't be that way, but we didn't think about the use case of wanting to aggregate all the service pods in a single Beyla instance.

I really appreciate your feedback; I have created the following feature request: https://github.com/grafana/beyla/issues/580

In the meantime, I think the only solution would be to keep the Kubernetes decorators enabled and configure your Prometheus/OpenTelemetry collector or your cloud endpoint to aggregate the metrics by application. For example, in Grafana Cloud: https://grafana.com/docs/grafana-cloud/cost-management-and-billing/reduce-costs/metrics-costs/control-metrics-usage-via-adaptive-metrics/define-metrics-aggregation-rules/
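As a rough sketch of what that aggregation could look like on the Prometheus side (the metric name below is an assumption, please check which names your Beyla version actually exports), a recording rule can drop the per-pod labels mentioned earlier in this thread:

groups:
  - name: beyla_per_application
    rules:
      # Aggregate away the per-pod labels so one series remains per application/route.
      # The metric name http_server_request_duration_seconds_count is assumed here.
      - record: app:http_server_request_duration_seconds_count:rate5m
        expr: >
          sum without (k8s_node_name, k8s_pod_uid, k8s_pod_start_time, target_instance)
          (rate(http_server_request_duration_seconds_count[5m]))

Note that Beyla still produces the per-pod series; a rule like this only gives you a cheaper aggregate to query, so actually dropping the raw series would still need relabeling, Adaptive Metrics, or a similar mechanism.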

mariomac commented 4 months ago

Another workaround, with your current configuration, is to change:

        - k8s_deployment_name: "^application.*$"

to

        - exe_path: <a regular expression matching the executable name of your application service>

Beyla will then ignore any executable whose path does not belong to your application.
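For example, keeping the rest of the configuration you shared intact, the discovery section could look like this (the regular expression below is just a placeholder, not your real executable path):

    discovery:
      services:
        # Hypothetical regex; replace it with one matching your application's executable path.
        - exe_path: ".*my-application$"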

misterek commented 4 months ago

Thank you! For my test, I think exe_path will work well. I'll play around with it.

My little sample app generated several thousand series overnight, so I'm glad I spent some time playing around with it.

I can 100% see the labels I'm getting rid of being useful (i.e. is any pod behaving differently from the others), but the explosion of metrics was a bit much.

Thanks again!

misterek commented 4 months ago

Alright, I gave that a test, and here's my feedback:

This worked as expected; however, it also set the "service_name" to 'ruby', which, again, seems to make sense. But if I had several things running here, that clearly wouldn't be specific enough.

I can't find a way to get rid of that one yet. I'm not sure what I'd do with that information, but it does create another set of metrics for each node. In an autoscaling configuration, I think this would also increase the number of series quite a bit.

I"m going to go ahead and close this issue, since it's very likely that I'm being too picky, but this is just my feedback. I think my biggest concern here is that it seems quite easy to generate a lot of metrics accidentally. In an app that had dozens of pods, plus dozens of endpoints, I think you'd get a lot of very expensive series. There's ways to handle this outside beyla, but I'd be very interested in seeing these options as out of the box configurations.

Thanks again for all the help!

mariomac commented 4 months ago

Thank you for your feedback! It's very valuable and we will seriously consider it for our roadmap.