Closed — misterek closed this issue 4 months ago
@misterek good catch. Can you show me the contents of your YAML config file? Concretely, the contents of the "discovery" section.
A couple of things are redacted. Both the instance_id section and the bucket changes were attempts to reduce the number of metrics being produced.
```yaml
beyla-config.yml: |
  attributes:
    kubernetes:
      enable: true
    instance_id:
      dns: false
      override_hostname: "beyla_hostname"
      override_instance_id: "beyla_instance"
  routes:
    unmatched: heuristic
    ignored_patterns:
      - /healthz
  prometheus_export:
    buckets:
      request_size_histogram: [0, 128, 512, 2048, 8192]
      duration_histogram: [0, 0.005, 0.01, 0.025, 0.1, 0.5, 1, 5, 10]
  discovery:
    services:
      - k8s_deployment_name: "^application.*$"
```
I now realize that you are probably going to tell me that disabling the decorators also means that service discovery no longer has the metadata it needs to find the services :)
@misterek yes, you realized correctly! It shouldn't work that way, but we didn't consider the use case of wanting to aggregate all the service pods in a single Beyla instance.
I really appreciate your feedback; I have created the following feature request: https://github.com/grafana/beyla/issues/580
In the meantime, I think the only solution is to keep Kubernetes metadata enabled and configure your Prometheus/OpenTelemetry collector or your cloud endpoint to aggregate the metrics by application. For example, in Grafana Cloud: https://grafana.com/docs/grafana-cloud/cost-management-and-billing/reduce-costs/metrics-costs/control-metrics-usage-via-adaptive-metrics/define-metrics-aggregation-rules/
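As an illustration of the "aggregate outside Beyla" idea (this is plain Prometheus, not the Adaptive Metrics syntax from the link, and the job name and target are hypothetical), a scrape job can drop the per-pod labels at ingestion with `metric_relabel_configs`; the label names are the ones mentioned later in this thread:

```yaml
scrape_configs:
  - job_name: beyla                # hypothetical job name
    static_configs:
      - targets: ["beyla:8999"]    # hypothetical Beyla Prometheus endpoint
    metric_relabel_configs:
      # Remove the high-cardinality per-pod labels before ingestion.
      - action: labeldrop
        regex: k8s_pod_name|k8s_pod_uid|k8s_pod_start_time|k8s_node_name
```

One caveat: `labeldrop` removes labels but does not merge the underlying samples, so if two pods' series become identical after the drop, Prometheus will reject the duplicates. True aggregation needs recording rules, a collector pipeline, or the Adaptive Metrics approach linked above.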
Another, secondary workaround is, with your current configuration, to replace:
```yaml
- k8s_deployment_name: "^application.*$"
```
with
```yaml
- exe_path: <a regular expression matching the executable name of your application service>
```
Then ignore all the paths that don't belong to your application.
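Putting those two suggestions together, the relevant sections might look like the sketch below. The `ruby$` regex is a placeholder (the service in this thread happens to run under a `ruby` executable); substitute whatever matches your own binary:

```yaml
discovery:
  services:
    - exe_path: "ruby$"     # hypothetical: match by executable instead of deployment name
routes:
  unmatched: heuristic
  ignored_patterns:
    - /healthz              # plus any other paths not belonging to your application
```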
Thank you! For my test, I think exe_path will work well. I'll play around with it.
My small sample app generated several thousand series overnight, so I'm glad I spent some time playing around with this.
I can 100% see the labels I'm getting rid of being useful (e.g. is any pod behaving differently than the others), but the explosion of metrics was a bit much.
Thanks again!
Alright, gave that a test, and here's my feedback:
This worked as expected; however, it also set the service_name to 'ruby', which, again, makes sense. But if I had several things running here, that clearly wouldn't be specific enough.
I can't find a way to get rid of that one yet. I'm not sure what I'd do with that information, but it does create another set of metrics for each node. In an autoscaling configuration, I think this would also increase the number of series quite a bit.
I'm going to go ahead and close this issue, since it's very likely that I'm being too picky, but this is just my feedback. My biggest concern is that it seems quite easy to generate a lot of metrics accidentally. In an app with dozens of pods, plus dozens of endpoints, I think you'd get a lot of very expensive series. There are ways to handle this outside Beyla, but I'd be very interested in seeing these options as out-of-the-box configurations.
Thanks again for all the help!
Thank you for your feedback! It's very valuable and we will seriously consider it for our roadmap.
Right now I'm using the prometheus metrics endpoint instead of otel.
Using the config:
Each of the metrics is tacking on labels like "k8s_node_name", "k8s_pod_start_time", "k8s_pod_uid", "target_instance", etc. This leads to a lot of series. If I were monitoring a website with, say, 500 pods, that ends up being 500 pods × the number of routes in new series each time there is a deploy.
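To make the cardinality concern concrete, here is a back-of-the-envelope count for a single Prometheus histogram metric, using assumed numbers (500 pods, a hypothetical 20 routes, and the 9-boundary duration histogram from the config shown earlier in the thread):

```python
# Rough series count for one Prometheus histogram across a fleet of pods.
# All numbers are illustrative assumptions, not measured values.
pods = 500
routes = 20                           # hypothetical number of distinct routes
bucket_boundaries = 9                 # duration_histogram: [0, 0.005, ..., 10]
le_series = bucket_boundaries + 1     # one *_bucket series per boundary, plus +Inf
series_per_histogram = le_series + 2  # plus the *_sum and *_count series

total = pods * routes * series_per_histogram
print(total)  # 120000 series for a single histogram metric
```

And every deploy replaces the pod-identifying labels, so each rollout mints a fresh batch of that size.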
I'd like to disable the k8s decorators, as mentioned here: https://grafana.com/docs/beyla/latest/configure/options/#kubernetes-decorator
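For reference, per the linked docs the decorator is toggled via `attributes.kubernetes.enable`; disabling it would presumably look like this (a sketch, mirroring the config shown earlier in the thread):

```yaml
attributes:
  kubernetes:
    enable: false   # drops the k8s_* labels from exported metrics
```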
However, if I change that setting, I get no metrics whatsoever at the /metrics endpoint.
Is there something I'm doing wrong, or is there a bug in here somewhere?