Open mimaison opened 10 months ago
Currently the allowlist
configuration is just applied on the metric names and ignores labels. I think it's important to allow filtering on labels too.
The main use cases where this is useful are:
JOIN_GROUP
and HEARTBEAT
requests and not on the whole ~70 requests types.The allowlist
configuration accepts a list of regex patterns that are matched against the metrics in the Prometheus format. In the Prometheus format, labels are appended after the metric name between curly brackets, and separated by commas. For example:
kafka_network_requestmetrics_localtimems{request="CreateDelegationToken",quantile="0.999",}
This brings some challenges if we want to also filter labels:
With these issues, I see 2 potential implementations that solve all issues:
Treat curly bracket in a special way and exclude them from the regex logic. For example, if an entry in the allowlist is some_metric_name{mylabel="my-value"}
, the reporter will first extract and parse the text in between curly brackets. It will then only apply some_metric_name
as a regex to filter metrics, and finally only keep metrics that have the labels provided.
{}
for regex matching.Change the way users provide their metrics filter. One option is to do it via a configuration file. For example in YAML:
- some_metric_name
labels:
- mylabel: "my-value"
{}
. YAML allows pretty much any characters if a key is surrounded by quotes.@k-wall @OwenCorrigan76 @scholzj @tombentley
@mimaison I think [2] would be enough to handle also other label-matching operators. Therefore, [2] would be the way to go. But few things to take into consideration if we are extracting and then parsing the text inside the {}
:
=
http_requests_total{replica!="rep-a",replica=~"rep.*"}
. In the previous example, replica should not match rep-a, and should regex match "rep.*"(anything starting with rep."The goal is not to re-implement PromQL. We just need a basic filtering mechanism that supports labels too. I think we should really keep it simple as I see the extreme configurability of jmx_exporter one of its drawbacks. It allows writing filters that are hard to understand and ultimately expensive to compute making it pretty slow if you enable too many metrics.
Do we really need this feature in the reporter? Do we really need it right now? Prometheus already has the ability to drop metrics at ingress time, so if the volume of metrics coming from the reporter was truly bothersome for a user, they'd already have that knob to twiddle. (Aside: doesn't this go beyond the scope of the original Strimzi Proposal?)
I'd say let's wait on this one (YAGNI). If we get user feedback saying this is important, then we can think then with a fresh proposal.
The goal is not to re-implement PromQL. We just need a basic filtering mechanism that supports labels too. I think we should really keep it simple as I see the extreme configurability of jmx_exporter one of its drawbacks. It allows writing filters that are hard to understand and ultimately expensive to compute making it pretty slow if you enable too many metrics.
I agree with you.
@mimaison what's the update of this? Are we following @k-wall's suggestion? Otherwise it would be interesting to pickup this issue
@antonio-pedro99 Thanks for volunteering, however the first step is to decide what to do. If you have opinions on the various points being discussed, please share them.
Following the update of the Prometheus client https://github.com/strimzi/metrics-reporter/commit/4c1fd9e85774f64a4df406062d124269e7093e50, a trailing commas is no longer injected after the last label.
I think it would be worth checking whether we could change the regex separator (maybe to ;
) and see how a filter with labels (so escaping the various conflicting characters) would look like.
I spent some time investigating our various options. The "best" (least worst?) solution I found was to allow configuring the separator for the allowlist and have logic to mimic the Prometheus metric name serialization to compute the expected final name with labels.
Here is a PoC: https://github.com/strimzi/metrics-reporter/compare/main...mimaison:metrics-reporter:label_filter?expand=1
This allows users to include labels in their allowlist. For example using the following allowlist:
prometheus.metrics.reporter.allowlist=kafka_server_kafka_network_requestmetrics_totaltimems\\{request="(Produce|Fetch)".*
I only get these metrics:
# TYPE kafka_server_kafka_network_requestmetrics_totaltimems summary
kafka_server_kafka_network_requestmetrics_totaltimems{request="Fetch",quantile="0.5"} 0.0
kafka_server_kafka_network_requestmetrics_totaltimems{request="Fetch",quantile="0.75"} 0.0
kafka_server_kafka_network_requestmetrics_totaltimems{request="Fetch",quantile="0.95"} 0.0
kafka_server_kafka_network_requestmetrics_totaltimems{request="Fetch",quantile="0.98"} 0.0
kafka_server_kafka_network_requestmetrics_totaltimems{request="Fetch",quantile="0.99"} 0.0
kafka_server_kafka_network_requestmetrics_totaltimems{request="Fetch",quantile="0.999"} 0.0
kafka_server_kafka_network_requestmetrics_totaltimems_count{request="Fetch"} 0
kafka_server_kafka_network_requestmetrics_totaltimems_sum{request="Fetch"} 0.0
kafka_server_kafka_network_requestmetrics_totaltimems{request="Produce",quantile="0.5"} 0.0
kafka_server_kafka_network_requestmetrics_totaltimems{request="Produce",quantile="0.75"} 0.0
kafka_server_kafka_network_requestmetrics_totaltimems{request="Produce",quantile="0.95"} 0.0
kafka_server_kafka_network_requestmetrics_totaltimems{request="Produce",quantile="0.98"} 0.0
kafka_server_kafka_network_requestmetrics_totaltimems{request="Produce",quantile="0.99"} 0.0
kafka_server_kafka_network_requestmetrics_totaltimems{request="Produce",quantile="0.999"} 0.0
kafka_server_kafka_network_requestmetrics_totaltimems_count{request="Produce"} 0
kafka_server_kafka_network_requestmetrics_totaltimems_sum{request="Produce"} 0.0
In the end, it's ok but not a great. The main issues are:
kafka_server_kafka_network_requestmetrics_totaltimems_count{request="Produce"}
does not technically match the allowlist)Also while doing this work I identified https://github.com/strimzi/metrics-reporter/issues/30
prometheus.metrics.reporter.allowlist=kafka_server_kafka_network_requestmetrics_totaltimems\{request="(Produce|Fetch)".*
is label order in the serialised form of the metric something that's guaranteed?
Labels are sorted by name. This is mentioned in the javadoc: https://prometheus.github.io/client_java/api/io/prometheus/metrics/model/snapshots/Labels.html
I assume the exposition formats preserve that.
At the moment, the allowlist is applied on the metric names without the labels. We should allow filtering metrics with specific labels.