tomklapka opened this issue 2 years ago
Hi,
Hmm our readme says:
...tags are created for each label.
I went looking and it does look like our Prometheus parser will read labels. If I dump a similar metric into a file:
kafka_controller_controllerchannelmanager_queuesize_value{broker_id="1", container="jmx-exporter", endpoint="http-metrics", instance="172.30.2.222:5556", job="kafka-jmx-metrics", namespace="default", pod="kafka-1", service="kafka-jmx-metrics"} 3
And then read the file using the Prometheus data format:
[agent]
omit_hostname = true
[[outputs.file]]
[[inputs.file]]
files = ["data.json"]
data_format = "prometheus"
I get a metric that includes all the labels:
prometheus,broker_id=1,container=jmx-exporter,endpoint=http-metrics,instance=172.30.2.222:5556,job=kafka-jmx-metrics,namespace=default,pod=kafka-1,service=kafka-jmx-metrics kafka_controller_controllerchannelmanager_queuesize_value=3 1658436936000000000
I then hosted that file and used the Prometheus input plugin to read it:
[agent]
omit_hostname = true
[[outputs.file]]
[[inputs.prometheus]]
urls = ["http://localhost:8000/metrics.out"]
metric_version = 2
And got a similar metric with those same tags plus the url tag:
prometheus,broker_id=1,container=jmx-exporter,endpoint=http-metrics,instance=172.30.2.222:5556,job=kafka-jmx-metrics,namespace=default,pod=kafka-1,service=kafka-jmx-metrics,url=http://localhost:8000/metrics.out kafka_controller_controllerchannelmanager_queuesize_value=3 1658437291000000000
The difference is you are using kubernetes_services, which uses slightly different logic to get the URLs to scrape. All URLs eventually end up in the same slice used to collect at each Gather interval, and the logic there is the same for each.
There is some difference with how tags are handled for pods, but that is not used in your config.
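For context, the pod-scraping mode I am referring to looks roughly like this (just a sketch, not taken from your config), and it is the path that attaches pod metadata such as the pod name and namespace as tags:
[[inputs.prometheus]]
## assumed: scrape annotated pods directly instead of going through the
## service; this mode adds pod and namespace tags to each metric
monitor_kubernetes_pods = true
metric_version = 2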
Are you using any processors? Does your full config have any taginclude or tagexclude options?
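Something like the following, for example, would silently strip the label tags after parsing, which is the kind of thing I want to rule out (illustrative only, not something I am suggesting is in your config):
[[inputs.prometheus]]
urls = ["http://localhost:8000/metrics.out"]
metric_version = 2
## keeping only these two tags would drop every label coming from the metric
taginclude = ["url", "host"]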
edit: I also tried a few examples from your attached kafka-metrics-exporter-list.txt, like kafka_exporter_build_info and promhttp_metric_handler_requests_total=4853, and those reported correctly with the labels.
Hi Joshua, I looked into it more deeply and discovered that Prometheus uses service discovery meta labels for internal (re)labeling purposes. This explains the additional labels in Prometheus: they are not exposed on the Kafka exporter's metrics endpoint, so they are never consumed by the Telegraf plugin. It would be nice to have such a mechanism in the Prometheus plugin, because without it it can be impossible to distinguish between different metric sources (e.g. Kafka replicas) when scraping metrics from a single endpoint (e.g. a service).
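To make that concrete, the exporter endpoint itself only exposes the sample without those Kubernetes labels, roughly:
kafka_controller_controllerchannelmanager_queuesize_value 3
while the same sample stored in Prometheus carries pod, namespace, service and similar labels that were attached during relabeling from the __meta_kubernetes_* service discovery labels, not by the exporter.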
This explains the additional labels in Prometheus: they are not exposed on the Kafka exporter's metrics endpoint, so they are never consumed by the Telegraf plugin.
I am not sure I follow this statement, so please confirm whether I have this right:
It sounds like the source of your metrics is a Kafka exporter producing Prometheus metrics for consumption. You are using the Telegraf Prometheus input with service discovery, so one URL in your Telegraf config can find the URLs of multiple Kafka exporters. These Kafka exporters do not have any labels in their metrics (I think that is what the screenshot shows?). When Telegraf grabs these metrics from multiple Kafka exporters, there is no clear way to determine which metric belongs to which Kafka exporter, because the only tag is the url the service was originally discovered from?
After writing this, it feels like a better fit would be to use the kube_inventory input plugin to scrape the K8s meta metrics.
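Roughly something like this (a sketch only; the API URL and namespace here are assumptions for an in-cluster deployment, so adjust for your EKS setup):
[[inputs.kube_inventory]]
## assumed in-cluster API endpoint and target namespace
url = "https://kubernetes.default.svc"
namespace = "default"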
The source of the metrics is Kafka installed via the Bitnami Helm chart. It uses services to expose metrics endpoints: one redirects traffic from/to the Kafka exporter pod, the other to the JMX exporter running as a sidecar container. The Kafka exporter pod can have multiple broker/replica endpoints configured. When Telegraf grabs metrics from the exporter service, only three Telegraf/plugin-specific tags are assigned: url, host, address. The challenge is to tell which broker/replica each measurement came from. Prometheus itself can do it.
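The only workaround I can think of on the Telegraf side is a separate input block per broker with a static tag, which gives up discovery entirely (the pod addresses below are hypothetical, not from my real config):
[[inputs.prometheus]]
## hypothetical per-broker endpoints, one input block per broker
urls = ["http://kafka-0.kafka-headless.default.svc:5556/metrics"]
metric_version = 2
[inputs.prometheus.tags]
broker_id = "0"

[[inputs.prometheus]]
urls = ["http://kafka-1.kafka-headless.default.svc:5556/metrics"]
metric_version = 2
[inputs.prometheus.tags]
broker_id = "1"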
Relevant telegraf.conf
Logs from Telegraf
System info
Telegraf 1.23.2, AWS EKS 1.20, Kafka installed via Bitnami Helm chart
Docker
No response
Steps to reproduce
Metric example I got from Prometheus server via ServiceMonitor:
Prometheus metric example I got from the kafka metric exporter service endpoint:
InfluxDB line protocol example I got from Telegraf:
Expected behavior
Additional tags like broker_id="1", container="jmx-exporter", endpoint="http-metrics", instance="172.30.2.222:5556", job="kafka-jmx-metrics", namespace="default", pod="kafka-1", service="kafka-jmx-metrics".
Actual behavior
I'm missing additional meta tags like broker_id="1", container="jmx-exporter", endpoint="http-metrics", instance="172.30.2.222:5556", job="kafka-jmx-metrics", namespace="default", pod="kafka-1", service="kafka-jmx-metrics". With the current output I'm not able to distinguish between different Kafka brokers, and all broker metrics are mixed together.
Is it possible to add Prometheus meta labels as tags in the prometheus plugin? Maybe it can be done with some configuration option which I've missed.
Additional info
kafka-metrics-exporter-list.txt