Open ezombie opened 4 years ago
Can you add your prometheus_client output plugin configuration?
cat /etc/telegraf/telegraf.d/prometheus.conf
# Configuration for the Prometheus client to spawn
[[outputs.prometheus_client]]
## Address to listen on
listen = "0.0.0.0:9273"
expiration_interval = "10s"
string_as_label = false
# metric_version = 2
Looking into this a bit closer, the issue appears to be that label names starting with 0-9 are illegal in the Prometheus format and are rejected by the official library. In Telegraf 1.13 we updated the library, and it has become stricter about rejecting these.
If you switch to metric_version = 2, it should output the metrics that don't have any labels starting with a number, but it will still drop those that do.
I think the best way forward is to adjust the consul input to avoid these types of tags. What if you disable the tag_delimiter option in the consul input?
With version 1.14.2 and the configuration file shown by cat consul.conf, the problem persists.
A possible solution would be to introduce an additional option into the consul module that will rename the metrics to a template that will be correct for the prometheus library.
[[inputs.consul]]
interval = "10s"
datacentre = "dc"
address = "consul:8500"
Having UUIDs as the tag key is not an ideal setup for any output, so I think we can come up with a better strategy for creating metrics. Can you show the output of telegraf --input-filter consul --test | grep a-B using the configuration without tag_delimiter?
2020-04-30T18:18:01Z I! Starting Telegraf 1.14.2
> consul_health_checks,2947540c-a0bb-4549-b76d-0b6188036b8a=2947540c-a0bb-4549-b76d-0b6188036b8a,check_id=service:2947540c-a0bb-4549-b76d-0b6188036b8a,host=XXX,n=n,node=XXX,service_name=YYY check_name="Service 'YYY' check",critical=0i,passing=1i,service_id="2947540c-a0bb-4549-b76d-0b6188036b8a",status="passing",warning=0i 1588270681000000000
> consul_health_checks,89427f45-2034-4c08-a12a-bb17baf0fb8d=89427f45-2034-4c08-a12a-bb17baf0fb8d,check_id=service:89427f45-2034-4c08-a12a-bb17baf0fb8d,host=XXX,n=n,node=XXX,service_name=YYY check_name="Service 'YYY' check",critical=0i,passing=1i,service_id="89427f45-2034-4c08-a12a-bb17baf0fb8d",status="passing",warning=0i 1588270681000000000
> consul_health_checks,9ea180bd-9bc1-4739-b3b0-7c9d479124b6=9ea180bd-9bc1-4739-b3b0-7c9d479124b6,check_id=service:9ea180bd-9bc1-4739-b3b0-7c9d479124b6,host=XXX,n=n,node=XXX,service_name=YYY check_name="Service 'YYY' check",critical=0i,passing=1i,service_id="9ea180bd-9bc1-4739-b3b0-7c9d479124b6",status="passing",warning=0i 1588270681000000000
> consul_health_checks,af96bbf7-eb1f-4282-8acb-dd3890e40d20=af96bbf7-eb1f-4282-8acb-dd3890e40d20,check_id=service:af96bbf7-eb1f-4282-8acb-dd3890e40d20,host=XXX,n=n,node=XXX,service_name=YYY check_name="Service 'YYY' check",critical=0i,passing=1i,service_id="af96bbf7-eb1f-4282-8acb-dd3890e40d20",status="passing",warning=0i 1588270681000000000
> consul_health_checks,bf0d639f-b667-43ba-8d55-11c444229b80=bf0d639f-b667-43ba-8d55-11c444229b80,check_id=service:bf0d639f-b667-43ba-8d55-11c444229b80,host=XXX,n=n,node=XXX,service_name=YYY check_name="Service 'YYY' check",critical=0i,passing=1i,service_id="bf0d639f-b667-43ba-8d55-11c444229b80",status="passing",warning=0i 1588270681000000000
> consul_health_checks,check_id=service:d1359ddb-462d-4efe-969f-4bd0032b0d31,d1359ddb-462d-4efe-969f-4bd0032b0d31=d1359ddb-462d-4efe-969f-4bd0032b0d31,host=XXX,n=n,node=XXX,service_name=YYY check_name="Service 'YYY' check",critical=0i,passing=1i,service_id="d1359ddb-462d-4efe-969f-4bd0032b0d31",status="passing",warning=0i 1588270681000000000
> consul_health_checks,check_id=service:e630c30b-2b28-49e5-895c-dcc1d3ac971e,e630c30b-2b28-49e5-895c-dcc1d3ac971e=e630c30b-2b28-49e5-895c-dcc1d3ac971e,host=XXX,n=n,node=XXX,service_name=YYY check_name="Service 'YYY' check",critical=0i,passing=1i,service_id="e630c30b-2b28-49e5-895c-dcc1d3ac971e",status="passing",warning=0i 1588270681000000000
> consul_health_checks,check_id=service:f32802d2-9c2b-4b7e-b3c3-067f3efc4dc6,f32802d2-9c2b-4b7e-b3c3-067f3efc4dc6=f32802d2-9c2b-4b7e-b3c3-067f3efc4dc6,host=XXX,n=n,node=XXX,service_name=YYY check_name="Service 'YYY' check",critical=0i,passing=1i,service_id="f32802d2-9c2b-4b7e-b3c3-067f3efc4dc6",status="passing",warning=0i 1588270681000000000
> consul_health_checks,check_id=service:f7e94e9c-2054-45c4-a362-46b78fafedd1,f7e94e9c-2054-45c4-a362-46b78fafedd1=f7e94e9c-2054-45c4-a362-46b78fafedd1,host=XXX,n=n,node=XXX,service_name=YYY check_name="Service 'YYY' check",critical=0i,passing=1i,service_id="f7e94e9c-2054-45c4-a362-46b78fafedd1",status="passing",warning=0i 1588270681000000000
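To spot the offending tag keys in output like the above, here is a minimal Python sketch that pulls the tag keys out of one line-protocol line and lists those starting with a digit. It is a simplified parser (it assumes no escaped commas, spaces, or equals signs in the tag set), the helper names are hypothetical, and the sample line is a shortened copy of the first line of output.

```python
def tag_keys(line: str) -> list[str]:
    """Return the tag keys of one line-protocol line (simplified: no escape handling)."""
    line = line.lstrip("> ").strip()
    # The measurement name and tag set end at the first space.
    head = line.split(" ", 1)[0]
    # The first comma-separated item is the measurement; the rest are key=value tags.
    pairs = head.split(",")[1:]
    return [pair.split("=", 1)[0] for pair in pairs]


def digit_leading(keys: list[str]) -> list[str]:
    """Return the tag keys whose first character is 0-9."""
    return [k for k in keys if "0" <= k[:1] <= "9"]


# Shortened copy of the first output line (some fields dropped for brevity).
sample = (
    "> consul_health_checks,"
    "2947540c-a0bb-4549-b76d-0b6188036b8a=2947540c-a0bb-4549-b76d-0b6188036b8a,"
    "check_id=service:2947540c-a0bb-4549-b76d-0b6188036b8a,"
    "host=XXX,n=n,node=XXX,service_name=YYY "
    "critical=0i,passing=1i 1588270681000000000"
)

keys = tag_keys(sample)
bad = digit_leading(keys)
print(bad)
```

Only the UUID tag key is reported here; check_id, host, n, node and service_name all start with a letter.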
Can you run this query against the Consul HTTP API to get the raw JSON for one of the check_ids that produces a UUID tag key:
curl -G http://consul:8500/v1/health/state/any --data-urlencode 'filter=CheckID == "service:2947540c-a0bb-4549-b76d-0b6188036b8a"'
Quick follow-up, what I'm expecting to see is that you have ServiceTags like:
"ServiceTags": [
"2947540c-a0bb-4549-b76d-0b6188036b8a"
],
I'm far from a Consul expert, so to me tags like this seem a bit odd. Can you tell me a bit about how you use this type of tag?
curl -G http://consul:8500/v1/health/state/any --data-urlencode 'filter=CheckID == "service:2947540c-a0bb-4549-b76d-0b6188036b8a"'
[{"Node":"XXX","CheckID":"service:2947540c-a0bb-4549-b76d-0b6188036b8a","Name":"Service 'YYY' check","Status":"passing","Notes":"","Output":"HTTP GET http://127.0.0.1:41615/health/?service=2947540c-a0bb-4549-b76d-0b6188036b8a: 200 OK Output: ","ServiceID":"2947540c-a0bb-4549-b76d-0b6188036b8a","ServiceName":"YYY","ServiceTags":["n","2947540c-a0bb-4549-b76d-0b6188036b8a"],"Type":"http","Definition":{},"CreateIndex":452885815,"ModifyIndex":452885837}]
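The JSON above lines up with the earlier --test output: each entry in ServiceTags shows up as a tag whose key and value are both the tag string (n=n, and the UUID as its own key). The Python sketch below models that mapping. It is not Telegraf's code; the function name is hypothetical, and the delimiter branch is only an assumption about how tag_delimiter splits "key:value" style tags.

```python
import json


def service_tags_to_tags(service_tags, tag_delimiter=""):
    """Model how ServiceTags appear as metric tags (assumed behaviour, not Telegraf's code)."""
    tags = {}
    for tag in service_tags:
        if tag_delimiter and tag_delimiter in tag:
            # Assumption: with a delimiter set, "key:value" becomes key=value.
            key, value = tag.split(tag_delimiter, 1)
        else:
            # Matches the observed output: the tag is used as both key and value.
            key, value = tag, tag
        tags[key] = value
    return tags


# Trimmed copy of the health check returned by the curl call above.
check = json.loads(
    '{"ServiceName": "YYY",'
    ' "ServiceTags": ["n", "2947540c-a0bb-4549-b76d-0b6188036b8a"]}'
)
tags = service_tags_to_tags(check["ServiceTags"])
print(tags)
```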
I think what will be best in your case is to exclude these tags. The information is contained in the check_id tag so adding the UUID is superfluous:
[[inputs.consul]]
tagexclude = ["[0-9]*"]
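Before putting a glob into tagexclude it can help to check which tag keys it would actually match. The Python sketch below uses the standard library's fnmatch as an approximation; Telegraf uses its own glob implementation, so treat this as a sanity check for simple patterns only. Note that a leading ! inside the brackets negates the character class, so "[0-9]*" and "[!0-9]*" select opposite sets of keys.

```python
from fnmatch import fnmatchcase

# Tag keys taken from the --test output earlier in the thread.
tag_keys = [
    "2947540c-a0bb-4549-b76d-0b6188036b8a",
    "af96bbf7-eb1f-4282-8acb-dd3890e40d20",
    "check_id",
    "host",
    "n",
]


def excluded(keys, patterns):
    """Return the keys matched by at least one glob pattern."""
    return [k for k in keys if any(fnmatchcase(k, p) for p in patterns)]


digit_first = excluded(tag_keys, ["[0-9]*"])       # keys starting with a digit
not_digit_first = excluded(tag_keys, ["[!0-9]*"])  # keys NOT starting with a digit
print(digit_first)
print(not_digit_first)
```

UUIDs that happen to start with a letter (a-f) are not matched by "[0-9]*", so they would stay as tags.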
As a more general fix, perhaps we should add a new option that matches only ServiceTags, similar to how the docker plugin is structured:
[[inputs.consul]]
service_tag_include = []
service_tag_exclude = ["[0-9]*"]
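As a rough illustration of how the proposed options could behave, here is a Python sketch of include/exclude filtering over ServiceTags. The option names come from the proposal above, but the semantics are assumptions: an empty include list is taken to mean "include everything", exclude wins over include, and fnmatch stands in for Telegraf's glob matcher.

```python
from fnmatch import fnmatchcase


def filter_service_tags(service_tags, include=(), exclude=()):
    """Keep tags that pass the include globs and match none of the exclude globs."""
    kept = []
    for tag in service_tags:
        # Assumption: an empty include list means every tag is a candidate.
        if include and not any(fnmatchcase(tag, p) for p in include):
            continue
        # Assumption: exclude takes precedence over include.
        if any(fnmatchcase(tag, p) for p in exclude):
            continue
        kept.append(tag)
    return kept


service_tags = ["n", "2947540c-a0bb-4549-b76d-0b6188036b8a"]
kept = filter_service_tags(service_tags, exclude=["[0-9]*"])
print(kept)
```

With exclude set to ["*"] the same function returns an empty list, which drops every service tag.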
Hello!
Also have this problem.
My setup: I have Consul and I put a unique UUID in the tags/meta for each service.
Consul allows this, with limits: a key can contain only ASCII characters and no special characters (A-Z a-z 0-9 _ and -).
https://www.consul.io/docs/agent/services.html
But Prometheus does not accept label names that start with a digit: Label names may contain ASCII letters, numbers, as well as underscores. They must match the regex [a-zA-Z_][a-zA-Z0-9_]*
https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels
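The mismatch between the two naming rules can be checked directly. The Python sketch below encodes the Prometheus label regex quoted above and a regex for the Consul key character set described above (A-Z a-z 0-9 _ and -); the Consul pattern is my reading of that sentence rather than something taken from Consul's code, and the function names are hypothetical.

```python
import re

# Regex quoted from the Prometheus data model documentation.
PROM_LABEL = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
# Assumption: Consul keys are one or more of A-Z a-z 0-9 _ and -.
CONSUL_KEY = re.compile(r"[A-Za-z0-9_-]+")


def valid_prometheus_label(name: str) -> bool:
    return PROM_LABEL.fullmatch(name) is not None


def valid_consul_key(name: str) -> bool:
    return CONSUL_KEY.fullmatch(name) is not None


names = [
    "service_name",
    "2947540c-a0bb-4549-b76d-0b6188036b8a",  # starts with a digit, contains '-'
    "af96bbf7-eb1f-4282-8acb-dd3890e40d20",  # starts with a letter, contains '-'
]
results = {n: (valid_consul_key(n), valid_prometheus_label(n)) for n in names}
for name, (consul_ok, prom_ok) in results.items():
    print(f"{name}: consul={consul_ok} prometheus={prom_ok}")
```

Both UUIDs are acceptable to Consul under this rule but fail the Prometheus regex as written: the first because of the leading digit and the hyphens, the second because of the hyphens alone.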
Maybe there should be an option that controls whether tags are exported as labels at all? Or a regex for including only some of these tags, not all of them.
At the moment I see that a valid Consul meta configuration can make whole metrics (!!) disappear, not just labels.
The same topic has come up in several places:
My issue with tags: https://github.com/influxdata/telegraf/issues/5522
PR where tags as labels were introduced: https://github.com/influxdata/telegraf/pull/4155
@ekbfh What do you think about adding the service_tag_include and service_tag_exclude options above?
@danielnelson It might work if you plan to enable them by default, because, as I said, I may have names in Consul that are valid there but not in Prometheus.
Could you also add an option to choose which tags I want to gather?
For example gather_all_tags = true/false, because without it I get higher cardinality.
You would be able to exclude all service tags with service_tag_exclude = ["*"].
We should also make sure that the prometheus output just removes tags that it cannot encode as labels, without dropping the whole metric.
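A minimal Python sketch of that idea: drop only the labels whose names fail the Prometheus label regex and still emit the sample. This is not Telegraf's serializer, the function names are hypothetical, and label values are not escaped here.

```python
import re

# Prometheus label name rule: [a-zA-Z_][a-zA-Z0-9_]*
LABEL_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def encodable_labels(tags: dict) -> dict:
    """Keep only the tags whose keys are legal Prometheus label names."""
    return {k: v for k, v in tags.items() if LABEL_RE.fullmatch(k)}


def render(name: str, tags: dict, value) -> str:
    """Render one sample in text exposition style (values are not escaped in this sketch)."""
    labels = ",".join(
        f'{k}="{v}"' for k, v in sorted(encodable_labels(tags).items())
    )
    return f"{name}{{{labels}}} {value}"


tags = {
    "2947540c-a0bb-4549-b76d-0b6188036b8a": "2947540c-a0bb-4549-b76d-0b6188036b8a",
    "host": "XXX",
    "service_name": "YYY",
}
line = render("consul_health_checks_passing", tags, 1)
print(line)
```

The UUID label is dropped, while the sample itself survives with its host and service_name labels.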
Yes, just removing tags is a good idea
Any update?
After upgrading from 1.12.6 to 1.13+ (1.14.1 is affected too) I see this picture
Grafana query:
sum by (service_name)(consul_health_checks_passing{service_name!=""})
telegraf --config telegraf.conf --config ./telegraf.d/basic_inputs.conf --config ./telegraf.d/consul.conf --test | grep a-B
curl http://127.0.0.1:9273/metrics | grep -i a-B
System info:
Telegraf 1.12.6 and 1.13.0, Consul v1.6.2, Prometheus 2.15.2, CentOS 7.8
Steps to reproduce:
upgrade 1.12.6 to 1.13+
Expected behavior:
Actual behavior:
Additional info: