prometheus / consul_exporter

Exporter for Consul metrics
Apache License 2.0

Output field from Consul #123

Open: pvyaka01 opened this issue 5 years ago

pvyaka01 commented 5 years ago

Consul exposes a field called Output in its API. This can be especially useful for health checks whose scripts print a message, for example "All processes are up" when status="passing" or "Process is dead" when status="critical".

Here's an example excerpt, as returned by curl http://localhost:8500/v1/health/state/any:

"CheckID":"serfHealth","Name":"Serf Health Status","Status":"passing","Notes":"","Output":"Agent alive and reachable","ServiceID":"","ServiceName":"","ServiceTags":[],"Definition"
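For anyone wanting to read this field directly, here is a minimal sketch in Python (standard library only) that pulls CheckID, Status and Output out of that response. It assumes the endpoint returns a JSON array of check objects carrying those keys, as the excerpt above suggests. The function names fetch_checks and extract_outputs are hypothetical, and the sample payload is just the excerpt above reduced to a single check.

```python
import json
import urllib.request


def fetch_checks(base_url="http://localhost:8500"):
    """Fetch all health checks, in any state, from a Consul agent (raw JSON text)."""
    with urllib.request.urlopen(f"{base_url}/v1/health/state/any") as resp:
        return resp.read().decode("utf-8")


def extract_outputs(raw_json):
    """Return (CheckID, Status, Output) for each check object in the JSON array."""
    checks = json.loads(raw_json)
    return [(c["CheckID"], c["Status"], c.get("Output", "")) for c in checks]


# Sample payload built from the excerpt above (a single check).
sample = json.dumps([{
    "CheckID": "serfHealth",
    "Name": "Serf Health Status",
    "Status": "passing",
    "Notes": "",
    "Output": "Agent alive and reachable",
    "ServiceID": "",
    "ServiceName": "",
    "ServiceTags": [],
}])

print(extract_outputs(sample))
# [('serfHealth', 'passing', 'Agent alive and reachable')]
```

Against a live agent you would call extract_outputs(fetch_checks()) instead of using the sample.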

Could this field be exposed through consul_exporter? It would be helpful when sending alerts for failing checks. Thanks!

simonpasquier commented 5 years ago

In general we avoid labels with unbounded values, because they can increase label cardinality dramatically, and because the instrumentation best practices recommend exposing every possible label value up front (series that come and go are difficult to deal with).
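For contrast, the bounded pattern behind "expose all label values" can be sketched as a state set: one series per possible status, always present, with value 1 for the current status and 0 otherwise. This minimal Python illustration renders Prometheus text-format lines by hand. The metric name consul_check_status and the check label are hypothetical, not necessarily what consul_exporter emits.

```python
# The health-check states used by Consul: a small set known in advance.
CHECK_STATES = ("passing", "warning", "critical")


def state_set_lines(metric, check_id, status):
    """Emit one sample per possible state: 1 for the current state, 0 for the others.

    `metric` and the `check` label are hypothetical names for illustration only.
    """
    return [
        f'{metric}{{check="{check_id}",status="{state}"}} {1 if state == status else 0}'
        for state in CHECK_STATES
    ]


for line in state_set_lines("consul_check_status", "serfHealth", "passing"):
    print(line)
# consul_check_status{check="serfHealth",status="passing"} 1
# consul_check_status{check="serfHealth",status="warning"} 0
# consul_check_status{check="serfHealth",status="critical"} 0
```

Because the label values come from a fixed set, a check flipping from passing to critical only changes sample values; no series appear or disappear. A free-form Output label has no such fixed set.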

pvyaka01 commented 5 years ago

OK, understood. Any ideas on how I can scrape this field? Thanks for the help!

SuperQ commented 5 years ago

I don't think exposing service states would fall into our usual category of unbounded metrics. On the surface, this seems similar to kube-state-metrics or the systemd service state metrics.

simonpasquier commented 5 years ago

IIUC, the Output field could vary a lot between check scripts. E.g. if the check runs ping -c1 foo.example.com, the output will be different (almost) every time. Of course, you can still write "good" checks that generate predictable output, but the exporter can't know that for sure.
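A quick way to see the cardinality problem: if Output were a label, every distinct string would start a new series. The sketch below counts distinct (check, Output) label sets. The ping outputs are made up for illustration, and series_count and the check IDs are hypothetical names.

```python
def series_count(check_id, outputs):
    """Count the distinct series that would exist if Output were a label value."""
    return len({(check_id, out) for out in outputs})


# Made-up results from five runs of a ping-based check: the timing differs almost every run.
ping_outputs = [
    "64 bytes from foo.example.com: icmp_seq=1 ttl=57 time=11.2 ms",
    "64 bytes from foo.example.com: icmp_seq=1 ttl=57 time=10.8 ms",
    "64 bytes from foo.example.com: icmp_seq=1 ttl=57 time=11.2 ms",
    "64 bytes from foo.example.com: icmp_seq=1 ttl=57 time=12.5 ms",
    "64 bytes from foo.example.com: icmp_seq=1 ttl=57 time=9.9 ms",
]
# A "good" check that always prints the same predictable message.
steady_outputs = ["All processes are up"] * 5

print(series_count("ping_foo", ping_outputs))  # 4
print(series_count("procs", steady_outputs))   # 1
```

The first count keeps growing as the check keeps running, while the second stays at one however often the check runs. The exporter has no way to tell these two kinds of check apart in advance.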

tgolebiowski-tbscg commented 2 years ago

Still, this is a much-needed feature. While the string description is rarely that useful, it is a different story when the check outputs a numeric value that it measures. Exposing that numeric value to Prometheus would benefit us greatly, letting us monitor trends and predict failures before the error actually happens.

It seems that all that's needed to make this sensible is to put some limits on which outputs can be exposed and discard the rest.

I'm pretty sure that allowing numeric values (and possibly strings of up to 128 characters) would cover the vast majority of needs without risking pushing junk into Prometheus, since non-compliant checks would simply be ignored.
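The filtering rule proposed here could look roughly like the following Python sketch: a finite numeric Output becomes a sample value, a non-empty string of at most 128 characters becomes a label candidate, and anything else is dropped. This is not an existing consul_exporter feature; classify_output and MAX_LABEL_LEN are hypothetical names, and treating NaN/Inf as junk is an assumption of this sketch.

```python
import math

MAX_LABEL_LEN = 128  # the string limit suggested in the comment above


def classify_output(output):
    """Decide how a check's Output could be exposed under the proposed limits.

    Returns ("value", float) for a finite numeric output, ("label", str) for a
    non-empty string of at most MAX_LABEL_LEN characters, and ("drop", None)
    for anything else (the check is ignored).
    """
    text = output.strip()  # check scripts often print a trailing newline
    try:
        number = float(text)
    except ValueError:
        number = None
    if number is not None:
        # "nan" and "inf" parse as floats; this sketch treats them as junk.
        return ("value", number) if math.isfinite(number) else ("drop", None)
    if 0 < len(text) <= MAX_LABEL_LEN:
        return ("label", text)
    return ("drop", None)


for out in ["42.5\n", "All processes are up", "x" * 300]:
    print(classify_output(out)[0])
# value, label, drop (one per line)
```

The numeric branch addresses the trend-monitoring use case directly, since the value would be a gauge sample rather than a label. The string branch only partly addresses the cardinality concern raised earlier in the thread: a short but variable message, such as a ping result, still passes a 128-character limit, so a stricter variant might accept numeric outputs only.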