jonnenauha / prometheus_varnish_exporter

Varnish exporter for Prometheus
MIT License
177 stars 101 forks source link

Error "gauge should be a Counter" when scraping metrics for varnish 4.0 #47

Closed glennslaven closed 5 years ago

glennslaven commented 5 years ago

prometheus_varnish_exporter runs and reports a Successful scrape, however, if you curl the /metrics endpoint this is returned:

error gathering metrics: 4 error(s) occurred:
* collected metric varnish_main_sessions label:<name:"type" value:"herd" > gauge:<value:0 >  should be a Counter
* collected metric varnish_main_sessions label:<name:"type" value:"pipeline" > gauge:<value:0 >  should be a Counter
* collected metric varnish_main_sessions label:<name:"type" value:"closed" > gauge:<value:0 >  should be a Counter
* collected metric varnish_main_sessions label:<name:"type" value:"readahead" > gauge:<value:0 >  should be a Counter

Only happens on Varnish 4.0, Varnish 4.1 and higher are fine. This did not happen prior to the 1.5 release.

I have a gist reproing the behaviour here https://gist.github.com/glennslaven/cafb50fa21a45b7798ce9d05a7430d7f .

jonnenauha commented 5 years ago

Thanks for the great repro gist. That was very useful as its hard to keep multiple versions of varnish running for normal dev.

I group session metrics starting with sess_ into varnish_main_sessions where the metrics label type is set into sess_<type>.

4.0 varnishstat exports this

"MAIN.sess_herd": {
  "type": "MAIN",
  "value": 0, 
  "flag": "a", 
  "description": "Session herd"
},

6.0.1 varnishstat exports this

"MAIN.sess_herd": {
    "description": "Session herd",
    "flag": "c", "format": "i",
    "value": 0
  },

The idea is that those are then grouped into a single metric name with the type counter.

# HELP varnish_main_sessions Number of sessions
# TYPE varnish_main_sessions counter
varnish_main_sessions{type="herd"} 0
varnish_main_sessions{type="readahead"} 0
varnish_main_sessions{type="closed"} 0

I think the bug is that in 4.0 specifically some of those sess_ metrics are resolved as counters and some as gauges. This mixes multiple types in a single varnish_main_sessions prom metric which probably gives the error out. This is my best guess looking at this quickly. Probably =>4.0.1 they fixed them all to be counters.

In the full 4.0 prom metrics you can see the flag a. Here is another sess_ one "MAIN.sess_queued": {"type": "MAIN", "value": 0, "flag": "c", "description": "Sessions queued for thread"}, that has the flag c.

The semi hack fix is to check the first metrics "parent" created (in this case varnish_main_sessions), and not allow any other "child" type to deviate from that metric type. I will look into this a bit more when I have time.

In the meanwhile have you downgraded the exporter or upgraded varnish so you have a working setup? :)

jonnenauha commented 5 years ago

This is the new code in latest version that breaks this https://github.com/jonnenauha/prometheus_varnish_exporter/blob/master/varnish.go#L141-L149

It does not handle the a type and defaults that to a gauge. I did not know that a existed, there are no good varnish docs what these flag values mean and what are the possible values.

Looking at the full 6.x dump, it does not have any a values. So if we just handle that as a counter I think that should cover the old versions. Again I suspect in 4.0.1 they either removed a or just changed the flag to c like the other sess_ ones.

glennslaven commented 5 years ago

Thanks, yes we can downgrade the exporter. Unfortunately upgrading Varnish isn't an option at this time, we have a number of legacy instances. Working on getting them updated, but not there yet.

Looks like handling the a type would fix the issue.

glennslaven commented 5 years ago

According to the varnish 4.0 code https://github.com/varnishcache/varnish-cache/blob/4.0/include/tbl/vsc_f_main.h#L36 it's Accumulator (deprecated, use 'c')

I've pushed a PR that looks like it fixes this issue, I've tested the compiled binary against my test above. https://github.com/jonnenauha/prometheus_varnish_exporter/pull/48