Enhanced Varnish Cache Stats Collection

garceri commented 6 years ago

Feature Request

Collect varnish stats over HTTP

Proposal:

The current varnish input plugin only works by running the varnishstat binary. I propose overhauling the plugin so that it can also read the stats over HTTP.

Current behavior:

The current plugin requires running Telegraf on the same machine/VM/container as Varnish. This is cumbersome and doesn't take advantage of the Varnish 4 HTTP stats endpoint.

Desired behavior:

Telegraf should be able to read Varnish stats remotely over HTTP using the built-in /stats endpoint, e.g.:

# curl http://myvarnishserver:6085/stats
{
  "timestamp": "2018-08-17T00:33:50",
  "MAIN.uptime": {"type": "MAIN", "value": 114123, "flag": "c", "description": "Child process uptime" },
  "MAIN.sess_conn": {"type": "MAIN", "value": 957726, "flag": "c", "description": "Sessions accepted" },
  "MAIN.sess_drop": {"type": "MAIN", "value": 0, "flag": "c", "description": "Sessions dropped" },
  "MAIN.sess_fail": {"type": "MAIN", "value": 0, "flag": "c", "description": "Session accept failures" },
  "MAIN.client_req_400": {"type": "MAIN", "value": 0, "flag": "c", "description": "Client requests received, subject to 400 errors" },
  "MAIN.client_req_417": {"type": "MAIN", "value": 0, "flag": "c", "description": "Client requests received, subject to 417 errors" },
  "MAIN.client_req": {"type": "MAIN", "value": 957726, "flag": "c", "description": "Good client requests received" },
  "MAIN.cache_hit": {"type": "MAIN", "value": 711088, "flag": "c", "description": "Cache hits" },
  "MAIN.cache_hit_grace": {"type": "MAIN", "value": 502991, "flag": "c", "description": "Cache grace hits" },
  "MAIN.cache_hitpass": {"type": "MAIN", "value": 1532, "flag": "c", "description": "Cache hits for pass" },
  "MAIN.cache_miss": {"type": "MAIN", "value": 238252, "flag": "c", "description": "Cache misses" },
  "MAIN.backend_conn": {"type": "MAIN", "value": 7564, "flag": "c", "description": "Backend conn. success" },
  "MAIN.backend_unhealthy": {"type": "MAIN", "value": 0, "flag": "c", "description": "Backend conn. not attempted" },
  "MAIN.backend_busy": {"type": "MAIN", "value": 0, "flag": "c", "description": "Backend conn. too many" },
  "MAIN.backend_fail": {"type": "MAIN", "value": 4, "flag": "c", "description": "Backend conn. failures" },
  "MAIN.backend_reuse": {"type": "MAIN", "value": 737558, "flag": "c", "description": "Backend conn. reuses" },
  ...
}

Use case:

I need something that lets me visualize Varnish metrics in a Docker container environment with minimal modifications to existing infrastructure.

danielnelson commented 6 years ago

Seems like a good idea, is this something you could work on?

Would this require any configuration changes on the Varnish server? We should check how the varnishstat binary gets the stats: does it use the HTTP interface or a different method?

Would this result in any breaking changes to the produced metrics? New fields are fine, but we wouldn't want to change the names of existing fields or the format of what they store. We can work around this if it doesn't make sense to keep compatibility, but there are a few extra tasks that must be performed.

mjf commented 6 years ago

@danielegozzi I still somehow feel that use cases like this one should leverage a more general input plugin with a data_format or a parser, e.g.:

[[inputs.http]]
  urls = [ "http://myvarnishserver:6085/stats" ]
  method = "GET"
  data_format = "json"
  # ...
  tag_keys = [
     # ...
  ]

Can you please explain why a separate input plugin would be better (apart from the fact that the tag_keys directive may not be good enough in this specific case, and that there should be some way to deal with more complex structures like the one Varnish Cache produces)?

danielnelson commented 6 years ago

That's basically it. Using a specialized parser allows us to structure the data schema exactly how we want, and provides a level of abstraction so that we can hide upstream changes and keep the schema stable. That said, it would be nice if we had a more flexible JSON parser that could handle this while still being as performant as custom code.
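To make the schema point concrete, here is a minimal Python sketch (not Telegraf code) of the kind of reshaping a specialized parser could do. It assumes the desired schema is one point per section, with the section as a tag and the counter name without its prefix as the field. The `to_points` helper, the `varnish` measurement name, and the `section` tag are illustrative assumptions, not something confirmed in this thread.

```python
import json

# Two entries taken from the /stats sample earlier in this thread.
SAMPLE = json.loads("""
{
  "timestamp": "2018-08-17T00:33:50",
  "MAIN.cache_hit": {"type": "MAIN", "value": 711088, "flag": "c", "description": "Cache hits"},
  "MAIN.cache_miss": {"type": "MAIN", "value": 238252, "flag": "c", "description": "Cache misses"}
}
""")


def to_points(stats, measurement="varnish"):
    """Group counters by section (hypothetical schema, for illustration only)."""
    sections = {}
    for key, counter in stats.items():
        if not isinstance(counter, dict):
            continue  # skips the top-level "timestamp" string
        # "MAIN.cache_hit" -> section "MAIN", field "cache_hit"
        section, _, field = key.partition(".")
        sections.setdefault(section, {})[field] = counter["value"]
    return [
        {"measurement": measurement, "tags": {"section": name}, "fields": fields}
        for name, fields in sorted(sections.items())
    ]


points = to_points(SAMPLE)
print(points)
```

Because the mapping lives in one function, a change in the upstream JSON layout would only require changing this function, while the emitted field names stay the same.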

mjf commented 6 years ago

@danielnelson Thanks for the clarification.

garceri commented 6 years ago

My knowledge of Go is rather limited, and I'm sure the code I would produce would be sub-standard (if I ever got something working). The specialized parser is a good idea. Maybe we could use JMESPath (http://jmespath.org/specification.html), à la jq, to provide the flexibility required to parse more complex input structures.

Here is an example of how the input data from the OP could be filtered with jq/JMESPath. I think this would prove useful for a number of use cases beyond Varnish.

# cat aa | jq -r 'del(."timestamp") | keys[] as $key | "\"\($key)\": \"\(.[$key].value)\""'
"MAIN.backend_busy": "0"
"MAIN.backend_conn": "7564"
"MAIN.backend_fail": "4"
"MAIN.backend_reuse": "737558"
"MAIN.backend_unhealthy": "0"
"MAIN.cache_hit": "711088"
"MAIN.cache_hit_grace": "502991"
"MAIN.cache_hitpass": "1532"
"MAIN.cache_miss": "238252"
"MAIN.client_req": "957726"
"MAIN.client_req_400": "0"
"MAIN.client_req_417": "0"
"MAIN.sess_conn": "957726"
"MAIN.sess_drop": "0"
"MAIN.sess_fail": "0"
"MAIN.uptime": "114123"
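For readers who do not use jq, the same filter can be sketched in Python. This is only an illustration of what the jq expression above does (drop `timestamp`, then emit each counter's `value` in sorted key order); the `counter_values` name and the shortened `RAW` sample are assumptions made for the example.

```python
import json

# A shortened copy of the /stats payload from the original post.
RAW = """
{
  "timestamp": "2018-08-17T00:33:50",
  "MAIN.uptime": {"type": "MAIN", "value": 114123, "flag": "c", "description": "Child process uptime"},
  "MAIN.backend_fail": {"type": "MAIN", "value": 4, "flag": "c", "description": "Backend conn. failures"},
  "MAIN.cache_hit": {"type": "MAIN", "value": 711088, "flag": "c", "description": "Cache hits"}
}
"""


def counter_values(raw):
    """Return {counter name: value}, mirroring the jq filter above."""
    stats = json.loads(raw)
    stats.pop("timestamp", None)  # same as del(."timestamp")
    # jq's `keys` returns the keys sorted, so sort here as well.
    return {key: stats[key]["value"] for key in sorted(stats)}


values = counter_values(RAW)
for name, value in values.items():
    print(f'"{name}": "{value}"')
```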

garceri commented 6 years ago

I was able to pull stats from /stats over HTTP using the http input plugin with the JSON data format, with no further modifications required. Still, the ability to customize the input with filters would be nice to have.

This is the config I'm using:

[[inputs.http]]
timeout = "15s"
urls = [ "http://varnish:6085/stats" ]
username = "secretuser"
password = "XXXXXXXX"
data_format = "json"
name_override = "varnish_instance_name"
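As a rough idea of the field names a generic JSON parser would produce from this payload, here is a Python approximation. It assumes the parser flattens nested objects by joining keys with an underscore and keeps only numeric values, so that `type`, `flag`, `description`, and `timestamp` are dropped. That behaviour is an assumption for this sketch and should be checked against the Telegraf JSON parser documentation; `flatten` is a hypothetical helper, not Telegraf code.

```python
import json

# Two entries taken from the /stats sample in the original post.
SAMPLE = json.loads("""
{
  "timestamp": "2018-08-17T00:33:50",
  "MAIN.uptime": {"type": "MAIN", "value": 114123, "flag": "c", "description": "Child process uptime"},
  "MAIN.cache_hit": {"type": "MAIN", "value": 711088, "flag": "c", "description": "Cache hits"}
}
""")


def flatten(obj, prefix=""):
    """Flatten nested dicts into {joined_key: number} (assumed behaviour)."""
    fields = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            fields.update(flatten(value, name))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            # Assumption: only numeric values survive; strings are dropped.
            fields[name] = value
    return fields


fields = flatten(SAMPLE)
print(fields)
```

Under these assumptions every counter ends up as a field with a `_value` suffix, such as `MAIN.uptime_value`, which differs from the names a dedicated plugin would likely choose. This is the compatibility concern raised earlier in the thread.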
