hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0

Emit consul-k8s metrics to Prometheus #249

Open pnovotnak opened 4 years ago

pnovotnak commented 4 years ago


Description

consul-k8s components should publish metrics so that people can monitor and alert on bad conditions.

david-yu commented 4 years ago

Hi @pnovotnak, thanks for your feedback! Could you give us more information on which bad conditions you are hoping to catch with metrics? Also, which monitoring tools are you using (e.g. Prometheus or Datadog)? We have had similar requests come in before, such as https://github.com/hashicorp/consul-helm/issues/339, which we've been keeping an eye on.

pnovotnak commented 4 years ago

We use Prometheus, which is the standard in k8s environments. An endpoint that publishes Prometheus metrics would let us collect them just by adding an annotation to the pod.

At a bare minimum, I would like error and success counters, with each increment representing a fetch & update of a single service. Using those, I can calculate the error rate as a percentage.
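As a rough illustration of the counters described above, here is a minimal Go sketch using only the standard library. The metric name `consul_k8s_sync_total`, the `result` label, and the `syncMetrics` type are all hypothetical; nothing here reflects what consul-k8s actually exposes. A real implementation would more likely use the official Prometheus Go client than render the text format by hand.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync/atomic"
)

// errExample is a stand-in failure used only for demonstration.
var errExample = errors.New("connection refused")

// syncMetrics holds hypothetical counters; each increment represents
// one fetch-and-update of a single service.
type syncMetrics struct {
	success atomic.Uint64
	failure atomic.Uint64
}

// record counts the outcome of one service sync.
func (m *syncMetrics) record(err error) {
	if err != nil {
		m.failure.Add(1)
		return
	}
	m.success.Add(1)
}

// errorRate returns failed syncs as a percentage of all syncs,
// or 0 when nothing has been recorded yet.
func (m *syncMetrics) errorRate() float64 {
	f, s := m.failure.Load(), m.success.Load()
	if f+s == 0 {
		return 0
	}
	return 100 * float64(f) / float64(f+s)
}

// exposition renders the counters in the Prometheus text format.
// The metric and label names are assumptions made for this sketch.
func (m *syncMetrics) exposition() string {
	var b strings.Builder
	b.WriteString("# TYPE consul_k8s_sync_total counter\n")
	fmt.Fprintf(&b, "consul_k8s_sync_total{result=\"success\"} %d\n", m.success.Load())
	fmt.Fprintf(&b, "consul_k8s_sync_total{result=\"error\"} %d\n", m.failure.Load())
	return b.String()
}

func main() {
	m := &syncMetrics{}
	m.record(nil)
	m.record(nil)
	m.record(nil)
	m.record(errExample)
	fmt.Print(m.exposition())
	fmt.Printf("error rate: %.1f%%\n", m.errorRate())
}
```

In practice the percentage would normally be computed on the Prometheus side from the two counter series rather than inside the process; `errorRate` is included only to show the arithmetic.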

It would also be nice to have request metrics for both k8s and Consul requests, maybe as histogram(s). The error rate should capture the important information here (did the request succeed), but this would give me more granular detail that might be useful in postmortems.
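To make the histogram idea concrete, here is a hedged Go sketch of a cumulative, Prometheus-style histogram of request durations, again using only the standard library. The bucket bounds, the metric name `consul_k8s_request_duration_seconds`, and the `target` label (to separate k8s from Consul requests) are assumptions for illustration, not an existing consul-k8s API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
	"time"
)

// histogram is a hand-rolled histogram with Prometheus-style "le" buckets.
type histogram struct {
	mu     sync.Mutex
	bounds []float64 // bucket upper bounds in seconds, ascending
	counts []uint64  // per-bucket counts; the last slot is the +Inf bucket
	sum    float64
	total  uint64
}

func newHistogram(bounds ...float64) *histogram {
	b := append([]float64(nil), bounds...)
	sort.Float64s(b)
	return &histogram{bounds: b, counts: make([]uint64, len(b)+1)}
}

// observe records one request duration, given in seconds.
func (h *histogram) observe(seconds float64) {
	// Index of the first bound >= seconds; len(bounds) selects +Inf.
	i := sort.SearchFloat64s(h.bounds, seconds)
	h.mu.Lock()
	defer h.mu.Unlock()
	h.counts[i]++
	h.sum += seconds
	h.total++
}

// timed runs fn and records how long it took, whether or not it failed.
func (h *histogram) timed(fn func() error) error {
	start := time.Now()
	err := fn()
	h.observe(time.Since(start).Seconds())
	return err
}

// exposition renders cumulative buckets in the Prometheus text format.
func (h *histogram) exposition(name, target string) string {
	h.mu.Lock()
	defer h.mu.Unlock()
	var b strings.Builder
	fmt.Fprintf(&b, "# TYPE %s histogram\n", name)
	var cum uint64
	for i, bound := range h.bounds {
		cum += h.counts[i]
		fmt.Fprintf(&b, "%s_bucket{target=%q,le=\"%g\"} %d\n", name, target, bound, cum)
	}
	fmt.Fprintf(&b, "%s_bucket{target=%q,le=\"+Inf\"} %d\n", name, target, h.total)
	fmt.Fprintf(&b, "%s_sum{target=%q} %g\n", name, target, h.sum)
	fmt.Fprintf(&b, "%s_count{target=%q} %d\n", name, target, h.total)
	return b.String()
}

func main() {
	h := newHistogram(0.1, 0.5, 1)
	for _, s := range []float64{0.0625, 0.25, 0.5, 2} {
		h.observe(s)
	}
	// Stand-in for a real Kubernetes or Consul API call.
	_ = h.timed(func() error { return nil })
	fmt.Print(h.exposition("consul_k8s_request_duration_seconds", "kubernetes"))
}
```

One histogram per target (or one labeled series per target) keeps slow Kubernetes API calls distinguishable from slow Consul calls when reading the data after an incident.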


filintod commented 4 years ago

We also recently hit an issue where the agent could not connect to the Kubernetes API server for some reason (it worked after a restart) and continuously logged:

ERROR: logging before flag.Parse: E0713 05:59:29.636613 6 reflector.go:205] pkg/mod/k8s.io/client-go@v8.0.0+incompatible/tools/cache/reflector.go:99: Failed to list *v1.Service: Get https://x.y.z.a:443/api/v1/namespaces/tetris/services?limit=500&resourceVersion=0: dial tcp x.y.z.a:443: connect: connection refused

It should have metrics for successful and failed connections to the Kubernetes API server.