Open pnovotnak opened 4 years ago
Hi @pnovotnak, thanks for your feedback! Could you give us more information on which bad conditions you are hoping to catch with metrics? Also, which monitoring tools are you using (e.g. Prometheus or DataDog)? Similar requests have come in before, such as https://github.com/hashicorp/consul-helm/issues/339, which we've been keeping an eye on.
We use Prometheus, which is the standard in k8s environments. An endpoint that publishes Prometheus metrics would let us collect metrics just by adding an annotation to the pod.
At a bare minimum I would like error and success counters, with each increment representing a fetch and update of a single service. From those I can calculate the error rate as a percentage.
It would also be nice to have request metrics for both k8s and Consul requests, maybe as histogram(s). The error rate should capture the important information (did the request succeed), but this would give more granular detail that might be useful in post-mortems.
We also recently hit an issue where the agent could not connect to the kube API server for some reason (it worked after a restart) and logged this error continuously:
ERROR: logging before flag.Parse: E0713 05:59:29.636613 6 reflector.go:205] pkg/mod/k8s.io/client-go@v8.0.0+incompatible/tools/cache/reflector.go:99: Failed to list *v1.Service: Get https://x.y.z.a:443/api/v1/namespaces/tetris/services?limit=500&resourceVersion=0: dial tcp x.y.z.a:443: connect: connection refused
There should be metrics for successful and failed connections to the kube API server.
Community Note
If you are interested in working on this issue or have submitted a pull request, please leave a comment
Description
consul-k8s components should publish metrics so that people can monitor and alert on bad conditions.