projectcontour / contour

Contour is a Kubernetes ingress controller using Envoy proxy.
https://projectcontour.io
Apache License 2.0

Contour should expose more metrics about gRPC serving #1678

Open youngnick opened 5 years ago

youngnick commented 5 years ago

As part of troubleshooting #1523, @FournierAlexandre and @bgagnon have been working on observing their Contour and Envoy installation, and have found that it would be really useful to have more metrics around Contour's gRPC serving.

This issue covers adding some more metrics to Contour to expose more information about what's happening with the connections to Envoy via gRPC.

Specifically, the main deliverables are:

bgagnon commented 5 years ago

Thanks @youngnick!

The first low-hanging fruit I see is to integrate the grpc-ecosystem/go-grpc-prometheus middleware into Contour.

Instrumenting the server takes very little code: https://github.com/grpc-ecosystem/go-grpc-prometheus#server-side. The standard set of time series it exports is pretty good out of the box.

One important thing that is missing, however, is a Gauge to represent the number of connected clients (i.e., the number of Envoys). There is an open issue for exactly that: https://github.com/grpc-ecosystem/go-grpc-prometheus/issues/78, but it could be done in Contour instead (or in addition). Doing it in Contour gives extra flexibility in terms of labels.
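To make the connected-clients gauge idea concrete, here is a minimal sketch of the bookkeeping it needs: increment when a stream opens, decrement when it closes. It uses only the standard library, so `ConnGauge`, `Track`, `Value`, and the `"cds"`/`"lds"` label values are all hypothetical names chosen for illustration and are not part of Contour or go-grpc-prometheus. In a real integration the map would presumably be replaced by a labelled Prometheus gauge, and `Track` would be called from wherever Contour handles a new gRPC stream.

```go
package main

import (
	"fmt"
	"sync"
)

// ConnGauge is a hypothetical stand-in for a labelled gauge that counts
// currently open streams. It is not an API of Contour or any library.
type ConnGauge struct {
	mu     sync.Mutex
	active map[string]int
}

// NewConnGauge returns an empty gauge.
func NewConnGauge() *ConnGauge {
	return &ConnGauge{active: make(map[string]int)}
}

// Track records one newly opened stream under the given label and returns
// a release function to call when the stream ends. The release function is
// idempotent, so calling it more than once decrements only once.
func (g *ConnGauge) Track(label string) (release func()) {
	g.mu.Lock()
	g.active[label]++
	g.mu.Unlock()

	var once sync.Once
	return func() {
		once.Do(func() {
			g.mu.Lock()
			g.active[label]--
			g.mu.Unlock()
		})
	}
}

// Value reports how many streams are currently open for the label.
func (g *ConnGauge) Value(label string) int {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.active[label]
}

func main() {
	g := NewConnGauge()

	// Simulate two Envoys opening a stream under one label and one Envoy
	// opening a stream under another (label values are assumptions).
	closeFirst := g.Track("cds")
	g.Track("cds")
	g.Track("lds")
	fmt.Println("cds:", g.Value("cds"), "lds:", g.Value("lds"))

	// One Envoy disconnects; in a stream handler this would be deferred.
	closeFirst()
	fmt.Println("cds:", g.Value("cds"), "lds:", g.Value("lds"))
}
```

Returning a release function keeps the increment and decrement paired at the call site (`defer g.Track(label)()` in a stream handler), which is the main thing a gauge like this has to get right. Choosing the label in Contour rather than in the middleware is where the extra flexibility mentioned above would come from.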

youngnick commented 5 years ago

Thanks again for the PR, @bgagnon! I'm checking that one out, and once we've landed it, I'll have a look at adding the number of connected clients.

youngnick commented 5 years ago

With the merging of #1692, we have basic stats about the connected clients. I don't think we're going to be able to land the number of connected clients in rc2, sadly, and we don't want to add any new functionality in the final 1.0.0. So, I'm moving this to the backlog for now. Please note that 'backlog' currently means 'after 1.0', not 'will never get done'. I think this is a really important and not that large feature, so I will make sure it doesn't disappear.