liftbridge-io / liftbridge

Lightweight, fault-tolerant message streams.
https://liftbridge.io
Apache License 2.0

Monitoring API #222

Open tylertreat opened 4 years ago

tylertreat commented 4 years ago

Provide an API that exposes monitoring information and metrics.

We'll need to think about whether this should be part of the gRPC API or a separate HTTP/REST-based API. My inclination is that HTTP is nicer for implementing integrations and can be hit directly from a web browser, curl, etc. for debugging purposes. The downside is that it will require running an HTTP server on an additional port.
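A minimal sketch of the separate-port option, assuming a plain JSON status endpoint. The path `/status`, the fields returned by `monitoring_snapshot`, and the helper names are all hypothetical, and Python's standard library stands in for whatever HTTP server Liftbridge would actually embed. It illustrates that such an endpoint can be queried with curl or a browser without any gRPC tooling.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def monitoring_snapshot() -> dict:
    # Hypothetical static snapshot; a real broker would read this from its metadata.
    return {"server_id": "server-1", "streams": 2, "partitions": 4}


class MonitoringHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/status":
            self.send_error(404)
            return
        body = json.dumps(monitoring_snapshot()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def start_monitoring_server(port: int = 0) -> ThreadingHTTPServer:
    """Serve monitoring data on its own port, separate from the gRPC listener."""
    server = ThreadingHTTPServer(("127.0.0.1", port), MonitoringHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_status(port: int) -> dict:
    """What `curl http://127.0.0.1:<port>/status` would do, from Python."""
    # Bypass any environment proxy settings for this local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://127.0.0.1:{port}/status") as resp:
        return json.loads(resp.read())
```

Passing port 0 lets the OS pick a free port. A real deployment would expose a fixed, configurable port, which is exactly the extra-port cost of the HTTP approach.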

joe-getcouragenow commented 4 years ago

Just want to add that gRPC-Web can be used, so the gRPC API can also be hit from a web browser. You can avoid taking on the Envoy proxy as a runtime dependency by embedding Envoy instead.

https://www.getenvoy.io/

Example: https://github.com/pomerium/pomerium/blob/master/scripts/embed-envoy.bash

annismckenzie commented 4 years ago

Please export the metrics at /metrics in Prometheus format. That would be ideal.

ekbfh commented 3 years ago

Hi! I'm doing a PoC in a big mesh and am suffering from the lack of /metrics.

danthegoodman1 commented 2 years ago

looking forward to this one too :)

definitely a requirement for us to use in production

tylertreat commented 2 years ago

I plan to tackle this once consumer groups are completed. The plan at this time is to implement a /metrics endpoint in Prometheus format.
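For reference, the Prometheus text exposition format is simple enough to render by hand. This sketch assumes hypothetical metric names such as `liftbridge_messages_published_total`; in practice a Prometheus client library would likely do this rendering, and the endpoint would be served with `Content-Type: text/plain; version=0.0.4`.

```python
from dataclasses import dataclass, field


def escape_label(value: str) -> str:
    # Label values must escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


@dataclass
class Metric:
    name: str   # hypothetical metric name
    help: str
    kind: str   # "counter" or "gauge"
    # Maps a tuple of (label, value) pairs to the sample value; () means no labels.
    samples: dict = field(default_factory=dict)


def render_prometheus(metrics) -> str:
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for m in metrics:
        lines.append(f"# HELP {m.name} {m.help}")
        lines.append(f"# TYPE {m.name} {m.kind}")
        for labels, value in sorted(m.samples.items()):
            if labels:
                label_str = ",".join(f'{k}="{escape_label(v)}"' for k, v in labels)
                lines.append(f"{m.name}{{{label_str}}} {value}")
            else:
                lines.append(f"{m.name} {value}")
    return "\n".join(lines) + "\n"
```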

LaPetiteSouris commented 2 years ago

I suggest we make proposals here on which metrics should be exposed. I think it would be nice to have an exhaustive v0 set of metrics that are judged critical. Any ideas?

I also think there are two kinds of metrics:

For a start, in case it is relevant, here is the list of metrics exposed by HashiCorp Nomad.

tylertreat commented 2 years ago

There are probably 3 categories of metrics:

There may be others that I am missing, but this is what comes to mind for me initially. To your point, the first step should probably be determining what the minimal critical set of metrics are, then add additional ones once there is an identified need. I would prefer to start small and then build on it.

LaPetiteSouris commented 2 years ago

I suggest starting even smaller.

We can already define the code pattern that will be used to collect and export metrics.

Every metric added later is then basically an adapter to plug in, and metrics can be added progressively and independently.

This can proceed in parallel with the discussion on which metrics to expose. Alternatively, we could pick one or two metrics somewhat arbitrarily to begin with.
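One way to read the "adapters" idea: a small collector interface plus a registry, loosely in the spirit of the Collector concept in Prometheus client libraries. `Collector`, `Registry`, and `GaugeFunc` are hypothetical names for illustration, not existing Liftbridge code.

```python
from typing import Callable, Dict, Iterable, List, Protocol, Tuple

Sample = Tuple[str, float]  # (metric name, value)


class Collector(Protocol):
    """Anything that can report zero or more samples on demand."""

    def collect(self) -> Iterable[Sample]: ...


class Registry:
    """Holds registered collectors and gathers all their samples at scrape time."""

    def __init__(self) -> None:
        self._collectors: List[Collector] = []

    def register(self, collector: Collector) -> None:
        self._collectors.append(collector)

    def gather(self) -> Dict[str, float]:
        out: Dict[str, float] = {}
        for c in self._collectors:
            for name, value in c.collect():
                if name in out:
                    raise ValueError(f"duplicate metric {name!r}")
                out[name] = value
        return out


class GaugeFunc:
    """Adapter turning any zero-argument function into a one-gauge collector."""

    def __init__(self, name: str, fn: Callable[[], float]) -> None:
        self.name = name
        self.fn = fn

    def collect(self) -> Iterable[Sample]:
        yield (self.name, float(self.fn()))
```

With this shape, each new metric is one adapter registered at startup, so metrics can be added independently without touching the export code.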

danthegoodman1 commented 2 years ago

I would really appreciate ways to calculate produce/consume rates, and even more so consumer lag (in time).
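Both quantities can be derived from a few raw values. A sketch, assuming the broker exposes a monotonic message counter per partition and can report the timestamp of the newest message and of the message at the consumer's position (these inputs are hypothetical): rates come from two counter samples, and time lag is the gap between those two timestamps.

```python
def per_second_rate(prev_count: float, prev_ts: float,
                    cur_count: float, cur_ts: float) -> float:
    """Average per-second rate between two samples of a monotonic counter."""
    elapsed = cur_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = cur_count - prev_count
    if delta < 0:
        # Counter reset (e.g. broker restart): count only what accrued since the reset.
        delta = cur_count
    return delta / elapsed


def time_lag_seconds(newest_msg_ts: float, consumed_msg_ts: float) -> float:
    """How far behind the consumer is, measured in message time (seconds)."""
    return max(0.0, newest_msg_ts - consumed_msg_ts)
```

The same `per_second_rate` works for produce and consume rates; only the counter being sampled differs.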

ekbfh commented 2 years ago

Hi! Consumer groups are done now, so... :)

I'd like to suggest a few values to export as metrics, such as HW (high watermark), last offset, and cursor counts. They really help when investigating some processes.
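A sketch of how these values could be exported per partition, with a derived offset lag per cursor. It assumes a cursor stores the offset of the last processed message, so lag is `high_watermark - cursor_offset`; the `PartitionStatus` type, its fields, and the metric names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PartitionStatus:
    stream: str
    partition: int
    high_watermark: int   # offset of the last committed message (hypothetical field)
    newest_offset: int    # last offset written to the leader's log (hypothetical field)
    cursors: Dict[str, int] = field(default_factory=dict)  # cursor id -> last processed offset


def partition_gauges(p: PartitionStatus) -> List[str]:
    """Prometheus-style gauge lines for one partition, plus per-cursor offset lag."""
    labels = f'stream="{p.stream}",partition="{p.partition}"'
    lines = [
        f"liftbridge_partition_high_watermark{{{labels}}} {p.high_watermark}",
        f"liftbridge_partition_newest_offset{{{labels}}} {p.newest_offset}",
        f"liftbridge_partition_cursors{{{labels}}} {len(p.cursors)}",
    ]
    for cursor_id, offset in sorted(p.cursors.items()):
        # Messages committed but not yet processed by this cursor.
        lag = max(0, p.high_watermark - offset)
        lines.append(f'liftbridge_cursor_offset_lag{{{labels},cursor="{cursor_id}"}} {lag}')
    return lines
```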