openfaas / nats-queue-worker

Queue-worker for OpenFaaS with NATS Streaming
https://docs.openfaas.com/reference/async/
MIT License
129 stars 59 forks source link

Expose nats monitoring endpoints to prometheus #99

Closed matthiashanel closed 4 years ago

matthiashanel commented 4 years ago

Expected Behaviour

Expose stats like pending messages to prometheus and grafana

Current Behaviour

These are currently not exported

Possible Solution

Context

This is a result of #61

alexellis commented 4 years ago

Sounds good to me. I think most of this work will take place in the https://github.com/openfaas/faas-netes repo in the helm chart.

Open the monitoring endpoint on the streaming server

^ A setting on the NATS YAML file

start the nats-prometheus-exporter

^ the sidecar will be a YAML file alongside the NATS YAML

enable scraping of the produced metrics

^ Prometheus config rule

One thing we may want to consider is potentially talking about retries, if we do retries in the queue-worker, would we want to expose that as a metric?

alexellis commented 4 years ago

@LucasRoesler any thoughts on this?

matthiashanel commented 4 years ago

While this change looks straight forward there is a consideration to be made.

I want to move the queue worker to jetstream. While jetstream coming out of beta is close, it is not here yet. When switching to jetsream, metric names etc.. will change. For anyone using this, I'd suggest to use them with the expectation that they are going to change.

With jetstream retries are probably easier to do as well.

alexellis commented 4 years ago

Thanks for those comments, we will need to bring that into consideration when releasing the NATS metrics and provide an upgrade path or release notes.

Is the window until JetStream GA so short, that we should mark the NATS Streaming metrics as experimental/alpha so people can't get too attached?

matthiashanel commented 4 years ago

I expect to convert the openfaas queue worker to jetstream to gather more feedback. Release follows that feedback. There is a push going on to get it out the door within a few months/couple weeks.

It's certainly too close to start depending on streaming functionalities now. Labeling it as experimental/alpha with an explanation to why would be a good thing.

LucasRoesler commented 4 years ago

I think exposing these metrics is a great idea. With respect to the changing names when moving to JetStream, I echo Alex's question/concern about timing, but also agree that if you want to do this now before JetStream then we can mark it as experimental and we can also avoid announcing that these metrics are exposed to reduce the amount of people tempted to depend on them.

matthiashanel commented 4 years ago

@alexellis , @LucasRoesler , I have submitted the PR

As far as timing goes, Jetstream is close, but not here yet. I'd go with the experimental flag for now.

alexellis commented 4 years ago

Thanks for submitting the PR, I've reviewed it and commented on next steps.