Closed matthiashanel closed 4 years ago
Sounds good to me. I think most of this work will take place in the https://github.com/openfaas/faas-netes repo in the helm chart.
Open the monitoring endpoint on the streaming server
^ A setting on the NATS YAML file
start the nats-prometheus-exporter
^ the sidecar will be a YAML file alongside the NATS YAML
enable scraping of the produced metrics
^ Prometheus config rule
One thing we may want to consider is potentially talking about retries, if we do retries in the queue-worker, would we want to expose that as a metric?
@LucasRoesler any thoughts on this?
While this change looks straight forward there is a consideration to be made.
I want to move the queue worker to jetstream. While jetstream coming out of beta is close, it is not here yet. When switching to jetsream, metric names etc.. will change. For anyone using this, I'd suggest to use them with the expectation that they are going to change.
With jetstream retries are probably easier to do as well.
Thanks for those comments, we will need to bring that into consideration when releasing the NATS metrics and provide an upgrade path or release notes.
Is the window until JetStream GA so short, that we should mark the NATS Streaming metrics as experimental/alpha so people can't get too attached?
I expect to convert the openfaas queue worker to jetstream to gather more feedback. Release follows that feedback. There is a push going on to get it out the door within a few months/couple weeks.
It's certainly too close to start depending on streaming functionalities now. Labeling it as experimental/alpha with an explanation to why would be a good thing.
I think exposing these metrics is a great idea. With respect to the changing names when moving to JetStream, I echo Alex's question/concern about timing, but also agree that if you want to do this now before JetStream then we can mark it as experimental and we can also avoid announcing that these metrics are exposed to reduce the amount of people tempted to depend on them.
@alexellis , @LucasRoesler , I have submitted the PR
As far as timing goes, Jetstream is close, but not here yet. I'd go with the experimental flag for now.
Thanks for submitting the PR, I've reviewed it and commented on next steps.
Expected Behaviour
Expose stats like pending messages to prometheus and grafana
Current Behaviour
These are currently not exported
Possible Solution
Context
This is a result of #61