portier / portier-broker

Portier Broker reference implementation, written in Rust
http://portier.github.io/
Apache License 2.0

Monitoring endpoint #137

Open stephank opened 7 years ago

stephank commented 7 years ago

It'd be nice to have an API monitoring endpoint that does a quick check of Redis and SMTP connections. We could formalise /ver.txt as this.

But I'm also guessing the thing should be optional and disabled on the public broker?

Natim commented 7 years ago

Here are the Utility endpoints that we have for Mozilla Services and especially Kinto: https://kinto.readthedocs.io/en/latest/api/1.x/utilities.html

jimdigriz commented 3 years ago

For others to discover: I have noticed that if the Redis backend drops, the broker (by design) panics immediately and dies, which is perfect.

For an actual status URL, /.well-known/openid-configuration will get you out of a pinch. If you want something with content from Redis, you can use /keys.json instead (though the broker by design subscribes to changes rather than fetching them every time). This matters especially because cruft like GCP's health checks treats a 204 response as a 'failure', so the responses must be 200...

stephank commented 3 years ago

Considering we crash for Redis failures, and SMTP is not really a permanent connection but request-based, I'm wondering what we should actually do here?

We now have GET /metrics for gathering some numbers, but that doesn't address the original issue description.

onli commented 3 years ago

Well, I guess the intent of the original author (^^) might have been to get an impression of whether the broker is working properly: can it fetch data, do emails go out? Does /metrics cover that for you?

jimdigriz commented 3 years ago

Agreeing with @stephank as I do not think any actual development work needs to be done here.

What is needed is official advice in the documentation on what the administrator should use when faced with adding monitoring for the service; whether that is in the form of a Nagios check, Docker HEALTHCHECK or cloud load balancer.

You just need to formally declare what the URL to use is.

I would recommend /.well-known/openid-configuration, as it is a core part of Portier's functionality and will never go away, plus it already returns a suitable HTTP status code.

The alternative is to add a dedicated checking route, or to make it /metrics and pinky promise that it will always return 200.
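
To illustrate the recommendation above, here is a minimal sketch of a probe that requests /.well-known/openid-configuration and accepts only a 200 response, since some load balancers reject other 2xx codes such as 204. The function names, the timeout, and the `http://localhost:3333` origin are assumptions made for illustration; nothing here is part of the broker itself.

```python
"""Health probe sketch for a Portier broker (names and defaults are hypothetical)."""
import urllib.error
import urllib.request

# Path suggested in the thread as the URL to monitor.
DISCOVERY_PATH = "/.well-known/openid-configuration"


def health_url(origin: str) -> str:
    """Join a broker origin with the discovery path, tolerating a trailing slash."""
    return origin.rstrip("/") + DISCOVERY_PATH


def is_healthy(status: int) -> bool:
    """Accept only 200; a 204 would count as a failure for some health checkers."""
    return status == 200


def probe(origin: str, timeout: float = 3.0) -> bool:
    """Return True if the discovery document is served with status 200."""
    try:
        with urllib.request.urlopen(health_url(origin), timeout=timeout) as resp:
            return is_healthy(resp.status)
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts and HTTP error statuses.
        return False


def main(argv: list[str]) -> int:
    """Return 0 when healthy and 1 otherwise, as Docker HEALTHCHECK expects."""
    # Placeholder origin; pass the real broker origin as the first argument.
    origin = argv[1] if len(argv) > 1 else "http://localhost:3333"
    return 0 if probe(origin) else 1
```

To use this as a standalone check, one would add `import sys` and end the script with `sys.exit(main(sys.argv))`. Because a Redis failure makes the broker exit, a refused connection is reported as unhealthy here, the same as a non-200 response.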