Open ahus1 opened 1 month ago
/cc @jmartisk (health), @xstefank (health)
@cescoffier since you were doing some TLS work lately. We have an API to register health checks, so it's just a matter of providing the correct info.
That should be easy to implement except (you saw it coming right?) in one case: SNI.
When you use SNI, you provide multiple certificates. Do we want to check that all of them are valid? Some may be invalid but never requested because no one is using the associated hostname. For metrics we can use tags to identify the hostnames; for the rest, I'm not too sure.
Note that there is already an expiration check at startup, we can reuse the same code.
If you point me to it, I can give it a shot.
https://github.com/quarkusio/quarkus/blob/main/extensions/tls-registry/runtime/src/main/java/io/quarkus/tls/runtime/CertificateRecorder.java#L105 is the method orchestrating the validation.
Description
A TLS certificate for the HTTPS port needs to be renewed regularly. It would be good to be able to monitor this in Quarkus. This would spot problems when those are not rotated for some reason (either automation failed, or the manual process was forgotten, or certificate reloading didn't work as expected).
Implementation ideas
Add metrics for monitoring so that an alert can fire before the certificate expires. I suggest adding the timestamp in seconds for when the certificate was issued and for when it will expire. The value of the metric only changes when the certificate changes, so it is simple to monitor by comparing it with the current time, and modern monitoring systems will be able to store this time series efficiently.
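As a rough sketch of the metric idea, the two values could be read from the JDK's X509Certificate as epoch seconds. The class and method names below are hypothetical and not part of Quarkus. How the values would be registered with a metrics extension, and tagged per hostname for SNI, is left open here.

```java
import java.security.cert.X509Certificate;
import java.time.Instant;

/** Hypothetical helper (not a Quarkus class): derives metric values from a certificate. */
final class CertificateTimestamps {

    private CertificateTimestamps() {
    }

    /** Start of the validity period (notBefore) as seconds since the Unix epoch. */
    static long issuedAtSeconds(X509Certificate certificate) {
        return certificate.getNotBefore().toInstant().getEpochSecond();
    }

    /** End of the validity period (notAfter) as seconds since the Unix epoch. */
    static long expiresAtSeconds(X509Certificate certificate) {
        return certificate.getNotAfter().toInstant().getEpochSecond();
    }

    /**
     * The comparison a monitoring system would do at query time:
     * seconds left until expiry, negative once the certificate has expired.
     */
    static long secondsUntilExpiry(long expiresAtSeconds, Instant now) {
        return expiresAtSeconds - now.getEpochSecond();
    }
}
```

Exposing the raw timestamps rather than a precomputed "seconds remaining" value keeps the time series constant between certificate rotations, which matches the reasoning above. An alert rule would then subtract the current time on the monitoring side.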
Add a health check that verifies that the current time is between the issue and expiry dates. It might be possible to implement this as an async check. It should not be a @Liveness probe, as it shouldn't restart the service. It could possibly be a @Readiness probe or a @Startup probe, based on what we come up with in the discussions in this issue.
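To make the health check idea concrete, here is a minimal sketch of the validity-window test using only JDK types. The class, enum, and method names are hypothetical. Wiring the result into an actual health check (for example one annotated with @Readiness) and deciding how to handle multiple SNI certificates are deliberately left out.

```java
import java.security.cert.X509Certificate;
import java.time.Instant;

/** Hypothetical helper (not a Quarkus class): checks the certificate validity window. */
final class CertificateValidityCheck {

    /** Possible outcomes of comparing the current time with the validity window. */
    enum Status { NOT_YET_VALID, VALID, EXPIRED }

    private CertificateValidityCheck() {
    }

    /** The window is inclusive on both ends, like X509Certificate.checkValidity. */
    static Status evaluate(Instant notBefore, Instant notAfter, Instant now) {
        if (now.isBefore(notBefore)) {
            return Status.NOT_YET_VALID;
        }
        if (now.isAfter(notAfter)) {
            return Status.EXPIRED;
        }
        return Status.VALID;
    }

    /** Convenience overload that reads the window from the certificate itself. */
    static Status evaluate(X509Certificate certificate, Instant now) {
        return evaluate(
                certificate.getNotBefore().toInstant(),
                certificate.getNotAfter().toInstant(),
                now);
    }
}
```

A health check built on this could report UP only for VALID and include the status name as extra data. Passing the current time in as a parameter keeps the logic easy to test without a real certificate.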