Open seschi98 opened 3 months ago
That's an interesting idea. @jonatan-ivanov, what do you think?
I think this is a great idea and we can create a MeterBinder that registers Gauges for every certificate chain. I think we should keep this on the chain level since we can probably uniquely identify it (assumption).
The part that concerns me a bit is that the chain can contain multiple certificates, so we need to somehow aggregate the "days left" value and the validity statuses of all of the certs in the chain. We also need to deal with corner cases, like what if one cert in the chain is not valid yet and another has already expired (and so on)?
We also need to be able to somehow "refresh" the registered Gauges since the bundle can change at runtime (because of reload).
So I think this is useful but could be trickier than it seems at first sight. Btw we can also provide counters for the number of chains by status (valid/expired/soon-to-expire/etc.) so that people can track them on a graph and also create alerts (e.g. soon-to-expire should be 0).
Just an assumption, but maybe we could first expose the SslBundle information as metrics by using counters (i.e. the second solution @jonatan-ivanov proposed) in this issue, and then open another issue for the "bigger" enhancement with the MeterBinder registration? That would allow us to provide a functional solution quicker and then enhance it.
I'm not a huge open-source contributor, so please correct me if I am wrong; I want to learn.
The two solutions I was talking about above ("days left" and "cert count by status") are not mutually exclusive, I think we should do both.
The MeterBinder component is "just" a place where you can register Meters; it makes things more structured and does not add much complexity. The complexity of the problem space is aggregating and re-registering Gauges for the "days left" values.
For "counting the certs by status" (which is still a Gauge since the value is non-monotonic), we don't have these problems but we should still use a MeterBinder to register them.
Okay thanks !
I did a little deep dive into the code base, looking at how SslInfo stores the certificate chains by alias etc.
Correct me if I'm wrong, but do we need to aggregate every certificate for the Gauge? Maybe we could aggregate only the certs that are currently valid and exclude the ones that are already expired or not yet valid? Why do we need the invalid certs inside the gauge?
Couldn't we just refresh the gauge on reload?
The part that concerns me a bit is that the chain can contain multiple certificates, so we need to somehow aggregate the "days left" value and the validity statuses of all of the certs in the chain. We also need to deal with corner cases, like what if one cert in the chain is not valid yet and another has already expired (and so on)?
This feels a little bit like aggregating the health status, where we use the worst case by default. For example, if one subsystem is down and everything else is up, the overall status will be down. I think a similar assume-the-worst approach makes sense here, as a certificate chain is only as good as the "worst" certificate in that chain. For corner cases where there's an expired certificate and a not-yet-valid certificate, I would consider expired to be worse than not-yet-valid, so the chain should be considered as having expired. My reasoning is that an expired cert cannot be fixed without someone doing something, but a not-yet-valid certificate could, potentially, be fixed just by waiting.
For reference, here's the worst-case-by-default aggregation in the health status:
if (containsOnlyValidCertificates(certificateChain)) {
    validCertificateChains.add(certificateChain);
}
else if (containsInvalidCertificate(certificateChain)) {
    invalidCertificateChains.add(certificateChain);
}
As long as a chain contains at least one invalid certificate, the whole chain is added to the "invalidCertificateChains" list.
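The assume-the-worst rule described above can be sketched in plain Java. This is only an illustration, not the Spring Boot implementation: the `Cert` record, the `Status` enum and both method names are hypothetical, and treating an empty chain as valid is an assumption made purely to keep the example total.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

final class WorstCaseChainStatus {

    // Declared from best to worst, so a larger ordinal means a worse status.
    enum Status { VALID, NOT_YET_VALID, EXPIRED }

    // Hypothetical stand-in for the validity window of one certificate.
    record Cert(Instant notBefore, Instant notAfter) { }

    static Status certStatus(Cert cert, Instant now) {
        if (now.isAfter(cert.notAfter())) {
            return Status.EXPIRED;
        }
        if (now.isBefore(cert.notBefore())) {
            return Status.NOT_YET_VALID;
        }
        return Status.VALID;
    }

    // The chain takes the worst status found among its certificates.
    static Status chainStatus(List<Cert> chain, Instant now) {
        return chain.stream()
            .map(cert -> certStatus(cert, now))
            .max(Comparator.naturalOrder()) // enums compare by declaration order
            .orElse(Status.VALID);          // assumption: an empty chain counts as valid
    }
}
```

With this ordering, a chain that holds both an expired and a not-yet-valid certificate is reported as EXPIRED, which matches the reasoning that an expired certificate needs someone to act while a not-yet-valid one may fix itself by waiting.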
I currently have something like this:
# HELP ssl_chain_expiry_seconds SSL chain expiry
# TYPE ssl_chain_expiry_seconds gauge
ssl_chain_expiry_seconds{bundle="ssldemo",certificate="7207ee6e",chain="spring-boot-ssl-sample"} -3.15682879E8
# HELP ssl_chains
# TYPE ssl_chains gauge
ssl_chains{status="expired"} 1.0
ssl_chains{status="not-yet-valid"} 0.0
ssl_chains{status="valid"} 0.0
ssl_chains{status="will-expire-soon"} 0.0
ssl_chain_expiry_seconds is a TimeGauge which shows the time until expiry of the chain (the chain expires when the first certificate in it expires). The certificate tag is the serial number of the certificate which expires first.
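As a rough sketch of how that value could be derived (this is not the code from the branch; the `Cert` record and both method names are hypothetical), the certificate with the earliest expiry is picked and the remaining time is reported relative to "now":

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class ChainExpirySketch {

    // Hypothetical stand-in: serial number and expiry instant of one certificate.
    record Cert(String serialNumber, Instant notAfter) { }

    // The certificate that expires first decides when the whole chain expires.
    static Optional<Cert> firstToExpire(List<Cert> chain) {
        return chain.stream().min(Comparator.comparing(Cert::notAfter));
    }

    // Seconds until expiry; the value is negative once the certificate has expired.
    static long secondsUntilExpiry(Cert cert, Instant now) {
        return Duration.between(now, cert.notAfter()).getSeconds();
    }
}
```

A negative result corresponds to the negative gauge value in the sample output above, where the chain expired roughly ten years ago.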
ssl_chains is a Gauge which counts the chains by status. The status of a chain is the "worst" status of the contained certificates, from worst to best:
EXPIRED
NOT_YET_VALID
WILL_EXPIRE_SOON
VALID
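A minimal sketch of the counting step, assuming each chain has already been reduced to a single status (the class and method names are hypothetical). Every status is seeded with zero so that all four series are always exported, as in the sample output:

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class ChainCountSketch {

    enum Status { EXPIRED, NOT_YET_VALID, WILL_EXPIRE_SOON, VALID }

    static Map<Status, Integer> countByStatus(List<Status> chainStatuses) {
        Map<Status, Integer> counts = new EnumMap<>(Status.class);
        // Seed every status with 0 so that absent statuses still show up.
        for (Status status : Status.values()) {
            counts.put(status, 0);
        }
        for (Status status : chainStatuses) {
            counts.merge(status, 1, Integer::sum);
        }
        return counts;
    }
}
```

Because the counts can go down as well as up when bundles reload, they fit a gauge rather than a monotonic counter.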
So far this is working quite nicely. However, reload could be a bit trickier. Essentially we need to remove the gauges which track chains which no longer exist and update the existing ones.
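The bookkeeping for a reload can be sketched without touching the registry at all. The example below only works out which gauges would have to be removed and which would have to be added; `ChainKey`, `Plan` and `reconcile` are hypothetical names, and identifying a chain by bundle name plus alias is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

final class GaugeReloadSketch {

    // Hypothetical identity of a registered gauge: bundle name plus chain alias.
    record ChainKey(String bundle, String chain) { }

    // Result of comparing the registered gauges with the chains present after a reload.
    record Plan(Set<ChainKey> toRemove, Set<ChainKey> toAdd) { }

    static Plan reconcile(Set<ChainKey> registered, Set<ChainKey> current) {
        // Registered but no longer present: these gauges should be removed.
        Set<ChainKey> toRemove = new HashSet<>(registered);
        toRemove.removeAll(current);
        // Present but not yet registered: these gauges should be added.
        Set<ChainKey> toAdd = new HashSet<>(current);
        toAdd.removeAll(registered);
        return new Plan(toRemove, toAdd);
    }
}
```

Keys that appear in both sets keep their gauges; those only need to report the fresh expiry value the next time they are sampled.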
The remove* methods on the MeterRegistry all have @Incubating on them - not sure if we should use them?
Nice! Can you show me the code?
I guess we could remove @Incubating from remove; it has been there for years. It can be misused with high-cardinality tags, but there are parts of Micrometer where we use it for similar purposes (e.g. instrumentation for Kafka metrics).
/cc @shakuzen
Sure. I'll work a bit on it today, and then link the branch here.
It sounds like MultiGauge could take care of removing for you.
@jonatan-ivanov: Code is here: https://github.com/mhalbritter/spring-boot/tree/mh/42030-expose-sslbundle-information-via-actuator-metrics
I've used the SampleTomcatSslApplication to test it in action: there's a background thread which fiddles with the SslBundles after some delay.
I saw that support for SSL bundles was added to the actuator info and health endpoints in https://github.com/spring-projects/spring-boot/pull/41205 and I think it would be really helpful to make that information available in the metrics endpoint as well. I would like to use this enhancement to set up an alarm in my monitoring software so that I can renew my certificates before they expire. I would imagine something like this: