ribbybibby / ssl_exporter

Exports Prometheus metrics for TLS certificates
Apache License 2.0

Export certificate metadata as labels rather than separate metrics #18

Closed: ribbybibby closed this 4 years ago

ribbybibby commented 4 years ago

tl;dr I should remove all of the 'informational' certificate metrics and attach that data to ssl_cert_not_after and ssl_cert_not_before as labels. Detailed explanation follows.

This will be a breaking change, so will form part of a 1.0.0 release.


When I first created this exporter over 2 years ago, I was fairly new to Prometheus and I didn't really understand, or hadn't thought much about, what made a good metric. I had seen other exporters that used separate metrics for metadata and blindly followed that approach.

However, I don't think the reasons those other exporters put metadata fields into their own metrics apply to certificates.

Typically you would put a piece of metadata, like a Consul tag, into its own metric because a Consul tag can have any value, a service can have any number of them, and they are likely to change over time. If you put all the tags for a Consul service into a label on each of that service's metrics, then every one of those metrics would start a new series each time a tag was added or removed, so the first change alone would double the number of series you were storing. With a separate metric, you would get one extra series per new tag.
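To make the churn concrete, here is a minimal Go sketch that counts distinct series (metric name plus label set, which is how a TSDB tells series apart) for the two layouts. It assumes every label set ever written stays stored, ignoring retention. The Consul-style metric names (`consul_service_up`, `consul_service_tag`, ...), the `service`, `tags` and `tag` labels, and the tag history are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey renders a metric name plus its label set as a stable string,
// so two samples with the same key belong to the same series.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(parts, ",") + "}"
}

// tagsAsLabel counts series when every service metric carries all of the
// service's tags joined into a single (hypothetical) "tags" label.
func tagsAsLabel(metrics []string, tagHistory [][]string) int {
	seen := map[string]bool{}
	for _, tags := range tagHistory {
		for _, m := range metrics {
			seen[seriesKey(m, map[string]string{"service": "web", "tags": strings.Join(tags, ",")})] = true
		}
	}
	return len(seen)
}

// tagsAsMetric counts series when the service metrics carry no tag labels
// and each tag gets one series in a separate (hypothetical) info metric.
func tagsAsMetric(metrics []string, tagHistory [][]string) int {
	seen := map[string]bool{}
	for _, tags := range tagHistory {
		for _, m := range metrics {
			seen[seriesKey(m, map[string]string{"service": "web"})] = true
		}
		for _, t := range tags {
			seen[seriesKey("consul_service_tag", map[string]string{"service": "web", "tag": t})] = true
		}
	}
	return len(seen)
}

func main() {
	metrics := []string{"consul_service_up", "consul_service_checks", "consul_service_instances"}
	history := [][]string{{"prod"}, {"prod", "v2"}, {"prod", "v2", "canary"}}
	for i := 1; i <= len(history); i++ {
		fmt.Printf("after %d tag states: as label=%d, as metric=%d\n",
			i, tagsAsLabel(metrics, history[:i]), tagsAsMetric(metrics, history[:i]))
	}
}
```

For the three tag states this prints 3/4, 6/5 and 9/6 series. The joined-label layout gains a full copy of the service's metrics on every change (doubling on the first one), while the separate metric gains one series per new tag.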

Certificates are different from Consul services, because the information attached to a certificate (like its common name) never changes. As long as the labels come from the certificate itself, no matter how many you attach to ssl_cert_not_after, or what they represent, you will get the same number of series. Therefore, there's no benefit to putting these values in different metrics.

In fact, as it stands, the exporter exports far more series than it would if I had chosen to use labels. At the moment it exports 7 metrics for each unique instance+certificate combination. So, if you have 10 certificates, you have 70 series overall. But if all of the metadata metrics were labels, there would be 2 metrics, and therefore only 20 series overall.

Furthermore, having separate metrics for the common name, the SANs and the OUs just makes querying the metrics harder than it needs to be. Compare this query:

((ssl_cert_not_after - time() < 86400 * 30) * on (instance,issuer_cn,serial_no) group_left (dnsnames) ssl_cert_subject_alternative_dnsnames{dnsnames=~".*,.*example.org,.*"}) * on (instance,issuer_cn,serial_no) group_left (subject_cn) ssl_cert_subject_common_name{subject_cn=~"^.*example.org"}

To this one:

ssl_cert_not_after{dnsnames=~".*,.*example.org,.*",subject_cn=~"^.*example.org"} - time() < 86400 * 30

The latter is clearly more understandable and a lot less work to put together.

the-maldridge commented 4 years ago

I've played with this some now over at Void Linux, where I need to monitor all the certs around the fleet. I've found this to be one of the weirdest exporters I've ever used because of the sheer number of metrics. I'd be happy to help out any way I can with updating to the newer metric format.

ribbybibby commented 4 years ago

Hi @the-maldridge, thanks for the offer to help! Here's a release candidate with the label changes: https://github.com/ribbybibby/ssl_exporter/releases/tag/v1.0.0-rc.0. Could you give it a spin and provide any feedback?

the-maldridge commented 4 years ago

Will do. I've got an environment with some folks who would also be interested in taking a look, if you have a Docker build of that handy. I'll get them to try it tomorrow and see what I can report back.

ribbybibby commented 4 years ago

Yep: ribbybibby/ssl-exporter:v1.0.0-rc.0.

the-maldridge commented 4 years ago

I installed version 1.0.0-rc.0 today and it worked pretty well. The only thing so far that threw me is that I didn't expect to get back metrics for the entire chain of trust up to the roots, though this may just have been me not fully understanding something.

ribbybibby commented 4 years ago

Glad to hear it worked for you, @the-maldridge.

Yes, the presence of the entire chain in the metrics seems strange to most people using the exporter, who are only really interested in the end-user certificate. I included the rest of the chain for a few reasons:

  1. There's no downside to doing it, as far as I can tell.
  2. I believe it's technically possible, although extremely unlikely, for a certificate further up the chain to expire before the end-user certificate. Clients would then stop trusting the chain, so it's a useful thing to be able to alert on.
  3. There may be use cases or applications that could benefit from having metrics for the root CA. One case I thought of was being able to quickly tell whether any of your chains contains a certificate that is due to be distrusted by one of the major browsers.

the-maldridge commented 4 years ago

While 2 is possible, I believe it would directly violate the CA/B Forum guidelines for certificate issuance, so any certificate it happened to would already be considered misissued.

3 is a good point; I hadn't thought of that, as my organization uses other tools for root validation.

ribbybibby commented 4 years ago

Released in v1.0.0