Open michael-burt opened 3 days ago
Interesting idea, @michael-burt. A few comments:
- host: https://example.com
- the thing on the righthand side isn't a host there, it's a url....so probably better to change the key to url
. Given that, I'd also change tlscheck.host
to tlscheck.url
.
Another challenge is that x509 certs typically have two dates -- notBefore and notAfter (and maybe even more are possible?). I assume that most people care about the notAfter most of the time, but it's something to keep in mind. Consider updating the description to make this very explicit. I wouldn't bother with notBefore until someone claims a real need for it.
I think modeling this as a gauge counter representing the number of seconds until expiry (notAfter) seems more useful, rather than reporting the same constant datapoint value (epoch seconds) for the lifespan of a cert. Assuming a typical use case is to trigger an alert when there are fewer than 30 days left or whatever, that's much easier to do with a "time remaining" metric because you don't have to compute anything. This also makes it similar to the hardware/battery "time_left" metric as prior art (reference).
Thanks @breedx-splk , I made those updates. Do you have any suggestions for attribute names, or are these sufficient?
tlscheck.x509_isvalid
tlscheck.x509_issuer
tlscheck.x509_subject
Hey @michael-burt ,
This seems like a cool idea, my only nit is to make tlscheck.x509_isvalid
a seperate metric instead of being an attribute on the metric. It would make it easier in metric backends to alert on invalid certs.
This looks like a worthy component. I can sponsor it.
This seems like a cool idea, my only nit is to make
tlscheck.x509_isvalid
a seperate metric instead of being an attribute on the metric. It would make it easier in metric backends to alert on invalid certs.
Hi @MovieStoreGuy, I considered putting it in a metric but I couldn't really find examples of unitless boolean values being emitted as metrics within contrib except for this one in k8sclusterreceiver.
It makes me think that it goes against convention to have boolean metrics.
I see the point, it is also not something that is a hard blocker for this. I am more thinking of being lazy with the queries instead of matching against the attributes sent.
I considered putting it in a metric but I couldn't really find examples of unitless boolean values being emitted as metrics within contrib except for this one in k8sclusterreceiver.
It makes me think that it goes against convention to have boolean metrics.
Determining the validity of a X509 certificate would be good use case for the new EntityState Events, with x509 certificate being the Entity and is_valid
being the Entity Attribute. (Although I can see immediate problems with the semantic definition of is_valid
here already).
Having said that, the attribute name for tlscheck.x509_isvalid
should probably be changed, as this term is ambiguous. Both, path validation and validity period are terms that are used in RFC5280 and they are both distinct.
So, maybe a more explicit tlscheck.x509_pathvalid
would be better in order to avoid end-user confusion?
Thanks for the good feedback @cwegener.
If the certificate is expired,
tlscheck.time_left
will be0
and thetlscheck.x509_isvalid
attribute will befalse
.If the date and time specified in the
NotBefore
field of the x.509 certificate has not yet occurred,tlscheck.time_left
will be reported as a positive integer andtlscheck.x509_isvalid
attribute will befalse
.
I will change these docs and add a tlscheck.x509_periodvalid
as well to monitor the case when NotBefore
is in the future but the chain is valid.
I will change these docs and add a tlscheck.x509_periodvalid as well to monitor the case when NotBefore is in the future but the chain is valid.
I think that the attribute that describes whether or not the certificate was within its validity period at the time when the check was performed could simply just be one single attribute (e.g. the tlscheck.x509_periodvalid
) and this attribute would be true as long as NotBefore < Time.now() < NotAfter
is true. And in all other cases tlscheck.x509_periodvalid
would be false
.
And additionaly there could be another attribute tlscheck.x509_pathvalid
which stores results of path validation.
And then tlscheck.x509_isvalid
would be the result of tlscheck.x509_periodvalid && tlscheck.x509_pathvalid
Something else to keep in mind for the path validation is the CA trust store (root CA store). Because any sort of path validation also requires to agree on which CA trust store to be used. But an initial implementation could just be limited to the default CA trust store (system store) as decided by crypto/x509
https://pkg.go.dev/crypto/x509#VerifyOptions
Lastly, on the definition of x509_isvalid
:
Technically, this would be the combination of these checks:
And some users may want to have config options for any of these checks to be included in the calculation of the final overall 'is valid' attribute value.
Also, I do wonder if it might be worth building an attribute hierarchy like this, because in the future, the tlsccheckreceiver might not just be limited to report x509
related attributes but also tls
related attributes.
+ tlscheck
|\
| + x509
| \
| + is_valid
| |
| + periodvalid
| |
| + pathvalid
\
+ tls
\
+ proto_version
The initial implementation can still be limited to just x509
functionality, but with this naming hierarchy in place, it will be smoother to add new attributes into the hierarchy in the future without breaking the existing attributes.
Other than now() < NotAfter
, any other form of certificate verification is beyond what I need for my use case. Maybe it would be best to leave out x509_periodvalid
and x509_pathvalid
for the first iteration of this component since there are other considerations when reporting certificate validity.
I agree about the attribute hierarchy, I can change the name of the attributes to x509.subject
x509.cn
or x509.commonname
and x509.issuer
.
The purpose and use-cases of the new component
This purpose of the
tlscheckreceiver
is to provide expiry information about x.509 certificates. Monitoring the expiration of certificates is not achievable via any existing OpenTelemetry Collector receivers. This receiver would also check the validity of the x.509 certificate chain and emit this information as an attribute.Example configuration for the component
Telemetry data types supported
metrics
Is this a vendor-specific component?
Code Owner(s)
@michael-burt
Sponsor (optional)
@atoulme
Additional context
A draft
metadata.yaml