Open JDA88 opened 1 year ago
I am not able to reproduce this. With insecure_skip_verify: true, a probe against https://expired.badssl.com returns:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 5.000845812
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 5.6184539430000005
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_http_content_length Length of http content response
# TYPE probe_http_content_length gauge
probe_http_content_length 494
# HELP probe_http_duration_seconds Duration of http request by phase, summed over all redirects
# TYPE probe_http_duration_seconds gauge
probe_http_duration_seconds{phase="connect"} 0.13220203
probe_http_duration_seconds{phase="processing"} 0.204624022
probe_http_duration_seconds{phase="resolve"} 5.000845812
probe_http_duration_seconds{phase="tls"} 0.280287724
probe_http_duration_seconds{phase="transfer"} 0.00013658
# HELP probe_http_last_modified_timestamp_seconds Returns the Last-Modified HTTP response header in unixtime
# TYPE probe_http_last_modified_timestamp_seconds gauge
probe_http_last_modified_timestamp_seconds 1.689976823e+09
# HELP probe_http_redirects The number of redirects
# TYPE probe_http_redirects gauge
probe_http_redirects 0
# HELP probe_http_ssl Indicates if SSL was used for the final redirect
# TYPE probe_http_ssl gauge
probe_http_ssl 1
# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 200
# HELP probe_http_uncompressed_body_length Length of uncompressed response body
# TYPE probe_http_uncompressed_body_length gauge
probe_http_uncompressed_body_length 494
# HELP probe_http_version Returns the version of HTTP of the probe response
# TYPE probe_http_version gauge
probe_http_version 1.1
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 1.56181497e+08
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_ssl_earliest_cert_expiry Returns last SSL chain expiry in unixtime
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 1.428883199e+09
# HELP probe_ssl_last_chain_expiry_timestamp_seconds Returns last SSL chain expiry in timestamp
# TYPE probe_ssl_last_chain_expiry_timestamp_seconds gauge
probe_ssl_last_chain_expiry_timestamp_seconds -6.21355968e+10
# HELP probe_ssl_last_chain_info Contains SSL leaf certificate information
# TYPE probe_ssl_last_chain_info gauge
probe_ssl_last_chain_info{fingerprint_sha256="ba105ce02bac76888ecee47cd4eb7941653e9ac993b61b2eb3dcc82014d21b4f",issuer="CN=COMODO RSA Domain Validation Secure Server CA,O=COMODO CA Limited,L=Salford,ST=Greater Manchester,C=GB",subject="CN=*.badssl.com,OU=Domain Control Validated+OU=PositiveSSL Wildcard",subjectalternative="*.badssl.com,badssl.com"} 1
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
# HELP probe_tls_version_info Returns the TLS version used or NaN when unknown
# TYPE probe_tls_version_info gauge
probe_tls_version_info{version="TLS 1.2"} 1
Logs:
ts=2023-09-11T02:28:56.857917859Z caller=main.go:181 module=http_2xx target=https://expired.badssl.com level=info msg="Beginning probe" probe=http timeout_seconds=119.5
ts=2023-09-11T02:28:56.857991462Z caller=http.go:328 module=http_2xx target=https://expired.badssl.com level=info msg="Resolving target address" target=expired.badssl.com ip_protocol=ip4
ts=2023-09-11T02:29:01.858742185Z caller=http.go:328 module=http_2xx target=https://expired.badssl.com level=info msg="Resolved target address" target=expired.badssl.com ip=104.154.89.105
ts=2023-09-11T02:29:01.858916072Z caller=client.go:260 module=http_2xx target=https://expired.badssl.com level=info msg="Making HTTP request" url=https://104.154.89.105 host=expired.badssl.com
ts=2023-09-11T02:29:02.476254806Z caller=handler.go:120 module=http_2xx target=https://expired.badssl.com level=info msg="Received HTTP response" status_code=200
ts=2023-09-11T02:29:02.47637202Z caller=handler.go:120 module=http_2xx target=https://expired.badssl.com level=info msg="Response timings for roundtrip" roundtrip=0 start=2023-09-11T04:29:01.859041909+02:00 dnsDone=2023-09-11T04:29:01.859041909+02:00 connectDone=2023-09-11T04:29:01.991243924+02:00 gotConn=2023-09-11T04:29:02.271599196+02:00 responseStart=2023-09-11T04:29:02.476223259+02:00 tlsStart=2023-09-11T04:29:01.991292589+02:00 tlsDone=2023-09-11T04:29:02.271580327+02:00 end=2023-09-11T04:29:02.476359843+02:00
ts=2023-09-11T02:29:02.476410357Z caller=main.go:181 module=http_2xx target=https://expired.badssl.com level=info msg="Probe succeeded" duration_seconds=5.6184539430000005
Case 1: insecure_skip_verify: true
on https://expired.badssl.com/
# HELP probe_ssl_earliest_cert_expiry Returns last SSL chain expiry in unixtime
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 1.428883199e+09
# HELP probe_ssl_last_chain_expiry_timestamp_seconds Returns last SSL chain expiry in timestamp
# TYPE probe_ssl_last_chain_expiry_timestamp_seconds gauge
probe_ssl_last_chain_expiry_timestamp_seconds -6.21355968e+10
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
Case 2: insecure_skip_verify: false
on https://expired.badssl.com/ :
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 0
Case 3: insecure_skip_verify: true
on https://google.com/ :
# HELP probe_ssl_earliest_cert_expiry Returns last SSL chain expiry in unixtime
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 1.695438759e+09
# HELP probe_ssl_last_chain_expiry_timestamp_seconds Returns last SSL chain expiry in timestamp
# TYPE probe_ssl_last_chain_expiry_timestamp_seconds gauge
probe_ssl_last_chain_expiry_timestamp_seconds -6.21355968e+10
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
Case 4: insecure_skip_verify: false
on https://google.com/ :
# HELP probe_ssl_earliest_cert_expiry Returns last SSL chain expiry in unixtime
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 1.695438759e+09
# HELP probe_ssl_last_chain_expiry_timestamp_seconds Returns last SSL chain expiry in timestamp
# TYPE probe_ssl_last_chain_expiry_timestamp_seconds gauge
probe_ssl_last_chain_expiry_timestamp_seconds 1.695438759e+09
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
The issue with insecure_skip_verify is that as soon as you set it to true, the timestamp is always negative and you can't alert on the date.
Currently we have 2 options:
insecure_skip_verify: false
You can have an alarm before the certificate expires, but as soon as it has expired you lose the alarm (and you can't have an alarm on a new target whose certificate has already expired).
insecure_skip_verify: true
You can have an alarm on an expired certificate, but you can't warn x days before.
To have both alarms working you have to query the same target with two modules.
IMO the timestamp should never be negative if the certificate is present.
insecure_skip_verify will have a negligible effect on a target that has a well-maintained and up-to-date certificate chain (such as google.com), so cases 3 & 4 are obvious and to be expected.
Case 2 is also to be expected, since a tls.Config with InsecureSkipVerify false (i.e., the default) will return an error during a TLS handshake, causing blackbox_exporter's probe to fail (and thus the usual metrics will be missing).
However, your case 1 disagrees with the results of my test, which, as you can see in my previous comment, had probe_success 1. An alerting rule which checked for probe_success == 1 and probe_ssl_earliest_cert_expiry < time() would achieve your goal of differentiating a site not responding from a site with an expired certificate.
If you append &debug=true to your probe, it will shed more light on why it considers the probe failed.
Incidentally, the probe_ssl_last_chain_expiry_timestamp_seconds metric will be meaningless on probes with insecure_skip_verify: true, since it is derived from the VerifiedChains slice of the tls.ConnectionState, which is empty if Config.InsecureSkipVerify is true.
In such cases, probe_ssl_last_chain_expiry_timestamp_seconds is set to the Unix epoch value of an uninitialised time.Time{} (i.e., time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC)), which is -62135596800.
In contrast, the probe_ssl_earliest_cert_expiry metric is derived from the PeerCertificates slice in the tls.ConnectionState, which is unaffected by InsecureSkipVerify. blackbox_exporter simply ranges over these and returns the earliest "NotAfter" date. Technically, this may not be the target's certificate, as it will be the earliest expiry date of any certificate which is offered by the target during the TLS handshake. For example, if the target sends a chain containing intermediate CAs, and one of those CAs expires before the target certificate itself, then probe_ssl_earliest_cert_expiry will be set accordingly. This is usually what you want, since the expiration of any certificate in the chain would cause a TLS handshake failure.
You are right, my bad, Case 1 was returning a 403 because our transparent proxy blocked the website from the server subnet (Access denied due to bad server certificate)
So, to summarize:

| insecure_skip_verify | Certificate status | probe_success | SSL timestamp metrics | Can detect certificate expiring soon | Can detect certificate expired |
|---|---|---|---|---|---|
| true | Valid | true | -62135596800 | No | No |
| true | Expired | true | -62135596800 | No | No |
| false | Valid | true | Expiration date | Query 1 | ? |
| false | Expired | false | Missing | Query 1 | ? |

Query 1:
(probe_ssl_last_chain_expiry_timestamp_seconds{} - time()) < (86400 * 15)
To detect a certificate expiring soon, insecure_skip_verify: false is required.
So in order to detect a certificate that has already expired, I need a query that matches when:
probe_success == 0
probe_ssl_last_chain_expiry_timestamp_seconds is missing
I can't find a way to make this work; my PromQL level is not good enough.
To detect a certificate expiring soon, it does not matter what insecure_skip_verify is set to. Setting insecure_skip_verify: true is only necessary when probing a target whose certificate has already expired, since the TLS handshake would otherwise fail, causing the entire probe to fail.
I know, but setting insecure_skip_verify: true is useless, because once it is set to true there is no way to tell whether a certificate has expired or not, and this is my main issue.
probe_success is a pretty broad metric, but assuming that the only issue is that the TLS verification fails (which may be for reasons other than certificate expiration), the probe_ssl_earliest_cert_expiry metric is usable even when insecure_skip_verify is set to true.
In the past I have simply used probe_ssl_earliest_cert_expiry - time() < 14 * 86400 to alert when a certificate has less than 14 days before expiring.
If you want to also be able to determine the actual expiry date of a target whose certificate has already expired, then you will of course need to probe using a blackbox module with insecure_skip_verify: true. However, since most TLS clients will refuse to connect to a peer with an expired certificate anyway, a probe_success of 0 is arguably sufficient to call attention to such a scenario.
As I already explained, probe_ssl_last_chain_expiry_timestamp_seconds is useless if insecure_skip_verify is set to true. This is due to the internal implementation details of the tls package in Go.
The crux is that you really need to alert for (and resolve) expiring certificates before they expire. Once they have expired, probe_success will be zero, and if you are probing using a module with insecure_skip_verify: false (i.e., the default and recommended setting), then there isn't really any other useful metric that indicates why the probe failed.
If you know that you have targets whose certificates have already expired, then your only real option is to probe them with insecure_skip_verify: true, and use the probe_ssl_earliest_cert_expiry metric.
The crux is that you really need to alert for (and resolve) expiring certificates before they expire.
100% agree. Unfortunately, in real life (and with internal certificates) it's not uncommon to have an expired certificate. And with the current implementation, the classic probe_ssl_earliest_cert_expiry - time() < x alert disappears as soon as the certificate expires and a new "probe down" alert appears, which is suboptimal for alert tracking.
I still think we could use a new probe_failed_due_to_expired_certificate metric for those cases.
Host operating system: output of
uname -a
Windows
blackbox_exporter version: output of
blackbox_exporter --version
0.24.0
What is the blackbox.yml module config.
What did you do that produced an error?
No matter what insecure_skip_verify option I use, if the target certificate is expired the SSL-related metrics are not present (probe_ssl_last_chain_expiry_timestamp_seconds, etc.).
What did you expect to see?
A way to differentiate a site not responding from a site with an expired certificate.
What did you see instead?
I have an alarm for certificates that will expire "soon", but as soon as the certificate has expired it doesn't work anymore.
I don't mind the probe failing on an expired certificate, but when the probe is failing with
tls: failed to verify certificate: x509: certificate has expired or is not yet valid
the metrics with the timestamps should still be present. That would allow alerts with messages like "certificate expired x days ago".
Another option could also be a new probe_failed_due_to_expired_certificate metric.