matteocorti / check_ssl_cert

A shell script (that can be used as a Nagios/Icinga plugin) to check an SSL/TLS connection.
GNU General Public License v3.0

Prometheus / OpenMetrics output #310

Closed varac closed 3 years ago

varac commented 3 years ago

I'm using Prometheus and would love to use check_ssl_cert with it. Please consider adding optional Prometheus/OpenMetrics output.

That is, instead of

❯ check_ssl_cert -H localhost --file cert.pem
SSL_CERT CRITICAL www.example.de: x509 certificate element 4 is expired (was valid until Mar 17 16:40:46 2021 GMT)|days_chain_elem1=89;20;15;; days_chain_elem2=1460;20;15;; days_chain_elem3=1110;20;15;; days_chain_elem4=-182;20;15;; 

output something like this (the label names could be improved and commented; I don't actually know what the different numbers stand for):

❯ check_ssl_cert -H localhost --file cert.pem  --prometheus
cert_www_example_de_days_chain_elem1_1=89
cert_www_example_de_days_chain_elem1_2=20
cert_www_example_de_days_chain_elem1_3=15
...

matteocorti commented 3 years ago

Why not. Do you need only the specified output, or do you also need the Nagios status line (e.g. the line with SSL_CERT CRITICAL www.example.de: x509 certificate element 4 is expired (was valid until Mar 17 16:40:46 2021 GMT)|days_chain_elem1=89;20;15;; days_chain_elem2=1460;20;15;; days_chain_elem3=1110;20;15;; days_chain_elem4=-182;20;15;;)?

Or is it enough to have the correct exit status and the lines with the OpenMetrics data?

varac commented 3 years ago

When exporting Prometheus metrics, the interpretation is usually done by Prometheus. However, it would be good to also export the overall status as seen by the check script.

Here's what a more detailed output would look like:

# HELP cert_valid   If cert is ok (0), warning (1) or critical (2)
# TYPE cert_valid gauge
cert_valid{cn="www.example.de"} 2

# HELP cert_valid_chain_elem1  If chain element is ok (0), warning (1) or critical (2)
# TYPE cert_valid_chain_elem1 gauge
cert_valid_chain_elem1 0

# HELP cert_days_chain_elem1_1  ...
# TYPE cert_days_chain_elem1_1 gauge
cert_days_chain_elem1_1{cn="www.example.de"} 89

# HELP cert_days_chain_elem1_2  ...
# TYPE cert_days_chain_elem1_2 gauge
cert_days_chain_elem1_2{cn="www.example.de"} 20

# HELP cert_days_chain_elem1_3  ...
# TYPE cert_days_chain_elem1_3 gauge
cert_days_chain_elem1_3{cn="www.example.de"} 15
...
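
The overall status mentioned above could be derived directly from the plugin's exit code. The following is only a minimal sketch under that assumption: `status_to_metric` is a hypothetical helper name, not something check_ssl_cert provides, and a stand-in command is used in place of a real check.

```shell
# Hypothetical wrapper (not part of check_ssl_cert): run a Nagios-style
# check and report its exit status as a cert_valid gauge.
# Usage: status_to_metric <cn> <command> [args...]
status_to_metric() {
    cn=$1
    shift
    status=0
    # Discard the plugin's own output; only the exit status is used here.
    "$@" >/dev/null 2>&1 || status=$?
    echo '# HELP cert_valid If cert is ok (0), warning (1) or critical (2)'
    echo '# TYPE cert_valid gauge'
    printf 'cert_valid{cn="%s"} %s\n' "$cn" "$status"
}

# Stand-in command that exits with 2 (critical), in place of a real check.
status_to_metric www.example.de sh -c 'exit 2'
```

By Nagios convention an exit status of 3 means UNKNOWN, so a real wrapper would probably want to document that value in the HELP text as well.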

matteocorti commented 3 years ago

OK, I'll try. May I ask why you number them like 1_1, 1_2 rather than simply elem1, elem2 or elem_1, elem_2?

matteocorti commented 3 years ago

Another question: do you need a valid line for each element?

varac commented 3 years ago

Oh, now I understand the Nagios performance data output: the first number is the actual value, and the second and third are the warning and critical thresholds, right?

So then it's even easier:

# HELP cert_valid   If cert is ok (0), warning (1) or critical (2)
# TYPE cert_valid gauge
cert_valid{cn="www.example.de"} 2

# HELP cert_valid_chain_elem  If chain element is ok (0), warning (1) or critical (2)
# TYPE cert_valid_chain_elem gauge
cert_valid_chain_elem{cn="www.example.de", element=1} 0
cert_valid_chain_elem{cn="www.example.de", element=2} 0
cert_valid_chain_elem{cn="www.example.de", element=3} 0
cert_valid_chain_elem{cn="www.example.de", element=4} 2

# HELP cert_days_chain_elem Days until chain element expires
# TYPE cert_days_chain_elem gauge
cert_days_chain_elem{cn="www.example.de", element=1} 89
cert_days_chain_elem{cn="www.example.de", element=2} 1460
cert_days_chain_elem{cn="www.example.de", element=3} 1110
cert_days_chain_elem{cn="www.example.de", element=4} -182

...

Does this make sense?
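
To illustrate the mapping, here is a rough sketch that turns the Nagios performance data (`label=value;warn;crit;min;max`) into labeled samples. `perfdata_to_prometheus` is a hypothetical helper, not part of check_ssl_cert. The sketch quotes the `element` value because the Prometheus text format expects label values to be double-quoted strings.

```shell
# Hypothetical helper (not part of check_ssl_cert): convert Nagios
# performance data into Prometheus text-format samples.
# Usage: perfdata_to_prometheus <cn> <perfdata string>
perfdata_to_prometheus() {
    cn=$1
    perfdata=$2

    echo '# HELP cert_days_chain_elem Days until chain element expires'
    echo '# TYPE cert_days_chain_elem gauge'

    # Items are separated by spaces: days_chain_elemN=value;warn;crit;min;max
    for item in $perfdata; do
        label=${item%%=*}                  # days_chain_elemN
        fields=${item#*=}                  # value;warn;crit;min;max
        value=${fields%%;*}                # first field is the actual value
        element=${label#days_chain_elem}   # N
        printf 'cert_days_chain_elem{cn="%s",element="%s"} %s\n' \
            "$cn" "$element" "$value"
    done
}

perfdata_to_prometheus www.example.de \
    'days_chain_elem1=89;20;15;; days_chain_elem4=-182;20;15;;'
```

The thresholds are dropped on purpose here, since alerting rules on the Prometheus side would normally carry them.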

matteocorti commented 3 years ago

Something like that?

$ check_ssl_cert -H ethz.ch --prometheus
cert_valid{cn="ethz.ch"} 0
cert_days_chain_elem{cn="ethz.ch", element=1} 351
cert_valid_chain_elem{cn="ethz.ch", element=1} 0
cert_days_chain_elem{cn="ethz.ch", element=2} 3229
cert_valid_chain_elem{cn="ethz.ch", element=2} 0
cert_days_chain_elem{cn="ethz.ch", element=3} 7423
cert_valid_chain_elem{cn="ethz.ch", element=3} 0

or

$ ./check_ssl_cert -H ethz.ch --prometheus --critical 300 --warning 400
cert_valid{cn="ethz.ch"} 1
cert_days_chain_elem{cn="ethz.ch", element=1} 351
cert_valid_chain_elem{cn="ethz.ch", element=1} 1
cert_days_chain_elem{cn="ethz.ch", element=2} 3229
cert_valid_chain_elem{cn="ethz.ch", element=2} 0
cert_days_chain_elem{cn="ethz.ch", element=3} 7423
cert_valid_chain_elem{cn="ethz.ch", element=3} 0
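
The relation between the day counts and the 0/1/2 values in the second run can be sketched as a threshold comparison. This is an illustration only: `days_to_status` is a hypothetical helper, and the strict less-than comparison is an assumption, so the boundary behaviour of the actual script may differ.

```shell
# Hypothetical helper: classify days until expiry against the thresholds.
# Usage: days_to_status <days> <warning> <critical>
days_to_status() {
    days=$1
    warning=$2
    critical=$3
    if [ "$days" -lt "$critical" ]; then
        echo 2    # critical
    elif [ "$days" -lt "$warning" ]; then
        echo 1    # warning
    else
        echo 0    # ok
    fi
}

# Day counts taken from the ethz.ch output, with --critical 300 --warning 400.
element=1
for d in 351 3229 7423; do
    printf 'cert_valid_chain_elem{cn="ethz.ch",element="%s"} %s\n' \
        "$element" "$(days_to_status "$d" 400 300)"
    element=$((element + 1))
done
```

With these inputs the first element falls between the two thresholds and is reported as a warning, while the other two are ok.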

matteocorti commented 3 years ago

I committed a first implementation. Can you please check and see if it does what you need?

varac commented 3 years ago

Nice, I tried it and it works fine! Awesome, you're fast :rocket:

It would be good to include the above comments (HELP, TYPE) too, since they tell Prometheus the type of each metric and give it a description when scraping.
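
As a sketch of what that could look like: as far as I know, the Prometheus text format wants all samples of one metric kept together, with the HELP and TYPE lines ahead of them. The hypothetical `group_with_metadata` helper below (not part of check_ssl_cert) regroups interleaved samples by metric name and prepends the metadata, using the HELP texts proposed earlier in this thread.

```shell
# Hypothetical helper: group samples by metric family and add HELP/TYPE.
# Usage: group_with_metadata <samples, one per line>
group_with_metadata() {
    samples=$1
    for family in cert_valid cert_days_chain_elem cert_valid_chain_elem; do
        case $family in
            cert_valid)
                help='If cert is ok (0), warning (1) or critical (2)' ;;
            cert_days_chain_elem)
                help='Days until chain element expires' ;;
            cert_valid_chain_elem)
                help='If chain element is ok (0), warning (1) or critical (2)' ;;
        esac
        printf '# HELP %s %s\n' "$family" "$help"
        printf '# TYPE %s gauge\n' "$family"
        # Keep only this family's samples; tolerate an empty family.
        printf '%s\n' "$samples" | grep "^${family}{" || true
    done
}

# Sample lines copied from the ethz.ch output posted in this thread.
group_with_metadata 'cert_valid{cn="ethz.ch"} 0
cert_days_chain_elem{cn="ethz.ch", element=1} 351
cert_valid_chain_elem{cn="ethz.ch", element=1} 0
cert_days_chain_elem{cn="ethz.ch", element=2} 3229
cert_valid_chain_elem{cn="ethz.ch", element=2} 0'
```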

matteocorti commented 3 years ago

I added the comments.