madrisan / hashicorp-vault-monitor

:key: HashiCorp Vault Monitoring Tool
Mozilla Public License 2.0

Nagios output state is "unknown" on "Connection Refused" (e.g. stopped service) #15

Open s256 opened 3 years ago

s256 commented 3 years ago

Hi, thanks for the great work! I came across your project while setting up a Vault cluster. We use Nagios and want to monitor every aspect of the cluster. While setting up the checks, I noticed that a "connection refused" error leads to an UNKNOWN status in the Nagios output.

> service vault stop
> VAULT_CACERT=/etc/vault.d/vault-chain.cert.pem /usr/local/share/icinga/plugins/check_vault status -output=nagios
> vault UNDEFINED - error checking seal status: Get "https://127.0.0.1:8200/v1/sys/seal-status": dial tcp 127.0.0.1:8200: connect: connection refused

One could argue about whether this should be CRITICAL at all, since a refused connection might simply mean a misconfigured firewall while the cluster itself is healthy. Although I have no experience with Go, I am happy to look into it and submit an MR, but I wanted to discuss the topic first: do you, and possibly others, even want this to be CRITICAL?

Otherwise, I would simply fork your project and adapt it to my needs, but it's obviously easier to work together.

madrisan commented 3 years ago

Hi, so in short, you think that an UNKNOWN exit state (which can be configured to generate an alert in Nagios) is not the best choice when the server does not respond. Maybe, but should the plugin then exit with CRITICAL for check_vault status only? What about adding an extra command-line option, --unknown-as-critical, that would preserve backward compatibility and let the end user choose the desired behaviour?
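The proposed option could be wired up roughly as follows. This is a minimal sketch using Go's standard `flag` package, which accepts both `-unknown-as-critical` and `--unknown-as-critical`. The flag is assumed to be registered only on the status subcommand, and `parseOptions` and `effectiveState` are hypothetical names, not the plugin's real API. The default stays `false`, so existing setups keep getting UNKNOWN.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Standard Nagios plugin exit codes.
const (
	stateOK       = 0
	stateWarning  = 1
	stateCritical = 2
	stateUnknown  = 3
)

// parseOptions parses the (hypothetical) options of the status subcommand.
func parseOptions(args []string) (unknownAsCritical bool, err error) {
	fs := flag.NewFlagSet("status", flag.ContinueOnError)
	uac := fs.Bool("unknown-as-critical", false, "report UNKNOWN results as CRITICAL")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *uac, nil
}

// effectiveState promotes UNKNOWN to CRITICAL only when the option is set;
// every other state, and the default behaviour, is left unchanged.
func effectiveState(state int, unknownAsCritical bool) int {
	if unknownAsCritical && state == stateUnknown {
		return stateCritical
	}
	return state
}

func main() {
	unknownAsCritical, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Println("vault UNKNOWN - bad arguments:", err)
		return
	}
	// Suppose the seal-status request just failed with "connection refused".
	state := effectiveState(stateUnknown, unknownAsCritical)
	fmt.Println("exit code:", state)
	// A real plugin would end with os.Exit(state); it is left out here so
	// that the demo itself always exits 0.
}
```

Applying the mapping as a last step, just before exiting, keeps the individual checks unaware of the option. The same function could therefore serve other subcommands later if that turns out to be useful.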

And do not forget to add a star to the project if you find it useful :)

s256 commented 3 years ago

Hi @madrisan, yes, correct. In our setup there is no "unknown" state. If the plugin responded with CRITICAL for check_vault status, that would fit our needs, and adding an extra command-line parameter would be fine. Also, sorry, I didn't mention that we renamed the binary to check_vault.

I'll see if I can hack something together myself. First steps in Go :)