sensu-plugins / sensu-plugins-consul

This plugin provides native instrumentation for monitoring Consul, including: Consul server service and cluster health, and querying the Consul API to check for passing/critical services.
http://sensu-plugins.io
MIT License

Add checks to watch for impending loss of quorum and stale peer nodes #35

Closed hartmantis closed 6 years ago

hartmantis commented 6 years ago

Pull Request Checklist

Is this in reference to an existing issue?

It is not.

General

New Plugins

Live tests (done on a cluster with five live servers, one failed server, and one stale peer):

$ check-consul-quorum.rb
ConsulQuorumStatus WARNING: Cluster has 5/7 servers alive and can lose 1 more without losing quorum

$ check-consul-quorum.rb -W 2 -C 1
ConsulQuorumStatus CRITICAL: Cluster has 5/7 servers alive and can lose 1 more without losing quorum

$ check-consul-quorum.rb -W 0 -C 0
ConsulQuorumStatus OK: Cluster has 5/7 servers alive and can lose 1 more without losing quorum

$ check-consul-stale-peers.rb
ConsulStalePeers CRITICAL: Cluster contains 1 stale peer

$ check-consul-stale-peers.rb -W 1 -C 2
ConsulStalePeers WARNING: Cluster contains 1 stale peer

$ check-consul-stale-peers.rb -W 2 -C 3
ConsulStalePeers OK: Cluster contains 1 stale peer
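
The threshold behaviour in the runs above can be sketched roughly as follows. This is a minimal illustration inferred only from the sample output, not the plugin's actual code: the method names `quorum_status` and `stale_status` are hypothetical, and the default values are assumptions chosen so that the no-argument runs produce the same results as shown.

```ruby
# Hypothetical helpers; names and defaults are assumptions inferred from the
# live test output above, not taken from the plugin source.

# Quorum check: alert when the number of servers the cluster can still lose
# falls to or below a threshold.
def quorum_status(remaining, warning: 1, critical: 0)
  return :critical if remaining <= critical
  return :warning if remaining <= warning

  :ok
end

# Stale peer check: alert when the number of stale peers reaches or exceeds
# a threshold.
def stale_status(stale, warning: 1, critical: 1)
  return :critical if stale >= critical
  return :warning if stale >= warning

  :ok
end

# Replaying the live tests: 1 more server can be lost, 1 stale peer present.
puts quorum_status(1)                            # no flags
puts quorum_status(1, warning: 2, critical: 1)   # -W 2 -C 1
puts quorum_status(1, warning: 0, critical: 0)   # -W 0 -C 0
puts stale_status(1)                             # no flags
puts stale_status(1, warning: 1, critical: 2)    # -W 1 -C 2
puts stale_status(1, warning: 2, critical: 3)    # -W 2 -C 3
```

Note that the two checks compare in opposite directions: for quorum a smaller number is worse, while for stale peers a larger number is worse.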

Purpose

We recently had to go through a Consul outage recovery on a cluster that, it turned out, had been slowly accumulating stale peers in its raft configuration until it no longer had enough live servers for quorum and died.

The check-consul-stale-peers check examines the raft configuration for peers that have gone stale ("(unknown)"), while the check-consul-quorum check monitors how many more servers a cluster can lose while still maintaining quorum.
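
As a rough sketch of the arithmetic behind both checks: raft needs a strict majority of the configured peers, so a seven-peer configuration needs four live servers, and with five alive it can lose one more. The snippet below is illustrative only. The peer list is made-up sample data, the `'Node'` key and the helper names are assumptions, and the list of alive servers is simply passed in rather than discovered the way the plugin does it.

```ruby
# Illustrative sketch; data shape and helper names are assumptions,
# not the plugin's actual implementation.
STALE_MARKER = '(unknown)'

# Raft quorum is a strict majority of the peers in the raft configuration.
def quorum_size(total_peers)
  (total_peers / 2) + 1
end

# How many more servers can fail before quorum is lost.
def failure_tolerance(total_peers, alive)
  alive - quorum_size(total_peers)
end

# Peers whose node name shows up as the stale marker.
def stale_peers(peers)
  peers.select { |peer| peer['Node'] == STALE_MARKER }
end

# Made-up sample resembling the live test cluster: five live servers,
# one failed server, and one stale peer (seven raft peers in total).
peers = (1..6).map { |i| { 'Node' => "consul-#{i}" } }
peers << { 'Node' => STALE_MARKER }
alive = 5

puts "quorum needs #{quorum_size(peers.size)} servers"
puts "can lose #{failure_tolerance(peers.size, alive)} more"
puts "stale peers: #{stale_peers(peers).size}"
```

Because stale peers still count toward the configuration size, each one raises the quorum requirement without adding a live voter, which is how a cluster can drift toward losing quorum without any healthy server going down.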

Known Compatibility Issues

N/A

majormoses commented 6 years ago

Thanks for your contribution to Sensu plugins! Without people like you submitting PRs we couldn't run the project. I will review it shortly.

hartmantis commented 6 years ago

The latest refactor splits out all the shared logic for these two new checks into a "base" library. I believe it covers all the previous comments, but let me know if I missed something. Thanks!

majormoses commented 6 years ago

released: https://rubygems.org/gems/sensu-plugins-consul/versions/2.1.0