timdaman / check_docker

Nagios plugin to check docker containers
GNU General Public License v3.0
152 stars 60 forks source link

Add capability to alert critical when container is removed #46

Closed mattwwarren closed 6 years ago

mattwwarren commented 6 years ago

Currently, the script will throw an UNKNOWN with exit rc 3 if the container has been removed. I propose that with the --present flag, the script should throw a CRITICAL rc 2 when the container cannot be found.

Locally, I've patched the script like this:

diff --git a/ansible/roles/docker/files/check_docker b/ansible/roles/docker/files/check_docker
index 9bd8e806a..7a692199a 100644
--- a/ansible/roles/docker/files/check_docker
+++ b/ansible/roles/docker/files/check_docker
@@ -237,6 +237,7 @@ def get_containers(names, require_present):
         # If we don't find a container that matches out regex
         if require_present and not found:
             critical("No containers match {}".format(matcher))
+            raise RuntimeError("Container not found")

     return filtered

@@ -618,6 +619,7 @@ def process_args(args):
 def no_checks_present(parsed_args):
     # Look for all functions whose name starts with 'check_'
     checks = [key[6:] for key in globals().keys() if key.startswith('check_')]
+    checks.append('present')
     return all(getattr(parsed_args, check) is None for check in checks)

@@ -691,6 +693,9 @@ def perform_checks(raw_args):
                     if args.restarts:
                         check_restarts(container, *parse_thresholds(args.restarts, include_units=False))

+        except RuntimeError as e:
+            print_results()
+            exit(rc)
         except Exception as e:
             unknown("Exception raised during check: {}".format(repr(e)))

Does this seem like a reasonable way forward? I considered making a custom exception class so we can catch something more specific than RuntimeError.

mattwwarren commented 6 years ago

For example, before my patch:

check_docker --container 'my_awesome_container' --present --cpu 85 || echo "$?"
CRITICAL: No containers match my_awesome_container; UNKNOWN: No containers names found matching criteria
3

$ check_docker --container 'my_awesome_container' --present || echo "$?"
UNKNOWN: No checks specified.
3

And after:

$ check_docker --container 'my_awesome_container' --present || echo "$?"
CRITICAL: No containers match my_awesome_container
2
timdaman commented 6 years ago

Hmm, I see the issue you are mentioning.

I must confess I disagree with the use of exceptions here. The reason if that it blocks other checks. If two containers are being checked but only one of them is missing it will skip checks on the second one.

I am think there are couple changes that make sense here.

  1. Checks of "all" containers (the default if no are specified) should be exclusive of the --present flag as they don't make sense together. Not something you mentioned but something new I noticed.
  2. I think updating the --present flag to quality as a "check" makes sense. How about making a much simpler change
    
    checks = [key[6:] for key in globals().keys() if key.startswith('check_')]
    --to--
    checks = [key[6:] for key in globals().keys() if key.startswith('check_')]
    checks.append("present")  # Act like --present is a check though it is not implemented like one

--and-- if len(containers) == 0: --to-- if len(containers) == 0 and not args.present:


I think this will give you what you want but also allow functional checks to work per normal.
timdaman commented 6 years ago

2.0.7 Applied the updates I mentioned above. Closing this issue as a result.