timdaman / check_docker

Nagios plugin to check docker containers
GNU General Public License v3.0
152 stars 60 forks source link

less verbose output #49

Closed rogerniesten closed 5 years ago

rogerniesten commented 6 years ago

First of all I like this service very much and want (initially) use it to check whether all docker containers are running. In case of an exited container, I want to get an alert from Nagios. In general this works fine, but the service gives the status of all containers. With 15 containers (number will grow further in near future) this is a lot of text... In case of an exited service it's quite hard to find which service is not running between all the OK messages... I would like to have an option to hide the information for containers in the expected state (the OK messages).

In line with other services, a verbose option would be the preferred way to implement this.
The standard output would be that only warning and critical information is shown. When using the --verbose or -v option, the output would be as currently is.

I'll make an attempt to implement this behavior, but as I don't have experience with python I'm not sure whether my code will be nice and sustainable...

timdaman commented 6 years ago

That is a fair point, I do emit a lot of info. Nagios has a limit on how much output a plugin can produce, I think 4K. On busy hosts this could also be a an issue.

Thinking about this I agree status text could be shorter. The performance data, however, can make that harder. Even if a check is passing we still need to display it's performance data so the the graphs drawn from it don't have gaps.

If we make the output shorter we need to make sure not to breaking existing installations. Since currently we are verbose by default I think we need to I am thinking this could be implemented by the addition of a couple arguments.

--no-performance this would disable displaying performance data. It should be very easy to do.

--no-ok, disable OK messages, like your PR does. Just reversing the default and changing some of the words around.

What do you think?

rogerniesten commented 6 years ago

I’m not sure whether peformance data is included in the output limit. But if it is, the --no-performance argument would be a good way to escape from cutting off too large output data.

Regarding your --no-ok argument… I was also struggling with the dilemma to follow the developers guidelines (amount of output specified by -v -vv etc.) and keeping behavior the same in default situation… In my PR I choose the developers guidelines above keeping existing behavior (as nothing really breaks, only less information which can be “fixed” by adding the -v argument). As I don’t (yet) use all functionality of your plugin, I probably don’t oversee all implications with this (in my opinion) small change in behavior, nor do I know who or how many people are using your plugin. I assume you have a much better overview. So if you prefer to keep the default behavior completely unchanged by implementing the --no-ok argument, that’s completely fine with me.

timdaman commented 5 years ago

Sorry the delay in my response, I have been out of town a lot.

Hmm, when I went back an look for the limits on output I couldn't find anything solid. Not sure I got that 4KB number.

I agree, it appears I am not matching the guidance on verbosity, good catch. Since the check is opportunistic and could be checking a unknown number of services the verbosity of the check gives positive feedback the service is present and OK. By going with -v logic it would lead to breakage for people assuming this. By adding the flag doesn't change anything for existing users but allows them to do more.

I think moving to the -v standard is the best long term strategy but I would rather save that for a major version change so I am thinking the --no variants of the flags are the best for now.

timdaman commented 5 years ago

2.0.9 now allows controlling verbosity. True, it is not standard nrpe but it is not breaking for previous users of check_docker.

timdaman commented 5 years ago

FYI, due to packaging issues I pulled 2.0.9 and released this improvement as 2.1.0