timdaman / check_docker

Nagios plugin to check docker containers
GNU General Public License v3.0

NRPE: Unable to read output #60

Open thatsk opened 5 years ago

thatsk commented 5 years ago
icinga node :#/usr/lib64/nagios/plugins/check_nrpe -H node -c check_docker
NRPE: Unable to read output
clientnode:#command[check_docker]=sudo docker run --rm -v /var/run/docker.sock:/var/run checkdocker --cpu $ARG1$:$ARG2$
root@node: /etc/nrpe.d# cat /etc/nagios/nrpe.cfg  | grep user
# user and is running in standalone mode.
# This determines the effective user that the NRPE daemon should run as.
# You can either supply a username or a UID.
nrpe_user=nrpe
root@node: /etc/nrpe.d# cat /etc/sudoers.d/nagios
nagios    ALL=(ALL:ALL)  NOPASSWD:ALL
nrpe ALL=(ALL:ALL)  NOPASSWD: ALL
icinga rule
apply Service "check_docker" {
  import "generic-service"
  check_command = "nrpe"
  vars.nrpe_command = "check_docker"
  vars.nrpe_arguments = [ "70", "90" ]
  assign where match("node*", host.name)
}
thatsk commented 5 years ago

Not sure what is wrong. @timdaman

thatsk commented 5 years ago

Any help will be appreciated. I am running python3 inside a Docker container, and the container is treated as the binary that performs the health check.

HigH-HawK commented 5 years ago

Hi @thatsk

A common cause of the error message "NRPE: Unable to read output" is a wrong plugin path in your nrpe.cfg, but it could also be an interpreter issue.

For example, I had to change the shebang line at the very top of the plugin from

#!/usr/bin/env python3

to

#!/usr/bin/python3.6

for the plugin to work.

You could also run ls -lah /usr/bin/* | grep python to see which Python binaries are installed and then amend the shebang in the check_docker file accordingly.

Have a look at this KB entry from nagios: https://support.nagios.com/kb/article/nrpe-nrpe-unable-to-read-output-620.html
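The shebang check described above can also be scripted. Below is a minimal Python sketch; the function names are hypothetical and not part of check_docker. Note one assumption: it resolves `#!/usr/bin/env python3` using this process's PATH, which may not match the environment the NRPE daemon gives its plugins, so it is most meaningful when run as the NRPE user.

```python
import os
import shutil


def interpreter_from_shebang(first_line):
    """Return the interpreter a shebang line points at, or None if there is none.

    For "#!/usr/bin/env python3", env searches PATH, so we mimic that with
    shutil.which(). Caveat: this uses *this* process's PATH, which may differ
    from the environment NRPE gives its plugins.
    """
    line = first_line.strip()
    if not line.startswith("#!"):
        return None
    parts = line[2:].split()
    if not parts:
        return None
    if os.path.basename(parts[0]) == "env" and len(parts) > 1:
        return shutil.which(parts[1])  # None if not found on PATH
    return parts[0]


def check_plugin_shebang(script_path):
    """Read the plugin's first line; return (interpreter, is_runnable)."""
    with open(script_path, "r", encoding="utf-8", errors="replace") as f:
        interpreter = interpreter_from_shebang(f.readline())
    runnable = (
        interpreter is not None
        and os.path.isfile(interpreter)
        and os.access(interpreter, os.X_OK)
    )
    return interpreter, runnable
```

For example, `check_plugin_shebang("/usr/local/bin/check_docker")` returns the interpreter path and whether it can be executed; a `False` there points at the shebang rather than at the NRPE configuration.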

timdaman commented 5 years ago

Sorry, I was busy with... life. @HigH-HawK, thanks for the observation. That sounds plausible.

Looking at the command output above, I assume the line below is the NRPE command defined on the host being monitored.

clientnode:#command[check_docker]=sudo docker run --rm -v /var/run/docker.sock:/var/run checkdocker --cpu $ARG1$:$ARG2$

I am guessing you installed check_docker in an image called checkdocker and that you set the entrypoint to check_docker.

I recommend manually running your Docker image with no arguments to confirm it works: sudo docker run --rm -v /var/run/docker.sock:/var/run checkdocker. If you get the help text, you are most likely looking at a configuration issue in NRPE. If you get some other output (or none at all), I would look at your entrypoint and confirm it is correct.
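The manual test above can be wrapped in a small Python sketch that reports the exit code, stdout and stderr separately. It assumes Python 3.7+ (for `capture_output`), and the helper `run_check` and the script name `run_check.py` are hypothetical, not part of check_docker or NRPE. Stdout is the part worth watching: NRPE relays the plugin's stdout, and an empty stdout is what shows up as "NRPE: Unable to read output".

```python
import subprocess
import sys


def run_check(argv, timeout=30):
    """Run a plugin command; return (exit_code, stdout, stderr), stripped.

    Requires Python 3.7+ for capture_output. Raises subprocess.TimeoutExpired
    if the command runs longer than `timeout` seconds.
    """
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stdout.strip(), proc.stderr.strip()


if __name__ == "__main__":
    # Example: python3 run_check.py sudo docker run --rm -v /var/run/docker.sock:/var/run checkdocker
    if len(sys.argv) > 1:
        code, out, err = run_check(sys.argv[1:])
        print(f"exit code: {code}")
        # An empty stdout here is the case NRPE reports as "Unable to read output".
        print(f"stdout: {out!r}")
        print(f"stderr: {err!r}")
```

Running it as the NRPE user (for example via `sudo -u nrpe`) brings the environment closer to what the daemon sees than running it from your own shell.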

Please feel free to report back what you see in those tests and I will try to help. Also, sending me the Dockerfile for your checkdocker image would help me recreate your environment.

HigH-HawK commented 5 years ago

Hey @timdaman

I didn't have any issues, just tried helping the other user :)

Aeris126 commented 4 years ago

Hello @timdaman, I'm trying to integrate Nagios and Docker via your check_docker script and am seeing the error "NRPE: Unable to read output". If I run check_docker from the command line it works fine, but it seems to report the wrong return code, which is why Nagios can't handle it properly. From my syslog:

Dec 2 14:46:53 plat_doc nrpe[20573]: Host 192.168.1.133 is asking for command 'check_docker' to be run...
Dec 2 14:46:53 plat_doc nrpe[20573]: Running command: /usr/local/bin/check_docker --connection /var/run/docker.sock --health
Dec 2 14:46:53 plat_doc nrpe[20573]: Command completed with return code 1 and output:
Dec 2 14:46:53 plat_doc nrpe[20573]: Return Code: 3, Output: NRPE: Unable to read output

HigH-HawK commented 4 years ago

Hi @Aeris126

Could you please check the file permissions of the check_docker file? Since you said the command works when you run it on the machine itself, I suspect the nrpe or nagios user does not have permission to run it when it is requested remotely.
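To check this without guessing, here is a minimal Python sketch (hypothetical helper names) that inspects the "other" permission bits, i.e. the ones that apply when the NRPE user is neither the file's owner nor in its group. A script run through an interpreter, as check_docker is, needs read as well as execute permission. The sketch does not look at ACLs, SELinux, or the parent directories.

```python
import os
import stat


def others_can_run_script(mode):
    """True if the 'other' bits allow both read and execute.

    These bits apply when the NRPE user is neither the owner nor in the
    file's group. Interpreted scripts need read permission, not just execute.
    """
    return bool(mode & stat.S_IROTH) and bool(mode & stat.S_IXOTH)


def describe_mode(mode):
    """Return the ls -l style string, e.g. '-rwxr-xr-x'."""
    return stat.filemode(mode)


def check_file(path):
    """Return (ls-style mode, runnable-by-others) for a file on disk."""
    mode = os.stat(path).st_mode
    return describe_mode(mode), others_can_run_script(mode)
```

A root-owned file with mode `-rwxr-xr-x` passes this check, while `-rwx--x--x` (execute without read) does not.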

Aeris126 commented 4 years ago

@HigH-HawK

-rwxr-xr-x 1 root root 228 Nov 29 17:14 /usr/local/bin/check_docker

It seems odd to me that one line says return code 1 and another says return code 3.

HigH-HawK commented 4 years ago

The permissions look OK. As for the return codes, the first one is the return code of the check_docker command itself, and the second one is what NRPE reports afterwards: the command exited with 1 and produced no output, so NRPE answered with its own "Unable to read output" message and return code 3.
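The relationship between the two return codes in the syslog can be modelled in a few lines. This Python sketch is a simplified model consistent with those log lines, not NRPE's actual source: when the plugin prints nothing, NRPE replaces the output with its own message and reports UNKNOWN (3), regardless of the plugin's exit code. The exit-code table is the standard Nagios plugin convention; the function names are hypothetical.

```python
# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

NO_OUTPUT_MSG = "NRPE: Unable to read output"


def nrpe_view(plugin_code, plugin_output):
    """Simplified model (not NRPE's source) of what NRPE sends back.

    If the plugin printed nothing, its exit code is discarded and NRPE
    answers with its own message and UNKNOWN (3).
    """
    if not plugin_output.strip():
        return 3, NO_OUTPUT_MSG
    return plugin_code, plugin_output


def state_name(code):
    """Map an exit code to its Nagios state name (this sketch treats any other value as UNKNOWN)."""
    return STATES.get(code, "UNKNOWN")
```

With the values from the syslog above, `nrpe_view(1, "")` gives `(3, "NRPE: Unable to read output")`: the 1 and the 3 describe two different stages, and the real problem is the missing output.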

cr33dx commented 4 years ago

I was getting the same error. In nrpe.cfg I gave the full path to check_docker:

command[check_docker]=/usr/local/bin/check_docker --containers  --status running

It worked for me.
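One way to spot this across a whole nrpe.cfg is to list each command's executable and flag anything that is not an absolute path to an executable file. This is a hypothetical Python sketch that assumes the `command[name]=...` syntax shown above. A bare name such as `check_docker` is not necessarily wrong, but it depends on the PATH the NRPE daemon runs with, which may differ from your login shell's.

```python
import os
import re
import shlex

# Matches lines such as: command[check_docker]=/usr/local/bin/check_docker --health
COMMAND_RE = re.compile(r"^command\[([^\]]+)\]\s*=\s*(.+)$")

OK = "ok"
RELATIVE = "relative: depends on NRPE's PATH"
MISSING = "missing or not executable"


def audit(cfg_text):
    """Return {command_name: status} for every command[...] line in nrpe.cfg text."""
    report = {}
    for raw in cfg_text.splitlines():
        match = COMMAND_RE.match(raw.strip())
        if not match:
            continue  # comments, blank lines, other directives
        name, command_line = match.groups()
        argv = shlex.split(command_line)
        if not argv:
            continue
        exe = argv[0]
        if not os.path.isabs(exe):
            report[name] = RELATIVE
        elif os.path.isfile(exe) and os.access(exe, os.X_OK):
            report[name] = OK
        else:
            report[name] = MISSING
    return report
```

You could call it as `audit(open("/etc/nagios/nrpe.cfg").read())`. The sketch does not follow `include` or `include_dir` directives, so files under /etc/nrpe.d would need to be passed in separately.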