jjethwa / icinga2

GNU General Public License v3.0

Issue with Ping #231

Open amcmorris-piksel opened 4 years ago

amcmorris-piksel commented 4 years ago

Setting up a new installation and having issues with Ping.

I am getting the following message in the console: CRITICAL - Could not interpret output from ping command

When I run it from the command line as root it works, but if I try as the nagios user I get this error: ping: socket: Operation not permitted

Has anyone else seen this before? I am fairly new to Icinga, so I am still finding my feet with it.

A.

jjethwa commented 4 years ago

Hi @amcmorris-piksel

Are you setting up a new ping check or is this the default ping check on the icinga server?

amcmorris-piksel commented 4 years ago

@jjethwa This was a new ping check, code below:

object Host "NAME" {
  address = "FQDN"
  check_command = "hostalive"
}

Nothing complex; I wonder if I'm doing something silly. The command below works fine from the root account, but when I tried it from the nagios account I got the error above.

Plugin Output:
/bin/ping -4 -n -U -w 30 -c 5 FQDN
CRITICAL - Could not interpret output from ping command

jjethwa commented 4 years ago

Does the default icinga2 server hostalive check work?

The URL is http://:/icingaweb2/dashboard#!/icingaweb2/monitoring/host/show?host=icinga2

That uses the hostalive check_command as well

amcmorris-piksel commented 4 years ago

Yes, unfortunately I'm also getting the error on that, with the following output: :(

/bin/ping -4 -n -U -w 30 -c 5 127.0.0.1
CRITICAL - Could not interpret output from ping command

Unsure what is going on, any idea of next steps?

amcmorris-piksel commented 4 years ago

A bit more info: on the same Docker host I ran a different test:
docker run -p 8080:80 -h icinga2 -t jordan/icinga2:latest

It looks like I'm getting the same output as above, and I'm also getting:

Check execution Reachable | no

Happy to provide or try anything needed.

jjethwa commented 4 years ago

Thanks for the details @amcmorris-piksel I pulled latest but don't see the same issue unfortunately. It looks like the ping check is configured to use /usr/lib/nagios/plugins/check_ping

The full command is:

'/usr/lib/nagios/plugins/check_ping' '-4' '-H' '127.0.0.1' '-c' '200,15%' '-w' '100,5%'

amcmorris-piksel commented 4 years ago

Thanks for that, just tried the below on a fresh image.

root@icinga2:/usr/lib/nagios/plugins# sudo -u nagios /usr/lib/nagios/plugins/check_ping '-4' '-H' '127.0.0.1' '-c' '200,15%' '-w' '100,5%'
/bin/ping -4 -n -U -w 10 -c 5 127.0.0.1
CRITICAL - Could not interpret output from ping command

I think this is an issue with the Docker host from some searching around: https://github.com/jjethwa/icinga2/issues/52

Just not sure what the equivalent will be to get this working in Ubuntu 16.04

jjethwa commented 4 years ago

Ah, I had forgotten about that issue. Try adding the --privileged flag to the docker run command and see if that works

amcmorris-piksel commented 4 years ago

Thanks, I wish that had worked. I tried:
docker run --rm --privileged --cap-add=ALL -p 8080:80 -h icinga2 -t jordan/icinga2:latest

But got:

[2020-06-17 14:29:50 +0000] warning/PluginNotificationTask: Notification command for object 'icinga2' (PID: 2297, arguments: '/etc/icinga2/scripts/mail-host-notification.sh' '-4' '127.0.0.1' '-6' '::1' '-b' '' '-c' '' '-d' '2020-06-17 14:29:50 +0000' '-l' 'icinga2' '-n' 'icinga2' '-o' '/bin/ping -4 -n -U -w 30 -c 5 127.0.0.1 CRITICAL - Could not interpret output from ping command' '-r' 'root@localhost' '-s' 'DOWN' '-t' 'PROBLEM' '-v' 'false') terminated with exit code 36, output: /etc/icinga2/scripts/mail-host-notification.sh: 148: [: false: unexpected operator mail: cannot send message: Process exited with a non-zero status

It looks like it just does not like this version of Docker. :(

jjethwa commented 4 years ago

So bizarre. Maybe you can try running it on one of the container Linux distros like Flatcar?

amcmorris-piksel commented 4 years ago

Going to move the PoC to AWS rather than use our on premises Docker Hosts, thanks for the help.

adamparker commented 3 years ago

I had this happen to me as well with CentOS. One of the symptoms was that the ping processes were not being terminated properly and ended up as zombie processes. This would go on until eventually there were no resources available.

I never solved it but hope this information helps.

jjethwa commented 3 years ago

Thanks for the tip, @adamparker

Would you be able to test out adding a timeout to your ping config to see if that gets rid of the zombies?

ghost commented 3 years ago

We're having the same issue on Ubuntu 20.04 with no internet access. We have the exact same setup in a Vagrant box, which works (even without the internet access).

It seems to be a rights issue (still not sure why it works on some machines and not on others):
root@icinga2:/# usermod nagios --shell /bin/bash
root@icinga2:/# su - nagios
nagios@icinga2:~$ /bin/ping 127.0.0.1
ping: socket: Operation not permitted
nagios@icinga2:~$ logout
root@icinga2:/# /bin/ping 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.034 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
^C
--- 127.0.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 28ms
rtt min/avg/max/mdev = 0.027/0.030/0.034/0.006 ms

Looking online for a solution gave me the following:

chmod u+s /bin/ping

but this doesn't seem to work:
root@icinga2:/# chmod u+s /bin/ping
root@icinga2:/# su - nagios
nagios@icinga2:~$ /bin/ping 127.0.0.1
ping: socket: Operation not permitted

Someone suggested changing the language of the system, but it's already set to nothing.

Looking at the rights on both the server and in the Vagrant box:
vagrant: 543757 -rwsr-sr-x 1 root root 69368 Jan 13 2020 ping
server: 8357416 -rwsr-sr-x 1 root root 69368 Jan 13 2020 ping

I've also looked into the docker versions:

Vagrant:
Client: Docker Engine - Community
 Version: 20.10.2
 API version: 1.41
 Go version: go1.13.15
 Git commit: 2291f61
 Built: Mon Dec 28 16:17:43 2020
 OS/Arch: linux/amd64
 Context: default
 Experimental: true

Server: Docker Engine - Community
 Engine:
  Version: 20.10.2
  API version: 1.41 (minimum version 1.12)
  Go version: go1.13.15
  Git commit: 8891c58
  Built: Mon Dec 28 16:15:19 2020
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.4.3
  GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version: 1.0.0-rc92
  GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0

and the server:
Client: Docker Engine - Community
 Version: 20.10.1
 API version: 1.41
 Go version: go1.13.15
 Git commit: 831ebea
 Built: Tue Dec 15 04:34:58 2020
 OS/Arch: linux/amd64
 Context: default
 Experimental: true

Server: Docker Engine - Community
 Engine:
  Version: 20.10.1
  API version: 1.41 (minimum version 1.12)
  Go version: go1.13.15
  Git commit: f001486
  Built: Tue Dec 15 04:32:52 2020
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.4.3
  GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version: 1.0.0-rc92
  GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0

jjethwa commented 3 years ago

Hi @Thixx

Thanks for all the details, I have not been able to track this down myself. I believe that it is coming down to how the host is handling the socket request. So far I have not run into the issue when using Flatcar as it's the main distro I use for docker containers.

adamparker commented 3 years ago

Hi,

I switched check_ping with check_icmp which has resolved the issue for me.

Check_ping also gave me trouble with Zombie processes which is described here https://community.icinga.com/t/defunct-zombie-ping-processes-when-using-check-ping-on/7012

jjethwa commented 3 years ago

That's great news, thanks for the update @adamparker 😃

ghost commented 3 years ago

> Hi,
>
> I switched check_ping with check_icmp which has resolved the issue for me.
>
> Check_ping also gave me trouble with Zombie processes which is described here https://community.icinga.com/t/defunct-zombie-ping-processes-when-using-check-ping-on/7012

I wish that would work for me, but most of the commands can't be used because of the same issue... (check_icmp included) Also @jjethwa I just can't switch to another OS, kind of stuck with Ubuntu for now. I'm still looking into it.

jjethwa commented 3 years ago

Thanks for the update, @Thixx I haven't had time to research more, but I still feel that we need to focus on the host. Could be a tweak to the docker daemon or an OS security setting.

ghost commented 3 years ago

> Thanks for the update, @Thixx I haven't had time to research more, but I still feel that we need to focus on the host. Could be a tweak to the docker daemon or an OS security setting.

Yeah, I think you're right! I've seen related issues in SUSE and CentOS that are solved down the road. I've found out that SELinux isn't the problem and that I can't add capabilities to the container... or at least it looks like it 'forgets' them.
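
One way to check whether a container is really missing the capability is to look at the capability bounding set of a process running inside it. The following is only a diagnostic sketch under stated assumptions, not something taken from this thread: it reads the CapBnd line from /proc/self/status and tests bit 13, which is CAP_NET_RAW in the Linux capability numbering. The helper names (mask_has_net_raw, bounding_mask) are made up for illustration.

```shell
#!/bin/sh
# CAP_NET_RAW is capability number 13 in <linux/capability.h>.
CAP_NET_RAW_BIT=13

# mask_has_net_raw HEXMASK
# Succeeds if the CAP_NET_RAW bit is set in the given hex mask.
# (Hypothetical helper name, for illustration only.)
mask_has_net_raw() {
    [ $(( 0x$1 & (1 << CAP_NET_RAW_BIT) )) -ne 0 ]
}

# Print the capability bounding set of the current process as hex,
# or nothing if /proc/self/status cannot be read.
bounding_mask() {
    awk '/^CapBnd:/ { print $2 }' /proc/self/status 2>/dev/null
}

mask=$(bounding_mask) || mask=""
if [ -z "$mask" ]; then
    echo "could not read CapBnd from /proc/self/status"
elif mask_has_net_raw "$mask"; then
    echo "CAP_NET_RAW is in the bounding set ($mask)"
else
    echo "CAP_NET_RAW is NOT in the bounding set ($mask)"
fi
```

If the bit is absent, neither the setuid bit nor a file capability on ping should be able to give the nagios user a raw socket, because both are limited by the bounding set. If the bit is present, the cause probably lies elsewhere, so this check only narrows things down.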

AlphaDE commented 2 years ago

This is an older issue, but it is still open.

Just installed Icinga2 in an Ubuntu 20.04 LTS LXC (Proxmox) and ran into the same issue.

I finally found out that check_ping calls /bin/ping and that the user nagios used by Icinga2 could not execute the ping command.

nagios@monitor:/usr/lib/nagios/plugins$ /bin/ping 127.0.0.1
/bin/ping: socket: Operation not permitted

In a different thread I found the suggestion to execute

setcap cap_net_raw+p /bin/ping

and after this command, the problem was solved.
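
To see which mechanism, if any, a given ping binary relies on before and after running setcap, a small check like the one below may help. It is a sketch under stated assumptions: the function name ping_privilege is invented for illustration, getcap is assumed to come from the libcap tools and to be on the PATH, and the check only inspects the file itself. It cannot tell whether the container or host allows the privilege to take effect.

```shell
#!/bin/sh
# ping_privilege PATH
# Prints one of: setuid, cap_net_raw, none, missing.
# (Hypothetical helper name, for illustration only.)
ping_privilege() {
    bin="$1"
    if [ ! -e "$bin" ]; then
        echo "missing"
        return
    fi
    # -u is true when the set-user-ID bit is set on the file.
    # This does not check that the owner is root.
    if [ -u "$bin" ]; then
        echo "setuid"
        return
    fi
    # getcap lists file capabilities; if the tool is not installed
    # or not on the PATH, this branch is skipped and "none" is printed.
    if command -v getcap >/dev/null 2>&1 &&
       getcap "$bin" 2>/dev/null | grep -q cap_net_raw; then
        echo "cap_net_raw"
        return
    fi
    echo "none"
}

ping_privilege /bin/ping
```

A result of "none" for the binary that check_ping calls would be consistent with the "Operation not permitted" error for the nagios user, while "setuid" or "cap_net_raw" combined with the same error points at the container or host rather than the file.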

jjethwa commented 2 years ago

Hi @AlphaDE

Thanks so much for the details! Adding it to the Dockerfile 😄

TheMule71 commented 2 years ago

FYI, I've run into a similar problem. (I don't use your Dockerfile.)

Many distros have removed both the s-bit and the capabilities from the ping executable, sometimes relying on other methods to grant users access.

Container systems (docker, podman, etc.) also play a role, by removing capabilities from the container as a whole.

Here's what I had to do in my Dockerfile:

RUN setcap 'cap_net_raw+ep' /usr/bin/ping

and run the container as a standard user (not root) with:
podman run --network slirp4netns:allow_host_loopback=true --cap-add=cap_net_raw ...

Hope it helps.
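
One of the "other methods" mentioned above is, as far as I know, the net.ipv4.ping_group_range sysctl, which lets members of a range of group IDs open unprivileged ICMP sockets. The sketch below is an assumption on my part rather than something confirmed in this thread: it reads the range from /proc/sys/net/ipv4/ping_group_range and checks the current user's primary group against it. The function name gid_in_ping_range is hypothetical, and supplementary groups are not considered.

```shell
#!/bin/sh
# gid_in_ping_range GID LOW HIGH
# Succeeds if GID lies in the inclusive range LOW..HIGH.
# A range of "1 0" (low > high) matches no group at all.
# (Hypothetical helper name, for illustration only.)
gid_in_ping_range() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

range_file=/proc/sys/net/ipv4/ping_group_range
if [ -r "$range_file" ]; then
    # The file holds two whitespace-separated integers: low and high.
    read -r low high < "$range_file"
    if gid_in_ping_range "$(id -g)" "$low" "$high"; then
        echo "primary gid $(id -g) is inside ping_group_range ($low $high)"
    else
        echo "primary gid $(id -g) is outside ping_group_range ($low $high)"
    fi
else
    echo "cannot read $range_file"
fi
```

This only applies to ping implementations that use the unprivileged ICMP socket type. A plugin that opens a raw socket directly would still need CAP_NET_RAW.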

jjethwa commented 2 years ago

Thanks for the tip @TheMule71 😃