naemon / naemon-livestatus

Naemon - Livestatus Eventbroker Module
GNU General Public License v2.0
26 stars 30 forks source link

2 processes listening on livestatus socket #117

Open pvdputte opened 7 months ago

pvdputte commented 7 months ago

I've previously commented on this livestatus issue but probably should have opened a new one here instead. Sorry.

Basically, the problem I see is that even in a fresh install without any custom configuration except for the TCP livestatus socket, after a systemctl reload naemon, there are two processes listening:

vagrant@bookworm:~$ sudo netstat -tupan | grep -e Recv -e naemon
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:6557          0.0.0.0:*               LISTEN      4067/naemon         
tcp        3      0 0.0.0.0:6557          0.0.0.0:*               LISTEN      4072/naemon   

One of them is not responding (waiting to be reaped? although top is not saying it's a zombie) As a result, Thruk sometimes behaves erratically, says the backend is down etc.

This is the config:

vagrant@bookworm:~$ cat /etc/naemon/module-conf.d/livestatus.cfg 
# Naemon config
broker_module=/usr/lib/naemon/naemon-livestatus/livestatus.so inet_addr=0.0.0.0:6557 debug=1
event_broker_options=-1

This can be easily reproduced in vagrant.

$ vagrant init debian/bookworm64
$ vagrant up
$ vagrant ssh

Next, copy these commands into a script and execute it.

vagrant@bookworm:~$ wget -O reproduce https://github.com/naemon/naemon-livestatus/files/13950328/reproduce.txt
vagrant@bookworm:~$ chmod +x reproduce
vagrant@bookworm:~$ ./reproduce

This should result in something like

<installation>

----------
Restarting

tcp        0      0 0.0.0.0:6557            0.0.0.0:*               LISTEN      5408/naemon         

naemon,5408 --daemon /etc/naemon/naemon.cfg
  ├─naemon,5409 --worker /var/lib/naemon/naemon.qh
  ├─naemon,5410 --worker /var/lib/naemon/naemon.qh
  ├─naemon,5411 --worker /var/lib/naemon/naemon.qh
  ├─naemon,5412 --worker /var/lib/naemon/naemon.qh
  └─naemon,5413 --daemon /etc/naemon/naemon.cfg

systemd,5367 --user
  └─(sd-pam),5368

----------------------
Reloading until broken

Success.
tcp        0      0 0.0.0.0:6557            0.0.0.0:*               LISTEN      5408/naemon         
tcp        0      0 0.0.0.0:6557            0.0.0.0:*               LISTEN      5413/naemon         

naemon,5408 --daemon /etc/naemon/naemon.cfg
  ├─naemon,5413 --daemon /etc/naemon/naemon.cfg
  ├─naemon,5434 --worker /var/lib/naemon/naemon.qh
  ├─naemon,5435 --worker /var/lib/naemon/naemon.qh
  ├─naemon,5436 --worker /var/lib/naemon/naemon.qh
  └─naemon,5437 --worker /var/lib/naemon/naemon.qh

systemd,5367 --user
  └─(sd-pam),5368

---------------------------------
Running "GET status" every second, response size 0 is not good:
2024-01-16T13:08:05+00:00 976
2024-01-16T13:08:06+00:00 976
2024-01-16T13:08:07+00:00 0
2024-01-16T13:08:10+00:00 977
2024-01-16T13:08:11+00:00 0
^C

Notice that process 5413 already exists when naemon is first started, but only after the reload, it also starts listening on that socket.

My current workaround is to restart instead of reload after each config change, but this takes a lot longer than reloading (rather large config). Or I should go back to xinetd.

pvdputte commented 4 months ago

FYI, I went back to xinetd & unixcat /var/cache/naemon/live. systemctl reload works fine again as xinetd is now handling the TCP socket. So I have my 17 second reload back, instead of a 30 second restart. :slightly_smiling_face:

Perhaps someday I could look into systemd socket activation for this.