Open maltesh opened 3 years ago
Hello, the issue you're facing is strange. I'm running a Shinken platform with more than 2k hosts and more than 45k services, and I have never had such problems.
It's a fairly old Shinken release you are running. It would be a good idea to try upgrading anyway. I doubt the latest release will run on Python 2.6, though.
Hardware:
CPU: 24 cores
RAM: 24 GB
Shinken version: 2.0.3
Python version: 2.6.6
OS: CentOS 6.10
Hosts monitored: 409
Total services: 14600
About 60% of the service checks are health checks (WMI or WinRM) with a check interval of 5 to 15 minutes. About 3-5% of the service checks are HTTP health checks for RabbitMQ with a check interval of 1 minute and a notification interval of 1 minute.
It's a standalone machine and it is not scaled out. We are running:
a) a poller with min_worker set to 6 and max_worker set to 16
b) a reactionner with min_worker set to 4 and max_worker set to 12
Commonly seen in logs:
Reactionner Log:
  File "/usr/lib/python2.6/site-packages/shinken/action.py", line 125, in execute
    return self.execute() ## OS specific part
  File "/usr/lib/python2.6/site-packages/shinken/action.py", line 311, in execute
    preexec_fn=os.setsid)
  File "/usr/lib64/python2.6/subprocess.py", line 642, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.6/subprocess.py", line 1238, in _execute_child
    raise child_exception
TypeError: execve() arg 2 must contain only strings
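For context on the TypeError above: in Python 2, os.execve raises this message when an element of the argument list passed down from subprocess is not a plain string, for example None, an integer, or (as far as I can tell) a value with an embedded NUL byte. A minimal sketch of a pre-flight check follows. The helper name find_bad_args is hypothetical and not part of Shinken, and the sketch is written for Python 3, where the same mistake surfaces with a different message.

```python
def find_bad_args(argv):
    """Return (index, value, reason) for each element execve would likely reject.

    Hypothetical helper for illustration only; not part of Shinken.
    """
    problems = []
    for i, arg in enumerate(argv):
        if not isinstance(arg, (str, bytes)):
            # None, int, list, ... cannot be passed as a process argument
            problems.append((i, arg, "not a string"))
            continue
        nul = "\0" if isinstance(arg, str) else b"\0"
        if nul in arg:
            # C strings end at the first NUL, so exec refuses such arguments
            problems.append((i, arg, "embedded NUL"))
    return problems
```

Running a check like this on the command list just before it is handed to subprocess.Popen would show which check or notification command expands to a bad argument.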
Broker Log:
Error : Back trace of this error: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/shinken/daemon.py", line 864, in http_daemon_thread
    self.http_daemon.run()
  File "/usr/lib/python2.6/site-packages/shinken/http_daemon.py", line 283, in run
    self.srv.run()
  File "/usr/lib/python2.6/site-packages/shinken/http_daemon.py", line 123, in run
    raise PortNotFree(msg)
PortNotFree: Error: Sorry, the port 7772 is not free: No socket could be created
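To confirm independently of Shinken whether port 7772 can be bound before restarting the daemons, a small probe can help. This is a sketch under assumptions: port_is_free is a hypothetical helper, it uses a strict bind without SO_REUSEADDR (which may differ from how Shinken opens its socket), and it is written for Python 3.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now.

    Hypothetical helper; a strict bind without SO_REUSEADDR, so lingering
    sockets on the port may also make it report False.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError:
        # EADDRINUSE, EACCES, ... all mean the daemon could not bind here
        return False
    else:
        return True
    finally:
        sock.close()
```

For example, port_is_free(7772) could be polled after stopping the services, starting them again only once it returns True.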
Poller Log:
[1606292549] Error : [Livestatus Query] Error: 'Hosts' object has no attribute 'itersorted'
[1606292744] Error : [broker-master] The external module livestatus goes down unexpectedly!
[1606292744] Error : [broker-master] The external module npcdmod goes down unexpectedly!
[1606292744] Warning : [broker-master] Connection problem to the scheduler scheduler-master: Connexion error to http://localhost:7768/ : couldn't connect to host
[1606292747] Warning : [broker-master] Connection problem to the poller poller-master: Connexion error to http://localhost:7771/ : Operation timed out after 3000
Dmesg:
TCP: too many of orphaned sockets
__ratelimit: 192 callbacks suppressed
TCP: too many of orphaned sockets
TCP: too many of orphaned sockets
TCP: too many of orphaned sockets
TCP: too many of orphaned sockets
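The kernel message above relates to the orphan counter that Linux exposes on the TCP line of /proc/net/sockstat, which can be compared against net.ipv4.tcp_max_orphans. A minimal sketch for watching it, assuming Python 3; parse_tcp_sockstat and read_orphan_usage are hypothetical names used only for illustration.

```python
def parse_tcp_sockstat(text):
    """Extract the TCP counters from the contents of /proc/net/sockstat.

    The TCP line looks like: "TCP: inuse 5 orphan 0 tw 0 alloc 7 mem 1".
    Returns a dict such as {"inuse": 5, "orphan": 0, ...}, or {} if absent.
    """
    for line in text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()[1:]
            # fields alternate name, value, name, value, ...
            return dict((k, int(v)) for k, v in zip(fields[0::2], fields[1::2]))
    return {}


def read_orphan_usage():
    """Return (current orphans, configured maximum) on a Linux host."""
    with open("/proc/net/sockstat") as f:
        orphans = parse_tcp_sockstat(f.read()).get("orphan", 0)
    with open("/proc/sys/net/ipv4/tcp_max_orphans") as f:
        limit = int(f.read().strip())
    return orphans, limit
```

Logging the result of read_orphan_usage() every minute or so would show whether the orphan count climbs steadily before each outage or spikes suddenly.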
Netstat:
Running netstat -anp | grep 7772 shows the connections in either FIN_WAIT1 or FIN_WAIT2 state.
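To put numbers on the netstat observation, the output can be summarized per TCP state. This is a rough sketch that assumes the usual Linux netstat -an column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and Python 3; count_states is a hypothetical helper name.

```python
from collections import Counter


def count_states(netstat_output, port):
    """Count TCP connections per state where either endpoint uses `port`.

    Expects text in the usual Linux `netstat -an` layout.
    """
    suffix = ":%d" % port
    counts = Counter()
    for line in netstat_output.splitlines():
        cols = line.split()
        # Skip headers, UDP and unix-socket lines
        if len(cols) < 6 or not cols[0].startswith("tcp"):
            continue
        local, remote, state = cols[3], cols[4], cols[5]
        if local.endswith(suffix) or remote.endswith(suffix):
            counts[state] += 1
    return dict(counts)
```

Feeding it the captured output of netstat -anp with port 7772 gives a per-state tally that can be recorded over time and attached to a bug report.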
Currently we run sysctl -w net.ipv4.tcp_max_orphans=0, then kill and restart all Shinken services to get it up and running again. This happens 2 or 3 times a day.
Please help us overcome this problem. Will upgrading to Shinken 2.4.3 fix the problem? Or will tuning kernel parameters such as net.ipv4.tcp_mem, net.ipv4.tcp_fin_timeout, etc. help further?