shinken-solutions / shinken

Flexible and scalable monitoring framework
http://www.shinken-monitoring.org
GNU Affero General Public License v3.0

Arbiter Daemon Crashes #1660

Open GrandNico opened 9 years ago

GrandNico commented 9 years ago

Hi!

I am currently testing a Shinken installation at my company. It works very well, except for this issue:

After a few hours of running (even if I don't modify the configuration), when I restart Shinken (/etc/init.d/shinken restart), the arbiter daemon doesn't come back up.

Here is /var/log/shinken/arbiterd.log:

[1434111824] INFO: [Shinken] Cutting the hosts and services into parts
[1434111824] INFO: [Shinken] Creating packs for realms
[1434111824] INFO: [Shinken] Number of hosts in the realm All: 10 (distributed in 10 linked packs)
[1434111824] INFO: [Shinken] Total number of hosts : 10
[1434111824] INFO: [Shinken] Things look okay - No serious problems were detected during the pre-flight check
[1434111824] INFO: [Shinken] [Arbiter] Serializing the configurations...
[1434111824] INFO: [Shinken] Using the default serialization pass
[1434111824] INFO: [Shinken] Configuration Loaded
[1434111824] INFO: [Shinken] Trying to initialize additional groups for the daemon
[1434111824] INFO: [Shinken] Stale pidfile exists ([Errno 3] No such process), Reusing it.
[1434111824] INFO: [Shinken] Opening HTTP socket at http://localhost:7770
[1434111824] INFO: [Shinken] Initializing a CherryPy backend with 8 threads
[1434111824] INFO: [Shinken] Using the local log file '/var/log/shinken/arbiterd.log'
[1434111824] INFO: [Shinken] Printing stored debug messages prior to our daemonization
[1434111824] INFO: [Shinken] Successfully changed to workdir: /var/lib/shinken
[1434111824] INFO: [Shinken] Opening pid file: /var/run/shinken/arbiterd.pid
[1434111824] INFO: [Shinken] Redirecting stdout and stderr as necessary..
[1434111824] INFO: [Shinken] We are now fully daemonized :) pid=19108
[1434111824] INFO: [Shinken] And arbiter is launched with the hostname:shinken2 from an arbiter point of view of addr:shinken2.bbugroup.local
[1434111824] INFO: [Shinken] Begin to dispatch configurations to satellites
[1434111824] INFO: [Shinken] Starting HTTP daemon
[1434111824] CRITICAL: [Shinken] I got an unrecoverable error. I have to exit.
[1434111824] CRITICAL: [Shinken] You can get help at https://github.com/naparuba/shinken
[1434111824] CRITICAL: [Shinken] If you think this is a bug, create a new ticket includingdetails mentioned in the README
[1434111824] CRITICAL: [Shinken] Back trace of the error: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/shinken/daemons/arbiterdaemon.py", line 626, in main
    self.do_mainloop()
  File "/usr/local/lib/python2.7/dist-packages/shinken/daemon.py", line 333, in do_mainloop
    self.do_loop_turn()
  File "/usr/local/lib/python2.7/dist-packages/shinken/daemons/arbiterdaemon.py", line 662, in do_loop_turn
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/shinken/daemons/arbiterdaemon.py", line 745, in run
    self.dispatcher.check_alive()
  File "/usr/local/lib/python2.7/dist-packages/shinken/dispatcher.py", line 122, in check_alive
    elt.update_infos()
  File "/usr/local/lib/python2.7/dist-packages/shinken/objects/satellitelink.py", line 200, in update_infos
    self.ping()
  File "/usr/local/lib/python2.7/dist-packages/shinken/objects/satellitelink.py", line 226, in ping
    r = self.con.get('ping')
  File "/usr/local/lib/python2.7/dist-packages/shinken/http_client.py", line 132, in get
    ret = json.loads(response.getvalue().replace('\/', '/'))
  File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 383, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
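
For anyone reading the traceback: it ends in `json.loads` raising `ValueError` on the body returned for the `ping` request, and nothing above it catches the exception, so the arbiter exits. The sketch below is only an illustration of that failure mode, not Shinken's actual code and not a confirmed fix. `safe_ping_decode` is a hypothetical name, and treating an undecodable reply as a failed ping is an assumption about what the caller would want.

```python
import json


def safe_ping_decode(body):
    """Decode a ping reply; return None when the body is not valid JSON.

    Hypothetical helper for illustration only, not part of Shinken.
    """
    try:
        # Same unescaping step as the http_client.get line in the traceback:
        # replace the two-character sequence backslash + slash with a slash.
        return json.loads(body.replace('\\/', '/'))
    except ValueError:
        # json.loads raises ValueError for empty or non-JSON bodies
        # (on Python 3, JSONDecodeError is a subclass of ValueError).
        return None
```

With a helper like this, an empty or HTML reply from whatever is listening on the satellite port would return `None` instead of propagating up to the main loop.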

The arbiter daemon is supposed to have started, but it has not, and there is no process with PID 19108. The only workaround is to reboot the server completely, after which Shinken works.
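
The "no process with PID 19108" observation and the log's `[Errno 3] No such process` message are the same condition: the PID recorded in the pidfile is not running. A minimal sketch of that check on a POSIX system, using signal 0, is shown here. `pid_is_running` is a hypothetical helper name, and this is not necessarily how Shinken performs its own stale-pidfile test.

```python
import errno
import os


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only).

    Hypothetical helper for illustration only, not part of Shinken.
    """
    try:
        # Signal 0 sends nothing; it only performs existence and
        # permission checks for the target process.
        os.kill(pid, 0)
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            # No such process: the PID is stale.
            return False
        if exc.errno == errno.EPERM:
            # The process exists but belongs to another user.
            return True
        raise
    return True
```

Calling `pid_is_running(19108)` on the affected server right after the restart would confirm whether the daemon died after writing its pidfile.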

Thanks for your help!

Debian 7.8, Shinken 2.4 (installed with pip)

naparuba commented 8 years ago

Thanks for reporting.