sni / mod_gearman

Distribute Naemon Host/Service Checks & Eventhandler with Gearman Queues. Host/Servicegroups affinity included.
http://www.mod-gearman.org
GNU General Public License v3.0

check_results worker crashes #5

Closed abergman closed 14 years ago

abergman commented 14 years ago

For some reason the check_results worker crashes when running icinga with event_broker_options -1 and large_installation_tweaks activated.

I've been trying a number of combinations for event_broker_options and the only thing that seems to work is 31, but that leaves a lot of data out for idomod, so it's not an option.

I've been running 4 checks on a large number (500 to 1500) of web domains, and the worker crashes after about 15-20 minutes.

Backtrace from gdb: http://www.pastebin.se/202382

abergman commented 14 years ago

I lied, it still crashes with large_installation_tweaks = 0.

Backtrace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x70416950 (LWP 4321)]
0x00007fc075513230 in vfprintf () from /lib/libc.so.6
(gdb) bt
#0  0x00007fc075513230 in vfprintf () from /lib/libc.so.6
#1  0x0000000000431455 in log_debug_info (level=64, verbosity=1,
    fmt=0x477d98 "Making callbacks (type %d)...\n") at logging.c:537
#2  0x00000000004172a0 in neb_make_callbacks (callback_type=9, data=0x7040d420)
    at nebmods.c:581
#3  0x00000000004162cf in broker_log_data (type=,
    flags=<value optimized out>, attr=<value optimized out>,
    data=0x7040d500 "mod_gearman: service job completed: agq.se PING PS 2: 3",
    data_type=262144, entry_time=1288527511, timestamp=0x0) at broker.c:136
#4  0x0000000000431728 in write_to_log (
    buffer=0x7040d500 "mod_gearman: service job completed: agq.se PING PS 2: 3",
    data_type=262144, timestamp=0x0) at logging.c:192
#5  0x0000000000431de6 in write_to_all_logs (
    buffer=0x7040d500 "mod_gearman: service job completed: agq.se PING PS 2: 3",
    data_type=262144) at logging.c:132
#6  0x00007fc07486e72c in logger () from /tmp/nebmodxc9iXg
#7  0x00007fc07486eec0 in get_results () from /tmp/nebmodxc9iXg
#8  0x00007fc07465cbb1 in gearman_worker_work () from /usr/lib/libgearman.so.4
#9  0x00007fc07486efd9 in result_worker () from /tmp/nebmodxc9iXg
#10 0x00007fc075a2cfc7 in start_thread () from /lib/libpthread.so.0
#11 0x00007fc07559e64d in clone () from /lib/libc.so.6
#12 0x0000000000000000 in ?? ()
(gdb)

abergman commented 14 years ago

I've double checked, setting event_broker_options to 31 keeps the check_results worker from crashing.
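A possible reading of why 31 helps: event_broker_options is a bitmask, and the crashing frames run through broker_log_data. The sketch below assumes the flag values match those in the Nagios/Icinga 1.x broker.h (quoted from memory, so check them against your own headers) and uses a hypothetical helper, brokers_logged_data, to show that 31 leaves the logged-data bit off while -1 turns it on.

```c
#include <stdbool.h>

/* Flag values assumed to mirror Nagios/Icinga 1.x broker.h;
 * verify against the headers of your installation. */
#define BROKER_PROGRAM_STATE   1
#define BROKER_TIMED_EVENTS    2
#define BROKER_SERVICE_CHECKS  4
#define BROKER_HOST_CHECKS     8
#define BROKER_EVENT_HANDLERS  16
#define BROKER_LOGGED_DATA     32

/* Hypothetical helper (not part of Nagios or mod_gearman):
 * reports whether a given event_broker_options value would
 * hand log entries to the event broker modules. */
bool brokers_logged_data(int event_broker_options) {
    return (event_broker_options & BROKER_LOGGED_DATA) != 0;
}
```

Under these assumptions, 31 is the sum of the first five flags (1 + 2 + 4 + 8 + 16), so brokers_logged_data(31) is false and brokers_logged_data(-1) is true. That would keep log entries from reaching the broker callbacks seen in the backtrace, at the cost of idomod no longer receiving them.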

abergman commented 14 years ago

I reactivated large_installation_tweaks and it's still going strong. So perhaps it's some kind of overflow when one brokers too much?

sni commented 14 years ago

The problem is that the Nagios logging function is not thread safe. Please disable mod_gearman's logging or set it to 4, so it logs to stdout.
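To illustrate the general problem, here is a minimal sketch of how calls into a non-thread-safe logger can be serialized with a pthread mutex. This is not how mod_gearman or Nagios handle it; unsafe_log, safe_log and run_log_workers are hypothetical names made up for this example.

```c
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for a logger that keeps unsynchronized
 * shared state and so must not run in two threads at once. */
static int unsafe_log_calls = 0;

static void unsafe_log(const char *msg) {
    unsafe_log_calls++;            /* shared state, no locking of its own */
    fputs(msg, stdout);
}

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/* Format into a thread-local buffer first, then take the lock
 * only around the call into the unsafe logger. */
void safe_log(const char *fmt, ...) {
    char buf[1024];
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);

    pthread_mutex_lock(&log_lock);
    unsafe_log(buf);
    pthread_mutex_unlock(&log_lock);
}

static void *worker(void *arg) {
    int n = *(int *)arg;
    for (int i = 0; i < n; i++)
        safe_log("service job completed: job %d\n", i);
    return NULL;
}

/* Starts up to 16 threads that each log per_thread messages and
 * returns how many times the unsafe logger was entered. */
int run_log_workers(int threads, int per_thread) {
    pthread_t tids[16];
    int started = 0;
    if (threads > 16)
        threads = 16;
    unsafe_log_calls = 0;
    for (int t = 0; t < threads; t++) {
        if (pthread_create(&tids[t], NULL, worker, &per_thread) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);
    return unsafe_log_calls;
}
```

A wrapper like this only helps if every caller goes through it. In the situation described here the core's main thread calls its own logger directly, so a lock inside the module alone would not protect it, which is presumably why logging to stdout or disabling the module's logging is the suggested workaround.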

abergman commented 14 years ago

I've done just that and it seems to work. Thx! And sorry yet again for a stupid issue ;)