lvasiliev closed 8 years ago
(gdb) up 16
#16 0x000000080240ee00 in result_worker (data=<value optimized out>) at neb_module/result_thread.c:61
61 ret = gearman_worker_work( &worker );
Current language: auto; currently minimal
(gdb) info locals
__cleanup_info__ = {pthread_cleanup_pad = {0, 34397548320, 140736949371928, 0, 0, 0, 0, 0}}
worker = {options = {allocated = false, non_blocking = false, packet_init = true, change = false, grab_uniq = true, grab_all = true, timeout_return = false},
state = GEARMAN_WORKER_STATE_START, work_state = GEARMAN_WORKER_WORK_UNIVERSAL_GRAB_JOB, function_count = 2, job_count = 0, work_result_size = 0, context = 0x0, con = 0x80640e000,
job = 0x0, job_list = 0x0, function = 0x80641d1c0, function_list = 0x80641d1c0, work_function = 0x0, work_result = 0x0, universal = {options = {dont_track_packets = false,
non_blocking = false}, verbose = GEARMAN_VERBOSE_NEVER, con_count = 1, packet_count = 4, pfds_size = 0, sending = 0, timeout = -1, con_list = 0x80640e000,
server_options_list = 0x0, packet_list = 0x80641d1f8, pfds = 0x0, log_fn = 0, log_context = 0x0, allocator = {calloc = 0, free = 0, malloc = 0, realloc = 0, context = 0x0},
_namespace = 0x0, error = {rc = GEARMAN_SUCCESS, last_errno = 0, last_error = "\000lush(Permission denied) connect -> libgearman/connection.cc:747", '\0' <repeats 1983 times>},
wakeup_fd = {18, 19}}, grab_job = {options = {allocated = false, complete = true, free_data = false}, magic = GEARMAN_MAGIC_REQUEST, command = GEARMAN_COMMAND_GRAB_JOB_ALL,
argc = 0 '\0', args_size = 12, data_size = 0, universal = 0x7fffdfdfc478, next = 0x0, prev = 0x7fffdfdfce50, args = 0x7fffdfdfcdd0 "", data = 0x0, arg = {0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0}, arg_size = {0, 0, 0, 0, 0, 0, 0, 0}, args_buffer = "\000REQ\000\000\000'", '\0' <repeats 119 times>}, pre_sleep = {options = {allocated = false,
complete = true, free_data = false}, magic = GEARMAN_MAGIC_REQUEST, command = GEARMAN_COMMAND_PRE_SLEEP, argc = 0 '\0', args_size = 12, data_size = 0,
universal = 0x7fffdfdfc478, next = 0x7fffdfdfcd08, prev = 0x80641d3b8, args = 0x7fffdfdfcf18 "", data = 0x0, arg = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, arg_size = {0, 0, 0,
0, 0, 0, 0, 0}, args_buffer = "\000REQ\000\000\000\004", '\0' <repeats 119 times>}, work_job = 0x0}
worker_num = <value optimized out>
ret = GEARMAN_TOO_MANY_ARGS
(gdb)
There was a similar issue fixed in 1.5.3. However, this one seems to fail deep inside libgearman. Can you update libgearman, or downgrade it? The preferred libgearman version is still 0.35, which has proven stable for some years now.
In this case, the gearman job server had been added to the ipfw firewall rules.
lvv@icinga:~ % sockstat | grep ':4830' | wc -l
36875
[Thu Mar 17 15:30:06 2016] Error: Unable to open temp file '/var/spool/icinga/ramdisk/icinga.tmpLEOJcZ' for writing status data: Too many open files
[Thu Mar 17 15:30:16 2016] Error: Unable to open temp file '/var/spool/icinga/ramdisk/icinga.tmpK1vONu' for writing status data: Too many open files
[Thu Mar 17 15:30:26 2016] Error: Unable to open temp file '/var/spool/icinga/ramdisk/icinga.tmpjPEK5C' for writing status data: Too many open files
[Thu Mar 17 15:30:36 2016] Error: Unable to open temp file '/var/spool/icinga/ramdisk/icinga.tmpgFPjjm' for writing status data: Too many open files
OK, I will try gearmand-devel-1.1.8, but I'm not sure it will help.
Same bug... mod_gearman: initialized version 1.5.5 (libgearman 1.1.8)
(gdb) bt
#0 0x0000000801878f5c in sbrk () from /lib/libc.so.7
#1 0x0000000801862c93 in syscall () from /lib/libc.so.7
#2 0x0000000801862ac6 in syscall () from /lib/libc.so.7
#3 0x000000080187f5be in malloc () from /lib/libc.so.7
#4 0x00000008018c4052 in getaddrinfo () from /lib/libc.so.7
#5 0x00000008018c3c4a in getaddrinfo () from /lib/libc.so.7
#6 0x00000008018c2ff5 in getaddrinfo () from /lib/libc.so.7
#7 0x00000008018e8d9f in nsdispatch () from /lib/libc.so.7
#8 0x00000008018c18fc in getaddrinfo () from /lib/libc.so.7
#9 0x0000000802667911 in gearman_connection_st::lookup (this=0x80642f000) at libgearman/connection.cc:683
#10 0x00000008026687c8 in gearman_connection_st::flush (this=0x80642f000) at libgearman/connection.cc:728
#11 0x000000080266845d in gearman_connection_st::_send_packet (this=0x80642f000, packet_arg=<value optimized out>, flush_buffer=<value optimized out>) at libgearman/connection.cc:638
#12 0x0000000802668128 in gearman_connection_st::send_packet (this=<value optimized out>, packet_arg=<value optimized out>, flush_buffer=<value optimized out>)
at libgearman/connection.cc:515
#13 0x0000000802671836 in gearman_worker_grab_job (worker=0x7fffdfdfcf90, job=0x0) at libgearman/worker.cc:711
#14 0x0000000802671d52 in gearman_worker_work (worker=0x7fffdfdfcf90) at libgearman/worker.cc:993
#15 0x000000080240edf0 in result_worker (data=<value optimized out>) at neb_module/result_thread.c:61
#16 0x0000000800b367c5 in pthread_create () from /lib/libthr.so.3
#17 0x0000000000000000 in ?? ()
(gdb)
I'm not sure, but maybe a condition for ret == GEARMAN_TOO_MANY_ARGS needs to be added to neb_module/result_thread.c?
while ( 1 ) {
    ret = gearman_worker_work( &worker );
    if ( ret != GEARMAN_SUCCESS && ret != GEARMAN_WORK_FAIL ) {
        if ( ret != GEARMAN_TIMEOUT)
            gm_log( GM_LOG_ERROR, "worker error: %s\n", gearman_worker_error( &worker ) );
        gearman_job_free_all( &worker );
        if ( ret == GEARMAN_TIMEOUT || ret == GEARMAN_TOO_MANY_ARGS ) {
            gearman_worker_unregister_all(&worker);
            gearman_worker_remove_servers(&worker);
When the job server is not available, the number of sockets in SYN_SENT state keeps growing and Icinga stops working.
icinga# netstat -an | grep '.4830' | grep SYN_SENT | wc -l
35434
[2016-03-17 18:48:26][43226][ERROR] sending job to gearmand failed: flush(GEARMAN_COULD_NOT_CONNECT) gworker:4830 -> libgearman/connection.cc:811 (346 lost jobs so far)
[2016-03-17 18:49:26][43226][ERROR] sending job to gearmand failed: flush(GEARMAN_COULD_NOT_CONNECT) gworker:4830 -> libgearman/connection.cc:811 (413 lost jobs so far)
[Thu Mar 17 18:50:05 2016] Error: Unable to open temp file '/var/spool/icinga/ramdisk/icinga.tmpwH3825' for writing status data: Too many open files
[Thu Mar 17 18:50:15 2016] Error: Unable to open temp file '/var/spool/icinga/ramdisk/icinga.tmpK1DPi9' for writing status data: Too many open files
[Thu Mar 17 18:50:25 2016] Error: Unable to open temp file '/var/spool/icinga/ramdisk/icinga.tmpBgTeHz' for writing status data: Too many open files
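The "Too many open files" errors above suggest that the process has hit its file descriptor limit. As a rough sketch only, the limit could be inspected from C with the POSIX getrlimit() call. The helper names read_nofile_limit and near_fd_limit are hypothetical and do not exist in Icinga or mod_gearman, and the 90% threshold is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: reads the soft and hard limits on open file
 * descriptors for the current process. Returns 0 on success, -1 on error. */
static int read_nofile_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit lim;

    assert(soft != NULL && hard != NULL);
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return -1;
    *soft = lim.rlim_cur;
    *hard = lim.rlim_max;
    return 0;
}

/* Hypothetical helper: returns 1 when in_use descriptors have reached
 * roughly 90% of the soft limit (threshold chosen arbitrarily), else 0. */
static int near_fd_limit(rlim_t in_use, rlim_t soft)
{
    if (soft == RLIM_INFINITY)
        return 0;                       /* no limit, never "near" it */
    return in_use >= soft - soft / 10;
}
```

A caller that knows how many sockets it currently holds could use near_fd_limit() to stop opening new connections before the process runs out of descriptors.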
That's why I really recommend running gearmand on the same server, next to Icinga.
That's very bad. I think some kind of slow-down mechanism needs to be added for this.
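One common way to slow down reconnect attempts is exponential back-off: the wait between retries doubles after each failure, up to a cap. The following is a minimal sketch of that idea, not mod_gearman code. The function names next_backoff_delay and simulate_backoff and the 60-second cap used in the usage note are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of mod_gearman or libgearman:
 * returns the delay (in seconds) to wait before the next retry.
 * The delay doubles after each failure and never exceeds max_delay. */
static unsigned int next_backoff_delay(unsigned int current, unsigned int max_delay)
{
    assert(max_delay > 0);
    if (current == 0)
        return 1;                  /* first failure: wait one second */
    if (current > max_delay / 2)
        return max_delay;          /* doubling would exceed the cap */
    return current * 2;
}

/* Prints the delays a caller would sleep for across a run of
 * consecutive failures and returns the last delay. */
static unsigned int simulate_backoff(unsigned int failures, unsigned int max_delay)
{
    unsigned int delay = 0;

    for (unsigned int i = 0; i < failures; i++) {
        delay = next_backoff_delay(delay, max_delay);
        printf("attempt %u failed, next retry in %u s\n", i + 1, delay);
    }
    return delay;
}
```

With a cap of 60 seconds, the delays run 1, 2, 4, 8, 16, 32, 60, 60, and so on. A worker loop would sleep for the returned delay after each failed connection and reset the delay to 0 after a success, so that an unreachable job server causes a handful of connection attempts per minute instead of thousands of pending sockets.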
Hi!
Icinga dumps core when the gearman job server is not responding.
icinga-1.13.3_1 mod_gearman: initialized version 1.5.5 (libgearman 1.0.6)