sni / mod_gearman

Distribute Naemon Host/Service Checks & Eventhandler with Gearman Queues. Host/Servicegroups affinity included.
http://www.mod-gearman.org
GNU General Public License v3.0

Can't finish check #72

Closed: twscl closed this issue 9 years ago

twscl commented 9 years ago

I am using version 1.5.0.

71

The following log is from a check whose plugin output is more than 65536 bytes:

    ===== trace log ======
    [2015-01-20 10:47:18][19472][TRACE] 352 +++>
    [2015-01-20 10:47:18][19472][TRACE] add_job_to_queue() finished successfully: 0 0
    [2015-01-20 10:47:18][19472][TRACE] send_result_back() finished successfully
    [2015-01-20 10:47:18][19472][TRACE] send_result_back() has no duplicate servers to send to.
    [2015-01-20 10:47:18][19472][TRACE] set_state(1)
    [2015-01-20 10:47:19][19472][TRACE] set_state(0)
    [2015-01-20 10:47:19][19472][TRACE] get_job()
    [2015-01-20 10:47:19][19472][TRACE] got new job H:icinga.personal.com:10149897
    [2015-01-20 10:47:19][19472][TRACE] 440 +++>
    [2015-01-20 10:47:19][19472][TRACE] 329 --->
    [2015-01-20 10:47:19][19472][TRACE] do_exec_job()
    [2015-01-20 10:47:19][19472][DEBUG] got service job: dbca.personal.com - Oracle Tablespace
    [2015-01-20 10:47:19][19472][TRACE] timeout: 10, core latency: 0
    [2015-01-20 10:47:19][19472][TRACE] command: /usr/local/icinga/libexec/check_oracle_health --user ecmon --password oradbmon --connect XXXX --mode tablespace-usage
    [2015-01-20 10:47:19][19472][TRACE] execute_safe_command()
    [2015-01-20 10:47:19][19472][TRACE] started check with pid: 25405
    [root@icinga mod_gearman-1.5.0]# date
    Tue Jan 20 10:53:01 CST 2015
    [root@icinga mod_gearman-1.5.0]#

The log stops at "started check with pid: 25405". Nothing more is logged for this check, although the timeout is 10 seconds and, according to date, more than five minutes have passed.

The following log is from a check whose plugin output is less than 65536 bytes:

    ========== trace log ================
    [2015-01-20 11:03:54][39048][TRACE] 332 +++>
    [2015-01-20 11:03:54][39048][TRACE] add_job_to_queue() finished successfully: 0 0
    [2015-01-20 11:03:54][39048][TRACE] send_result_back() finished successfully
    [2015-01-20 11:03:54][39048][TRACE] send_result_back() has no duplicate servers to send to.
    [2015-01-20 11:03:54][39048][TRACE] set_state(1)
    [2015-01-20 11:03:54][39048][TRACE] set_state(0)
    [2015-01-20 11:03:54][39048][TRACE] get_job()
    [2015-01-20 11:03:54][39048][TRACE] got new job H:icinga.personal.com:10297705
    [2015-01-20 11:03:54][39048][TRACE] 436 +++>
    [2015-01-20 11:03:54][39048][TRACE] 325 --->
    [2015-01-20 11:03:54][39048][TRACE] do_exec_job()
    [2015-01-20 11:03:54][39048][DEBUG] got service job: dbca.personal.com - Oracle Tablespace
    [2015-01-20 11:03:54][39048][TRACE] timeout: 10, core latency: 0
    [2015-01-20 11:03:54][39048][TRACE] command: /usr/local/icinga/libexec/check_oracle_health --user ecmon --password oradbmon --connect XXXX --mode tablespace-usage
    [2015-01-20 11:03:54][39048][TRACE] execute_safe_command()
    [2015-01-20 11:03:54][39048][TRACE] started check with pid: 61966
    [2015-01-20 11:03:56][39048][TRACE] finished check from pid: 61966 with status: 0
    [2015-01-20 11:03:56][39048][TRACE] send_result_back()
    [2015-01-20 11:03:56][39048][TRACE] queue: check_results
    [2015-01-20 11:03:56][39048][TRACE] data:
    [2015-01-20 11:03:56][39048][TRACE] add_job_to_queue(check_results, (null), 2, 1, 2, 1)
    [2015-01-20 11:03:56][39048][TRACE] 33747 --->host_name=dbbord02.idc1.ux
    [2015-01-20 11:03:56][39048][TRACE] 44996 +++>
    [2015-01-20 11:03:56][39048][TRACE] add_job_to_queue() finished successfully: 0 0
    [2015-01-20 11:03:56][39048][TRACE] send_result_back() finished successfully
    [2015-01-20 11:03:56][39048][TRACE] send_result_back() has no duplicate servers to send to.
    [2015-01-20 11:03:56][39048][TRACE] set_state(1)
    [2015-01-20 11:03:56][39048][TRACE] set_state(0)
    [2015-01-20 11:03:56][39048][TRACE] get_job()
    [2015-01-20 11:03:56][39048][TRACE] got new job H:monitor.idc1.ux:10298158

twscl commented 9 years ago

    [root@icinga mod_gearman-nagios3]# diff common/check_utils.c.ori common/check_utils.c
    306a307,310
    >     if (fcntl(pipe_stdout[1], F_SETFL, O_NONBLOCK)==-1)
    >     {
    >       perror("fcntl");
    >     }

Setting O_NONBLOCK on the write end of the stdout pipe with fcntl() solves my problem.

sni commented 9 years ago

There is a problem in 1.5.0 with the 10 second timeout. Could you try the latest HEAD or the package at http://mod-gearman.org/download/v1.5.1/ and see whether that helps?

twscl commented 9 years ago

With version 1.5.1 I get the error from #70 (when the plugin output is more than 64K):

    ======= trace log =============
    [2015-01-21 10:21:40][31938][TRACE] finished check from pid: 35223 with status: 9
    [2015-01-21 10:21:40][31938][TRACE] send_result_back()
    [2015-01-21 10:21:40][31938][TRACE] queue: check_results
    [2015-01-21 10:21:40][31938][TRACE] data: host_name=dbca.personal.com
    core_start_time=1421806887.0
    start_time=1421806888.808185
    finish_time=1421806900.809541
    return_code=2
    exited_ok=1
    service_description=Oracle Tablespace
    output=(Service Check Timed Out On Worker: icinga.personal.com)