openresty / lua-resty-upstream-healthcheck

Health Checker for Nginx Upstream Servers in Pure Lua

each worker spawns healthchecker - is this ok? #24

Closed qrof closed 8 years ago

qrof commented 8 years ago

It looks like each worker spawns its own health checker via hc.spawn_checker. Is this by design?

Thanks!

Config:

worker_processes  2;
error_log logs/error.log  warn;

events {
    worker_connections 1024;
    use epoll;
}

...

    init_worker_by_lua_block {
        local hc = require "resty.upstream.healthcheck"
        ngx.log(ngx.INFO, "initialising health checker for upstreams manually defined")
...
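
For reference, a complete init_worker_by_lua_block based on the module's README looks roughly like the sketch below; the shm zone, upstream name, and check parameters are placeholders, not the elided settings from the config above.

    init_worker_by_lua_block {
        local hc = require "resty.upstream.healthcheck"
        ngx.log(ngx.INFO, "initialising health checker for upstreams manually defined")

        local ok, err = hc.spawn_checker{
            shm = "healthcheck",     -- a lua_shared_dict zone (placeholder name)
            upstream = "backend",    -- an upstream block name (placeholder)
            type = "http",
            http_req = "GET /status HTTP/1.0\r\nHost: backend\r\n\r\n",
            interval = 2000,         -- run the check cycle every 2 s
            timeout = 1000,          -- 1 s timeout for network operations
            fall = 3,                -- failures before a peer is turned down
            rise = 2,                -- successes before a peer is turned up
            valid_statuses = {200, 302},
            concurrency = 10,
        }
        if not ok then
            ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
            return
        end
    }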

Log:

2016/09/28 12:15:47 [notice] 4778#0: using the "epoll" event method
2016/09/28 12:15:47 [notice] 4778#0: openresty/1.9.7.3
2016/09/28 12:15:47 [notice] 4778#0: built by gcc 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
2016/09/28 12:15:47 [notice] 4778#0: OS: Linux 3.13.0-62-generic
2016/09/28 12:15:47 [notice] 4778#0: getrlimit(RLIMIT_NOFILE): 1024:4096
2016/09/28 12:15:47 [notice] 4779#0: start worker processes
2016/09/28 12:15:47 [notice] 4779#0: start worker process 4780
2016/09/28 12:15:47 [notice] 4779#0: start worker process 4781
2016/09/28 12:15:47 [notice] 4779#0: start cache manager process 4782
2016/09/28 12:15:47 [notice] 4779#0: start cache loader process 4783
2016/09/28 12:15:47 [info] 4780#0: [lua] init_worker_by_lua:3: initialising health checker for upstreams manually defined, context: init_worker_by_lua*
2016/09/28 12:15:47 [error] 4780#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 95.172.249.216:81: connection refused, context: ngx.timer
2016/09/28 12:15:47 [error] 4780#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 193.72.34.102:81: connection refused, context: ngx.timer
2016/09/28 12:15:47 [info] 4782#0: [lua] init_worker_by_lua:3: initialising health checker for upstreams manually defined, context: init_worker_by_lua*
2016/09/28 12:15:47 [info] 4781#0: [lua] init_worker_by_lua:3: initialising health checker for upstreams manually defined, context: init_worker_by_lua*
2016/09/28 12:15:47 [info] 4783#0: [lua] init_worker_by_lua:3: initialising health checker for upstreams manually defined, context: init_worker_by_lua*
2016/09/28 12:15:48 [error] 4780#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 12.121.12.21:8080: timeout, context: ngx.timer
2016/09/28 12:15:53 [info] 4781#0: *20 client 10.78.153.254 closed keepalive connection
2016/09/28 12:15:57 [error] 4780#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 91.146.149.126:81: connection refused, context: ngx.timer
2016/09/28 12:15:57 [error] 4780#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 193.701.12.222:81: connection refused, context: ngx.timer
2016/09/28 12:15:58 [error] 4781#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 19.126.12.23:8080: timeout, context: ngx.timer
2016/09/28 12:16:07 [error] 4781#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 91.276.349.426:81: connection refused, context: ngx.timer
2016/09/28 12:16:07 [warn] 4781#0: [lua] healthcheck.lua:53: warn(): healthcheck: peer 951.276.349.226:81 is turned down after 3 failure(s), context: ngx.timer
2016/09/28 12:16:07 [error] 4781#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 93.27.34.202:81: connection refused, context: ngx.timer
2016/09/28 12:16:07 [warn] 4781#0: [lua] healthcheck.lua:53: warn(): healthcheck: peer 193.727.155.22:81 is turned down after 3 failure(s), context: ngx.timer
2016/09/28 12:16:08 [error] 4783#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 110.322.47.21:8080: timeout, context: ngx.timer
2016/09/28 12:16:08 [warn] 4783#0: [lua] healthcheck.lua:53: warn(): healthcheck: peer 10.122.133.25:8080 is turned down after 3 failure(s), context: ngx.timer
2016/09/28 12:16:47 [notice] 4783#0: http file cache: /usr/local/openresty/nginx/cache 0.152M, bsize: 4096
2016/09/28 12:16:47 [notice] 4783#0: http file cache: /usr/local/openresty/nginx/cache/stream 0.000M, bsize: 4096
2016/09/28 12:16:47 [notice] 4783#0: http file cache: /usr/local/openresty/nginx/cache/re 0.000M, bsize: 4096
2016/09/28 12:16:47 [notice] 4783#0: http file cache: /usr/local/openresty/nginx/cache/events 0.000M, bsize: 4096
2016/09/28 12:16:47 [notice] 4783#0: http file cache: /usr/local/openresty/nginx/cache/twimg 0.000M, bsize: 4096
2016/09/28 12:16:47 [notice] 4779#0: signal 17 (SIGCHLD) received
2016/09/28 12:16:47 [notice] 4779#0: cache loader process 4783 exited with code 0
2016/09/28 12:16:47 [notice] 4779#0: signal 29 (SIGIO) received
agentzh commented 8 years ago

@qrof As you can see from the timestamps in your logs, only one worker performs the health check on each healthcheck tick, although every worker has a chance to win each time slot.

agentzh commented 8 years ago

@qrof There is a shm lock there, so only one worker can be doing the job at each tick.
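
To illustrate the pattern agentzh describes (a simplified sketch, not the module's actual code): every worker arms its own timer in init_worker_by_lua*, but on each tick the workers race on a shared-dict add(), and only the winner runs the probes.

    local INTERVAL = 5                   -- seconds between check cycles
    local dict = ngx.shared.healthcheck  -- any lua_shared_dict zone

    local function check(premature)
        if premature then return end

        -- one key per time slot: dict:add() succeeds for exactly one
        -- worker, the others see "exists" and skip this tick
        local key = "lock:" .. math.floor(ngx.now() / INTERVAL)
        local ok, err = dict:add(key, true, INTERVAL * 2)
        if not ok then
            if err ~= "exists" then
                ngx.log(ngx.ERR, "failed to acquire healthcheck lock: ", err)
            end
            return
        end

        -- ... probe the upstream peers and record results here ...
    end

    local ok, err = ngx.timer.every(INTERVAL, check)
    if not ok then
        ngx.log(ngx.ERR, "failed to create healthcheck timer: ", err)
    end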

qrof commented 8 years ago

Thanks a lot @agentzh and respect to you - thank you for all the hard work on such a great project!

HiXinJ commented 4 years ago

Hello, I see that do_check uses a shm lock to prevent multiple worker processes from sending check requests at the same time. Could the shm lock be moved earlier, to the timer-creation stage, so that only one worker process starts the health check?
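
One way to get what HiXinJ asks about, i.e. spawning the checker from a single worker instead of relying on the per-tick lock, would be to guard the spawn with ngx.worker.id(). This is a hedged sketch, not something the module does itself; note that each nginx worker keeps its own copy of the upstream peer flags, so whether up/down state still reaches the other workers depends on the module's internals.

    init_worker_by_lua_block {
        -- only the process whose worker id is 0 spawns the checker;
        -- other workers (and helper processes) return immediately
        if ngx.worker.id() ~= 0 then
            return
        end

        local hc = require "resty.upstream.healthcheck"
        local ok, err = hc.spawn_checker{
            shm = "healthcheck",   -- placeholder lua_shared_dict name
            upstream = "backend",  -- placeholder upstream name
            type = "http",
            http_req = "GET /status HTTP/1.0\r\nHost: backend\r\n\r\n",
        }
        if not ok then
            ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
        end
    }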

Rockybilly commented 3 months ago

> @qrof There is a shm lock there, so only one worker can be doing the job at each tick.

Is using a privileged agent for this (starting the timer only in the privileged agent) good practice, so that the health check runs in only one process? That would separate its work from the regular workers, which may occasionally be under high load. (A sketch of this idea follows below.)

@zhuizhuhaomeng
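
A rough sketch of the privileged-agent idea above, assuming the ngx.process API from lua-resty-core; it illustrates the mechanics of the question and is not an officially supported mode of this module. As with the worker-0 variant, each worker holds its own copy of the upstream peer flags, so peer state recorded only by the agent may not propagate to the regular workers.

    init_by_lua_block {
        -- the privileged agent must be enabled before workers are forked
        local process = require "ngx.process"
        local ok, err = process.enable_privileged_agent()
        if not ok then
            ngx.log(ngx.ERR, "failed to enable privileged agent: ", err)
        end
    }

    init_worker_by_lua_block {
        local process = require "ngx.process"
        -- spawn the checker only in the privileged agent, so the regular
        -- request-serving workers never run the probe timers
        if process.type() ~= "privileged agent" then
            return
        end

        local hc = require "resty.upstream.healthcheck"
        local ok, err = hc.spawn_checker{
            shm = "healthcheck",   -- placeholder lua_shared_dict name
            upstream = "backend",  -- placeholder upstream name
            type = "http",
            http_req = "GET /status HTTP/1.0\r\nHost: backend\r\n\r\n",
        }
        if not ok then
            ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
        end
    }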