openresty / lua-resty-upstream-healthcheck

Health Checker for Nginx Upstream Servers in Pure Lua

Timers "leaking" ? #69

Open wbednarczyk opened 4 years ago

wbednarczyk commented 4 years ago

Hi. On several of our servers we use the "healthcheck" module. Unfortunately something is wrong, because after a while messages like this start to appear: "failed to create timer: too many pending timers". I know it's a standard error message, but it seemed strange to me, so I decided to check what was going on. A few timers appear very quickly (although the running-timer count I log is negative, which looks like an error in itself), and the number of pending timers keeps increasing until it reaches the configured maximum. Increasing the maximum doesn't help; saturation just takes a little longer. After about an hour the logs show values like this:

Current runing timers -2129 (??!!!)
Current penging timers 4096
failed to spawn health checker: failed to create timer: too many pending timers,

I do not think it is possible for a health check to take so long that it blocks timers. Unfortunately my Lua skills are a bit too limited to debug this properly. Is this a known error? Can it be avoided somehow? Any help or insights would be much appreciated.

Our config looks like this:

        local hc = require "resty.upstream.healthcheck"

        local ok, err = hc.spawn_checker{
            shm = "healthcheck1",
            upstream = "application_karaf",
            type = "http",
            http_req = "GET /tenant/health HTTP/1.0\r\nHost: localhost\r\n\r\n",
            interval = 3000, timeout = 1500, fall = 3, rise = 2,
            valid_statuses = {200, 302},
            concurrency = 10,
        }
        if not ok then
            ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
            ngx.log(ngx.ERR, "Current penging timers ", ngx.timer.pending_count())
            ngx.log(ngx.ERR, "Current runing timers ", ngx.timer.running_count())
            return
        end
spacewander commented 4 years ago

That is odd. spawn_checker is expected to be called only once; it then keeps scheduling a new timer at a constant interval by itself. Do you call this method multiple times?
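
To illustrate the timer behaviour described in this comment, here is a minimal sketch of the usual self-rescheduling timer pattern in OpenResty. It is not the library's actual code: the function name `tick` and the 3-second delay are hypothetical, and only `ngx.timer.at` and `ngx.log` are real API calls.

```nginx
init_worker_by_lua_block {
    -- Hypothetical periodic job, shown only to illustrate the pattern.
    local delay = 3  -- seconds between runs (assumed value)

    local function tick(premature)
        -- premature is true when the worker is shutting down
        if premature then
            return
        end

        -- periodic work would go here

        -- re-arm: one timer of this chain is pending between runs
        local ok, err = ngx.timer.at(delay, tick)
        if not ok then
            ngx.log(ngx.ERR, "failed to create timer: ", err)
        end
    end

    -- start the chain once per worker
    local ok, err = ngx.timer.at(0, tick)
    if not ok then
        ngx.log(ngx.ERR, "failed to create timer: ", err)
    end
}
```

Every call that starts such a chain adds one timer that stays pending between runs. If a chain were started on every request, the pending count would presumably grow with the number of requests until lua_max_pending_timers is reached, which would match the saturation reported above.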

rainingmaster commented 4 years ago

@wbednarczyk Could you share more of your Nginx config? It seems you are running spawn_checker, which creates a timer that runs forever, on every request (content_by_lua*, rewrite_by_lua*) instead of in init_worker_by_lua_block.

Besides, you can also share your ideas or ask questions in the official forum: https://forum.openresty.us/t/en-discussion

wbednarczyk commented 4 years ago

Hi, sorry for the late response.

@spacewander It seems so, yes. What would be the proper way to use spawn_checker with multiple health checks? @rainingmaster More of our config is below.

I would really appreciate any help on that topic. Thanks in advance!

EDIT: I re-read your comments, and I think you are suggesting that multiple spawn_checker calls have to be made inside init_worker_by_lua_block, as described in https://github.com/openresty/lua-resty-upstream-healthcheck/blob/master/README.markdown#multiple-upstreams . Am I right?

Here is the fragment of our nginx.conf that loads the Lua script:

user nginx;
worker_processes 2;

...

http {
                lua_max_pending_timers 8192;
                lua_shared_dict healthcheck1 1m;
                lua_shared_dict healthcheck2 1m;
                lua_shared_dict healthcheck3 1m;
                lua_shared_dict healthcheck4 1m;
                lua_shared_dict healthcheck5 1m;

                lua_socket_log_errors off;
                access_by_lua_file /etc/nginx/healthcheck.lua;
}
...
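
Following the suggestion in the comments, a minimal sketch of the change would be to load the same script with init_worker_by_lua_file instead of access_by_lua_file, so that spawn_checker runs once when each worker starts rather than on every request. This assumes /etc/nginx/healthcheck.lua contains only the spawn_checker calls; everything elided in the original config stays elided here.

```nginx
http {
    lua_shared_dict healthcheck1 1m;
    # other lua_shared_dict zones as in the original config

    lua_socket_log_errors off;

    # runs once per worker process at startup, not once per request
    init_worker_by_lua_file /etc/nginx/healthcheck.lua;
}
```

With several upstreams, the script can call hc.spawn_checker once per upstream inside this handler, which is what the README section on multiple upstreams linked above describes.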