Timers "leaking"? - GitHub issues

wbednarczyk commented 4 years ago

Hi, on several of our servers we use the "healthcheck" module. Unfortunately something is wrong, because after a while this message starts to appear: "failed to create timer: too many pending timers". I know it's a standard error message, but it seemed strange to me, so I decided to check the timer counts. It looks like a few timers appear very quickly (although the running count I log is negative, which must be an error somewhere), but the number of pending timers keeps increasing up to the configured maximum. Increasing the maximum doesn't help; saturation just takes a little longer. After about an hour, the logs show values like this:

Current runing timers -2129 (??!!!)
Current penging timers 4096
failed to spawn health checker: failed to create timer: too many pending timers,

I do not think it is possible for the healthcheck to block timers for that long. Unfortunately, my Lua skills are too limited to debug this properly. Is this a known error? Can it be avoided somehow? Any help or insight would be much appreciated.

Our config looks like this:

        local hc = require "resty.upstream.healthcheck"

        local ok, err = hc.spawn_checker{
            shm = "healthcheck1",
            upstream = "application_karaf",
            type = "http",
            http_req = "GET /tenant/health HTTP/1.0\r\nHost: localhost\r\n\r\n",
            interval = 3000, timeout = 1500, fall = 3, rise = 2,
            valid_statuses = {200, 302},
            concurrency = 10,
        }
        if not ok then
            ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
            ngx.log(ngx.ERR, "Current penging timers ", ngx.timer.pending_count())
            ngx.log(ngx.ERR, "Current runing timers ", ngx.timer.running_count())
            return
        end

spacewander commented 4 years ago

A weird problem. spawn_checker is expected to be called once; it then ticks a new timer at a constant interval. Do you call this method multiple times?
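To illustrate why calling it more than once matters, here is a minimal sketch of a self-rescheduling timer built on ngx.timer.at. It is a simplified model, not the library's actual code; the names INTERVAL, tick, and start_chain are hypothetical and exist only for this example.

```lua
-- Simplified model of a self-rescheduling timer (NOT the library's code).
-- INTERVAL, tick and start_chain are hypothetical names for illustration.
local INTERVAL = 4  -- seconds between rounds

local function tick(premature)
    if premature then
        return  -- the worker is shutting down, so do not re-arm
    end

    -- ... one round of health checks would run here ...

    -- Re-arm: schedule the next round. After this call there is
    -- always one pending timer that belongs to this chain.
    local ok, err = ngx.timer.at(INTERVAL, tick)
    if not ok then
        ngx.log(ngx.ERR, "failed to create timer: ", err)
    end
end

-- Every call starts one more chain that never ends on its own.
local function start_chain()
    return ngx.timer.at(0, tick)
end
```

Under this model, calling start_chain once per worker keeps roughly one timer pending per chain. Calling it from a per-request handler adds a new chain on every request, so the pending count can only grow until it reaches lua_max_pending_timers.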

rainingmaster commented 4 years ago

@wbednarczyk Could you share more of your Nginx config? It seems you run spawn_checker, which creates a forever-running timer, on every request (content_by_lua*, rewrite_by_lua*) instead of in init_worker_by_lua_block.

Besides, you can also share your ideas or ask questions in the official forum: https://forum.openresty.us/t/en-discussion

wbednarczyk commented 4 years ago

Hi, sorry for the late response.

@spacewander It seems so, yes. What would be the proper way to use spawn_checker with multiple healthchecks? @rainingmaster More of our config is below.

I would really appreciate any help on that topic. Thanks in advance!

EDIT: I re-read your comments, and I think you are suggesting that the multiple spawn_checker calls have to be inside init_worker_by_lua_block, as described in https://github.com/openresty/lua-resty-upstream-healthcheck/blob/master/README.markdown#multiple-upstreams . Am I right?

Our healthcheck.lua actually contains 5 healthchecks defined in the way shown below (I'm not copying the whole file, because further down it contains some Chef lines for templating):

    local hc = require "resty.upstream.healthcheck"

    local ok, err = hc.spawn_checker{
        shm = "healthcheck1",
        upstream = "karaf",
        type = "http",
        http_req = "GET /tenant/health?hch1 HTTP/1.0\r\nHost: localhost\r\n\r\n",
        interval = 4000, timeout = 1500, fall = 3, rise = 5,
        valid_statuses = {200, 302},
        concurrency = 10,
    }
    if not ok then
        ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
        return
    end

    local ok, err = hc.spawn_checker{
        shm = "healthcheck2",
        upstream = "karaf_with_ip_hash",
        type = "http",
        http_req = "GET /tenant/health?hch2 HTTP/1.0\r\nHost: localhost\r\n\r\n",
        interval = 4000, timeout = 1500, fall = 3, rise = 5,
        valid_statuses = {200, 302},
        concurrency = 10,
    }
    if not ok then
        ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
        return
    end
...

And the fragment of our nginx.conf that loads the Lua script:

user nginx;
worker_processes 2;

...

http {
    lua_max_pending_timers 8192;
    lua_shared_dict healthcheck1 1m;
    lua_shared_dict healthcheck2 1m;
    lua_shared_dict healthcheck3 1m;
    lua_shared_dict healthcheck4 1m;
    lua_shared_dict healthcheck5 1m;

    lua_socket_log_errors off;
    access_by_lua_file /etc/nginx/healthcheck.lua;
}
...
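Following the suggestion earlier in the thread, a minimal sketch of the same http block with the script loaded once per worker rather than on every request. It assumes healthcheck.lua itself stays unchanged and that nothing else in the elided parts of the config depends on the access phase; only the last directive differs from the fragment above.

```nginx
http {
    lua_max_pending_timers 8192;
    lua_shared_dict healthcheck1 1m;
    lua_shared_dict healthcheck2 1m;
    lua_shared_dict healthcheck3 1m;
    lua_shared_dict healthcheck4 1m;
    lua_shared_dict healthcheck5 1m;

    lua_socket_log_errors off;

    # Run the script once when each worker process starts,
    # instead of in the access phase of every request.
    init_worker_by_lua_file /etc/nginx/healthcheck.lua;
}
```

With access_by_lua_file at the http level, the script runs for every incoming request, so each request calls spawn_checker for all five healthchecks again. Moving it to the init_worker phase should limit that to one set of calls per worker process.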

openresty / lua-resty-upstream-healthcheck

Timers "leaking" ? #69