Open wbednarczyk opened 4 years ago
A weird problem.
spawn_checker is expected to be called only once; it sets up a recurring timer that fires at a constant interval. Do you call this method multiple times?
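For reference, the intended usage is a single call per checker from init_worker_by_lua_block, something like this minimal sketch (the "healthcheck" shm zone and "foo" upstream are placeholder names):

init_worker_by_lua_block {
    local hc = require "resty.upstream.healthcheck"
    -- spawn exactly one checker; its timer re-arms itself every `interval` ms
    local ok, err = hc.spawn_checker{
        shm = "healthcheck",
        upstream = "foo",
        type = "http",
        http_req = "GET /status HTTP/1.0\r\nHost: localhost\r\n\r\n",
        interval = 2000, timeout = 1000, fall = 3, rise = 2,
    }
    if not ok then
        ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
    end
}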
@wbednarczyk
Could you share more of your Nginx config? It seems you run spawn_checker, which creates a never-ending timer, per request (content_by_lua*, rewrite_by_lua*) instead of once in init_worker_by_lua_block?
Besides, you can share your ideas or ask questions on the official forum: https://forum.openresty.us/t/en-discussion
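To illustrate the suspected anti-pattern (a hypothetical example, not taken from your config; "healthcheck" and "foo" are placeholder names): placing the call in a per-request phase spawns one more never-ending timer for every request handled, and they accumulate until lua_max_pending_timers is exhausted:

location / {
    content_by_lua_block {
        -- WRONG: this runs on every request, so each request
        -- leaks one more recurring healthcheck timer
        local hc = require "resty.upstream.healthcheck"
        local ok, err = hc.spawn_checker{
            shm = "healthcheck", upstream = "foo", type = "http",
            http_req = "GET /status HTTP/1.0\r\nHost: localhost\r\n\r\n",
        }
        if not ok then
            ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
        end
    }
}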
Hi, sorry for the late response.
@spacewander It seems so. What would be the proper way to use spawn_checker with multiple health checks?
@rainingmaster More of our configs are below.
I would really appreciate any help on this topic. Thanks in advance!
EDIT:
I re-read your comments, and I think you are suggesting that the spawn_checker calls (even multiple ones) have to run inside init_worker_by_lua_block, as described in https://github.com/openresty/lua-resty-upstream-healthcheck/blob/master/README.markdown#multiple-upstreams . Am I right here?
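If so, I guess our checks would collapse into something like this inside init_worker_by_lua_block (just a sketch of my understanding, not our actual file — the current file is shown below; if I read the README's multiple-upstreams example right, the checkers can even share one shm zone):

init_worker_by_lua_block {
    local hc = require "resty.upstream.healthcheck"
    -- one checker per upstream, all spawned once per worker
    for _, u in ipairs({ "karaf", "karaf_with_ip_hash" }) do
        local ok, err = hc.spawn_checker{
            shm = "healthcheck",
            upstream = u,
            type = "http",
            http_req = "GET /tenant/health HTTP/1.0\r\nHost: localhost\r\n\r\n",
            interval = 4000, timeout = 1500, fall = 3, rise = 5,
            valid_statuses = {200, 302},
            concurrency = 10,
        }
        if not ok then
            ngx.log(ngx.ERR, "failed to spawn health checker for ", u, ": ", err)
        end
    end
}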
Our healthcheck.lua actually defines 5 health checks in the way shown below (I'm not copying the whole file, because further down it contains some Chef lines for templating):
local hc = require "resty.upstream.healthcheck"

local ok, err = hc.spawn_checker{
    shm = "healthcheck1",
    upstream = "karaf",
    type = "http",
    http_req = "GET /tenant/health?hch1 HTTP/1.0\r\nHost: localhost\r\n\r\n",
    interval = 4000, timeout = 1500, fall = 3, rise = 5,
    valid_statuses = {200, 302},
    concurrency = 10,
}
if not ok then
    ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
    return
end

local ok, err = hc.spawn_checker{
    shm = "healthcheck2",
    upstream = "karaf_with_ip_hash",
    type = "http",
    http_req = "GET /tenant/health?hch2 HTTP/1.0\r\nHost: localhost\r\n\r\n",
    interval = 4000, timeout = 1500, fall = 3, rise = 5,
    valid_statuses = {200, 302},
    concurrency = 10,
}
if not ok then
    ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
    return
end
...
and the fragment of our nginx.conf which loads the Lua script:
user nginx;
worker_processes 2;
...
http {
    lua_max_pending_timers 8192;
    lua_shared_dict healthcheck1 1m;
    lua_shared_dict healthcheck2 1m;
    lua_shared_dict healthcheck3 1m;
    lua_shared_dict healthcheck4 1m;
    lua_shared_dict healthcheck5 1m;
    lua_socket_log_errors off;
    access_by_lua_file /etc/nginx/healthcheck.lua;
}
...
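Now I also notice that our script is loaded with access_by_lua_file, which runs on every request, so each request would re-run all five spawn_checker calls and leak five more timers. If I understand your comments correctly, the fix would be to load it once per worker instead, roughly like this (a sketch, keeping the same file path):

http {
    ...
    lua_socket_log_errors off;
    # run the healthcheck script once per worker, not per request
    init_worker_by_lua_file /etc/nginx/healthcheck.lua;
}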
Hi,
On several of our servers we use the "healthcheck" module. Unfortunately, something is wrong: after a while, messages like "failed to create timer: too many pending timers" start to appear. I know it's a standard error message, but it seemed strange to me, so I decided to check what the situation looked like. A few timers appear very quickly (although I see an oddity here: the reported number is negative), and the number of pending timers keeps increasing up to the configured maximum. Increasing the maximum doesn't help; saturation just takes a little more time. After ca. an hour, the logs contain values like this:
I do not think it is possible for a health check to take so long that it blocks timers; unfortunately, my Lua skills are a bit too limited to debug this properly. Is this a known error? Can it be avoided somehow? Any help / insights would be very much appreciated.
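For reference, I watched the counters with something like this (a debugging sketch; the /timers location is just something added for testing):

location = /timers {
    content_by_lua_block {
        -- per-worker timer totals reported by ngx_lua
        ngx.say("pending: ", ngx.timer.pending_count())
        ngx.say("running: ", ngx.timer.running_count())
    }
}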
Our config looks like this: