Closed dant4z closed 10 years ago
Hi,
Do you get an error message returned as well? A 504 implies it couldn't even successfully create a socket connection to the upstream host. So either no connection can be made to either of your 2 hosts or, and this is more likely, all the hosts have been marked as bad and they aren't being re-added to the pool by the background worker.
Have you compiled the 0.9.5 or later version of ngx_lua (theres no openresty release bundling this as yet) and added the init_worker_by_lua directive?
Could you also try dumping the current pools state with something like
location = /pools {
content_by_lua '
local pools, err = api:get_pools()
if not pools then
ngx.print(cjson.encode(err))
ngx.status = 500
else
ngx.print(cjson.encode(pools))
end
';
}
Yes, both hosts is down in upstream, and i compile nginx with new version ngx_lua from github. I have the init_worker_by_lua directive. init_worker_by_lua 'http_upstream:init_background_thread()';
i have only configuration "copy-pasted" from your example. init_by_lua ' require("cjson")
Upstream_Socket = require("resty.upstream.socket")
Upstream_HTTP = require("resty.upstream.http")
Upstream_Api = require("resty.upstream.api")
upstream, configured = Upstream_Socket:new("my_upstream")
api = Upstream_Api:new(upstream)
http_upstream = Upstream_HTTP:new(upstream)
if not configured then -- Only reconfigure on start, shared mem persists across a HUP
api:create_pool({id = "primary", timeout = 100, read_timeout = 10000, keepalive_pool = 256, keepalive_timeout = (120*1000)})
api:add_host("primary", { host = "192.168.2.200", port = "80", weight = 10, healthcheck = true})
api:add_host("primary", { host = "192.168.2.202", port = "80", weight = 10, healthcheck = true})
end
';
Ok definitely sounds like the background thread is not working properly. Are there any other errors in your error.log?
Could you also try setting healthcheck to nil in the host definition and see if this still happens?
I see also this messages:
2014/02/18 13:53:27 [error] 3299#0: lua entry thread aborted: runtime error: /usr/local/lib/lua/resty/upstream/http.lua:76: attempt to index local 'res' (a nil value)
stack traceback:
coroutine 0:
/usr/local/lib/lua/resty/upstream/http.lua: in function 'http_check_request'
/usr/local/lib/lua/resty/upstream/http.lua:119: in function '_http_background_func'
/usr/local/lib/lua/resty/upstream/http.lua:139: in function </usr/local/lib/lua/resty/upstream/http.lua:130>, context: ngx.timer
2014/02/18 13:53:27 [error] 3296#0: lua entry thread aborted: runtime error: /usr/local/lib/lua/resty/upstream/http.lua:76: attempt to index local 'res' (a nil value)
stack traceback:
coroutine 0:
/usr/local/lib/lua/resty/upstream/http.lua: in function 'http_check_request'
/usr/local/lib/lua/resty/upstream/http.lua:119: in function '_http_background_func'
/usr/local/lib/lua/resty/upstream/http.lua:139: in function </usr/local/lib/lua/resty/upstream/http.lua:130>, context: ngx.timer
2014/02/18 13:54:22 [error] 3297#0: *5 connect() failed (110: Connection timed out) while connecting to upstream, client: 78.107.75.251, server: example.com, request: "GET /favicon.ico HTTP/1.1", upstream: "http://192.168.0.210:80/favicon.ico", host: "example.com"
2014/02/18 13:55:10 [error] 3377#0: lua entry thread aborted: runtime error: /usr/local/lib/lua/resty/upstream/http.lua:76: attempt to index local 'res' (a nil value)
stack traceback:
coroutine 0:
/usr/local/lib/lua/resty/upstream/http.lua: in function 'http_check_request'
/usr/local/lib/lua/resty/upstream/http.lua:119: in function '_http_background_func'
/usr/local/lib/lua/resty/upstream/http.lua:139: in function </usr/local/lib/lua/resty/upstream/http.lua:130>, context: ngx.timer
2014/02/18 13:55:10 [error] 3379#0: lua entry thread aborted: runtime error: /usr/local/lib/lua/resty/upstream/http.lua:76: attempt to index local 'res' (a nil value)
stack traceback:
coroutine 0:
/usr/local/lib/lua/resty/upstream/http.lua: in function 'http_check_request'
/usr/local/lib/lua/resty/upstream/http.lua:119: in function '_http_background_func'
/usr/local/lib/lua/resty/upstream/http.lua:139: in function </usr/local/lib/lua/resty/upstream/http.lua:130>, context: ngx.timer
2014/02/18 13:55:10 [error] 3380#0: lua entry thread aborted: runtime error: /usr/local/lib/lua/resty/upstream/http.lua:76: attempt to index local 'res' (a nil value)
stack traceback:
coroutine 0:
/usr/local/lib/lua/resty/upstream/http.lua: in function 'http_check_request'
/usr/local/lib/lua/resty/upstream/http.lua:119: in function '_http_background_func'
/usr/local/lib/lua/resty/upstream/http.lua:139: in function </usr/local/lib/lua/resty/upstream/http.lua:130>, context: ngx.timer
Then i see many timeouts:
2014/02/18 13:55:32 [error] 3377#0: *7 connect() failed (110: Connection timed out) while connecting to upstream, client: X.X.X.X, server: example.com, request: "GET /favicon.ico HTTP/1.1", upstream: "htt
p://192.168.0.210:80/favicon.ico", host: "example.com"
Then again:
2014/02/18 14:06:06 [error] 4060#0: [lua] http.lua:110: _http_background_func(): Connection failed for host 1 (web:80) in pool primary, context: ngx.timer
2014/02/18 14:06:06 [error] 4060#0: lua entry thread aborted: runtime error: /usr/local/lib/lua/resty/upstream/http.lua:76: attempt to index local 'res' (a nil value)
stack traceback:
coroutine 0:
/usr/local/lib/lua/resty/upstream/http.lua: in function 'http_check_request'
/usr/local/lib/lua/resty/upstream/http.lua:119: in function '_http_background_func'
/usr/local/lib/lua/resty/upstream/http.lua:139: in function </usr/local/lib/lua/resty/upstream/http.lua:130>, context: ngx.timer
2014/02/18 14:06:06 [error] 4063#0: [lua] http.lua:110: _http_background_func(): Connection failed for host 1 (web:80) in pool primary, context: ngx.timer
2014/02/18 14:06:06 [error] 4063#0: lua entry thread aborted: runtime error: /usr/local/lib/lua/resty/upstream/http.lua:76: attempt to index local 'res' (a nil value)
stack traceback:
coroutine 0:
/usr/local/lib/lua/resty/upstream/http.lua: in function 'http_check_request'
/usr/local/lib/lua/resty/upstream/http.lua:119: in function '_http_background_func'
/usr/local/lib/lua/resty/upstream/http.lua:139: in function </usr/local/lib/lua/resty/upstream/http.lua:130>, context: ngx.timer
And:
2014/02/18 14:06:27 [error] 4061#0: *3 [lua] socket.lua:259: connect_failed(): Failed connecting to Host "1" (192.168.2.200:80) from pool "primary", client: X.X.X.X, server: example.com, request: "GET / HTTP/1.1", host: "example.com"
2014/02/18 14:06:31 [error] 4061#0: *4 [lua] socket.lua:259: connect_failed(): Failed connecting to Host "1" (192.168.2.200:80) from pool "primary", client: X.X.X.X, server: example.com, request: "GET / HTTP/1.1", host: "example.com"
2014/02/18 14:06:33 [error] 4061#0: *7 [lua] socket.lua:259: connect_failed(): Failed connecting to Host "1" (192.168.2.200:80) from pool "primary", client: X.X.X.X, server: example.com, request: "GET / HTTP/1.1", host: "example.com"
2014/02/18 14:06:33 [error] 4061#0: *7 [lua] socket.lua:196: post_process(): Host "1" in Pool "primary" is down while logging request, client: X.X.X.X, server: example.com, request: "GET / HTTP/1.1", host: "example.com"
And this repeats for host 2 and when i see this message:
014/02/18 14:45:42 [error] 4061#0: *332 [lua] http.lua:231: request(): Upstream Error: 504 ....
Hi,
I've committed a fix for the first set of errors, attempting to use res when its nil in the background check.
From the rest of the log it does look like its actually failing to make a socket connection out to the upstream server. When you get those errors in the log are you able to curl the upstream server from the server running the load balancer?
You might want to also try changing the number of failed requests and the failed request window timeout for your hosts.
You can do that by changing the max_fails
and failed_timeout
values for the pool.
i.e. change
api:create_pool({id = "primary", timeout = 100, read_timeout = 10000, keepalive_pool = 256, keepalive_timeout = (120*1000)})
to
api:create_pool({id = "primary", timeout = 100, read_timeout = 10000, keepalive_pool = 256, keepalive_timeout = (120*1000), max_fails = 10, failed_timeout = 30})
Or something similar.
The default connection timeout there is also only 100ms, this could be quite low depending on your environment. You might want to try increasing it to prevent false-positive failed connections during temporary network problems.
Ok, thanks!
After several hours of load balancing with 2 hosts, i have 504 in all requests and this errors:
2014/02/24 13:33:51 [error] 5432#0: *17441 [lua] http.lua:231: request(): Upstream Error: 504, client: X.X.X.X, server: example.com, request: "GET / HTTP/1.1", host: "example.com"
But these 2 hosts is ok. Then i restart nginx (only restart) - all is well. And after several hours this situation repeats.