hamishforbes / lua-resty-upstream

Upstream connection load balancing and failover module for Openresty
MIT License
117 stars 26 forks source link

Problems with module #2

Closed dant4z closed 10 years ago

dant4z commented 10 years ago

After several hours of load balancing with 2 hosts, i have 504 in all requests and this errors:

2014/02/24 13:33:51 [error] 5432#0: *17441 [lua] http.lua:231: request(): Upstream Error: 504, client: X.X.X.X, server: example.com, request: "GET / HTTP/1.1", host: "example.com"

But these 2 hosts is ok. Then i restart nginx (only restart) - all is well. And after several hours this situation repeats.

hamishforbes commented 10 years ago

Hi,

Do you get an error message returned as well? A 504 implies it couldn't even successfully create a socket connection to the upstream host. So either no connection can be made to either of your 2 hosts or, and this is more likely, all the hosts have been marked as bad and they aren't being re-added to the pool by the background worker.

Have you compiled the 0.9.5 or later version of ngx_lua (theres no openresty release bundling this as yet) and added the init_worker_by_lua directive?

Could you also try dumping the current pools state with something like

        location = /pools {
            content_by_lua '
                local pools, err = api:get_pools()
                if not pools then
                    ngx.print(cjson.encode(err))
                    ngx.status = 500
                else
                    ngx.print(cjson.encode(pools))
                end
            ';
        }
dant4z commented 10 years ago

Yes, both hosts is down in upstream, and i compile nginx with new version ngx_lua from github. I have the init_worker_by_lua directive. init_worker_by_lua 'http_upstream:init_background_thread()';

i have only configuration "copy-pasted" from your example. init_by_lua ' require("cjson")

    Upstream_Socket  = require("resty.upstream.socket")
    Upstream_HTTP = require("resty.upstream.http")
    Upstream_Api = require("resty.upstream.api")

    upstream, configured = Upstream_Socket:new("my_upstream")
    api = Upstream_Api:new(upstream)
    http_upstream = Upstream_HTTP:new(upstream)

    if not configured then -- Only reconfigure on start, shared mem persists across a HUP
        api:create_pool({id = "primary", timeout = 100, read_timeout = 10000, keepalive_pool = 256, keepalive_timeout = (120*1000)})

        api:add_host("primary", { host = "192.168.2.200", port = "80", weight = 10, healthcheck = true})
        api:add_host("primary", { host = "192.168.2.202", port = "80",  weight = 10, healthcheck = true})
    end

';
hamishforbes commented 10 years ago

Ok definitely sounds like the background thread is not working properly. Are there any other errors in your error.log?

Could you also try setting healthcheck to nil in the host definition and see if this still happens?

dant4z commented 10 years ago

I see also this messages:

2014/02/18 13:53:27 [error] 3299#0: lua entry thread aborted: runtime error: /usr/local/lib/lua/resty/upstream/http.lua:76: attempt to index local 'res' (a nil value)
stack traceback:
coroutine 0:
        /usr/local/lib/lua/resty/upstream/http.lua: in function 'http_check_request'
        /usr/local/lib/lua/resty/upstream/http.lua:119: in function '_http_background_func'
        /usr/local/lib/lua/resty/upstream/http.lua:139: in function </usr/local/lib/lua/resty/upstream/http.lua:130>, context: ngx.timer
2014/02/18 13:53:27 [error] 3296#0: lua entry thread aborted: runtime error: /usr/local/lib/lua/resty/upstream/http.lua:76: attempt to index local 'res' (a nil value)
stack traceback:
coroutine 0:
        /usr/local/lib/lua/resty/upstream/http.lua: in function 'http_check_request'
        /usr/local/lib/lua/resty/upstream/http.lua:119: in function '_http_background_func'
        /usr/local/lib/lua/resty/upstream/http.lua:139: in function </usr/local/lib/lua/resty/upstream/http.lua:130>, context: ngx.timer
2014/02/18 13:54:22 [error] 3297#0: *5 connect() failed (110: Connection timed out) while connecting to upstream, client: 78.107.75.251, server: example.com, request: "GET /favicon.ico HTTP/1.1", upstream: "http://192.168.0.210:80/favicon.ico", host: "example.com"
2014/02/18 13:55:10 [error] 3377#0: lua entry thread aborted: runtime error: /usr/local/lib/lua/resty/upstream/http.lua:76: attempt to index local 'res' (a nil value)
stack traceback:
coroutine 0:
        /usr/local/lib/lua/resty/upstream/http.lua: in function 'http_check_request'
        /usr/local/lib/lua/resty/upstream/http.lua:119: in function '_http_background_func'
        /usr/local/lib/lua/resty/upstream/http.lua:139: in function </usr/local/lib/lua/resty/upstream/http.lua:130>, context: ngx.timer
2014/02/18 13:55:10 [error] 3379#0: lua entry thread aborted: runtime error: /usr/local/lib/lua/resty/upstream/http.lua:76: attempt to index local 'res' (a nil value)
stack traceback:
coroutine 0:
        /usr/local/lib/lua/resty/upstream/http.lua: in function 'http_check_request'
        /usr/local/lib/lua/resty/upstream/http.lua:119: in function '_http_background_func'
        /usr/local/lib/lua/resty/upstream/http.lua:139: in function </usr/local/lib/lua/resty/upstream/http.lua:130>, context: ngx.timer
2014/02/18 13:55:10 [error] 3380#0: lua entry thread aborted: runtime error: /usr/local/lib/lua/resty/upstream/http.lua:76: attempt to index local 'res' (a nil value)
stack traceback:
coroutine 0:
        /usr/local/lib/lua/resty/upstream/http.lua: in function 'http_check_request'
        /usr/local/lib/lua/resty/upstream/http.lua:119: in function '_http_background_func'
        /usr/local/lib/lua/resty/upstream/http.lua:139: in function </usr/local/lib/lua/resty/upstream/http.lua:130>, context: ngx.timer

Then i see many timeouts:

2014/02/18 13:55:32 [error] 3377#0: *7 connect() failed (110: Connection timed out) while connecting to upstream, client: X.X.X.X, server: example.com, request: "GET /favicon.ico HTTP/1.1", upstream: "htt
p://192.168.0.210:80/favicon.ico", host: "example.com"

Then again:

2014/02/18 14:06:06 [error] 4060#0: [lua] http.lua:110: _http_background_func(): Connection failed for host 1 (web:80) in pool primary, context: ngx.timer
2014/02/18 14:06:06 [error] 4060#0: lua entry thread aborted: runtime error: /usr/local/lib/lua/resty/upstream/http.lua:76: attempt to index local 'res' (a nil value)
stack traceback:
coroutine 0:
        /usr/local/lib/lua/resty/upstream/http.lua: in function 'http_check_request'
        /usr/local/lib/lua/resty/upstream/http.lua:119: in function '_http_background_func'
        /usr/local/lib/lua/resty/upstream/http.lua:139: in function </usr/local/lib/lua/resty/upstream/http.lua:130>, context: ngx.timer
2014/02/18 14:06:06 [error] 4063#0: [lua] http.lua:110: _http_background_func(): Connection failed for host 1 (web:80) in pool primary, context: ngx.timer
2014/02/18 14:06:06 [error] 4063#0: lua entry thread aborted: runtime error: /usr/local/lib/lua/resty/upstream/http.lua:76: attempt to index local 'res' (a nil value)
stack traceback:
coroutine 0:
        /usr/local/lib/lua/resty/upstream/http.lua: in function 'http_check_request'
        /usr/local/lib/lua/resty/upstream/http.lua:119: in function '_http_background_func'
        /usr/local/lib/lua/resty/upstream/http.lua:139: in function </usr/local/lib/lua/resty/upstream/http.lua:130>, context: ngx.timer

And:

2014/02/18 14:06:27 [error] 4061#0: *3 [lua] socket.lua:259: connect_failed(): Failed connecting to Host "1" (192.168.2.200:80) from pool "primary", client: X.X.X.X, server: example.com, request: "GET / HTTP/1.1", host: "example.com"
2014/02/18 14:06:31 [error] 4061#0: *4 [lua] socket.lua:259: connect_failed(): Failed connecting to Host "1" (192.168.2.200:80) from pool "primary", client: X.X.X.X, server: example.com, request: "GET / HTTP/1.1", host: "example.com"
2014/02/18 14:06:33 [error] 4061#0: *7 [lua] socket.lua:259: connect_failed(): Failed connecting to Host "1" (192.168.2.200:80) from pool "primary", client: X.X.X.X, server: example.com, request: "GET / HTTP/1.1", host: "example.com"
2014/02/18 14:06:33 [error] 4061#0: *7 [lua] socket.lua:196: post_process(): Host "1" in Pool "primary" is down while logging request, client: X.X.X.X, server: example.com, request: "GET / HTTP/1.1", host: "example.com"

And this repeats for host 2 and when i see this message:

014/02/18 14:45:42 [error] 4061#0: *332 [lua] http.lua:231: request(): Upstream Error: 504 ....
hamishforbes commented 10 years ago

Hi,

I've committed a fix for the first set of errors, attempting to use res when its nil in the background check.

From the rest of the log it does look like its actually failing to make a socket connection out to the upstream server. When you get those errors in the log are you able to curl the upstream server from the server running the load balancer?

You might want to also try changing the number of failed requests and the failed request window timeout for your hosts. You can do that by changing the max_fails and failed_timeout values for the pool. i.e. change

api:create_pool({id = "primary", timeout = 100, read_timeout = 10000, keepalive_pool = 256, keepalive_timeout = (120*1000)})

to

api:create_pool({id = "primary", timeout = 100, read_timeout = 10000, keepalive_pool = 256, keepalive_timeout = (120*1000), max_fails = 10, failed_timeout = 30})

Or something similar.

The default connection timeout there is also only 100ms, this could be quite low depending on your environment. You might want to try increasing it to prevent false-positive failed connections during temporary network problems.

dant4z commented 10 years ago

Ok, thanks!