openresty / lua-nginx-module

Embed the Power of Lua into NGINX HTTP servers
https://openresty.org/
11.24k stars 2.02k forks source link

a coredump that may be triggered by lua coroutine #1518

Open uttinie opened 5 years ago

uttinie commented 5 years ago

Hello @agentzh I have a core problem in ARM64 that is suspected to be caused by the lua-nginx module.

The reason for not being able to provide more references is that when trying to add a print recurring problem to find a problem, the problem is not present or the problem takes more time to appear. It will appear once in about three days now.

localhost systemd-coredump[30206]: Process 12408 (nginx) of user 1000 dumped core.#012#012Stack trace of thread 12408:#012#0  0x0000ffffae2a79e0 n/a (n/a)#012#1  0xfff9000000000003 n/a (n/a)#012#2  0x00000000004fd910 ngx_http_lua_run_thread (nginx)#012#3  0x0000000000507bdc ngx_http_lua_socket_handle_read_success (nginx)#012#4  0x0000000000507428 ngx_http_lua_socket_tcp_handler (nginx)#012#5  0x000000000044f2c8 ngx_epoll_process_events (nginx)#012#6  0x0000000000445dc4 ngx_process_events_and_timers (nginx)#012#7  0x000000000044d4b8 ngx_worker_process_cycle (nginx)#012#8  0x000000000044bd74 ngx_spawn_process (nginx)#012#9  0x000000000044d980 ngx_start_worker_processes (nginx)#012#10 0x000000000044e630 ngx_master_process_cycle (nginx)#012#11 0x0000000000425b1c main (nginx)#012#12 0x0000ffffac9e7ae0 __libc_start_main (libc.so.6)#012#13 0x0000000000426294 $x (nginx)#012#14 0x0000000000426294 $x (nginx)

Some debug logs near the core

lua ngx_http_lua_socket_tcp_receive_helper: tcp receive: buf_in: 0xbdbc5b0, bufs_in: 0xbdbc5b0 at ../ngx_lua-0.10.14/src/ngx_http_lua_socket_tcp.c line 2114. 2019/05/06 18:03:02 [debug] 7334#0: 154014 lua tcp socket read timeout: 10000 2019/05/06 18:03:02 [debug] 7334#0: 154014 lua tcp socket read data: wait:0 2019/05/06 18:03:02 [debug] 7334#0: 154014 lua tcp socket read chunk 5 2019/05/06 18:03:02 [debug] 7334#0: 154014 lua tcp socket receive done: wait:0, eof:0, uri:"/v1.0/elbaas/member_groups/4ddb5359427c48d1b653079be8873b31/members?" 2019/05/06 18:03:02 [debug] 7334#0: 154014 lua tcp socket receive done in a single run 2019/05/06 18:03:02 [debug] 7334#0: 154014 lua tcp socket receive return value handler lua ngx_http_lua_socket_push_input_data: bufs_in: 0xbdbc5b0, buf_in: 0xbdbc5b0 at ../ngx_lua-0.10.14/src/ngx_http_lua_socket_tcp.c line 5733. lua ngx_http_lua_socket_push_input_data: copying input data chunk from 0xbdbc5b0: "þ" at ../ngx_lua-0.10.14/src/ngx_http_lua_socket_tcp.c line 5745. lua ngx_http_lua_socket_push_input_data: size: 5, nbufs: 1 at ../ngx_lua-0.10.14/src/ngx_http_lua_socket_tcp.c line 5763. lua ngx_http_lua_socket_push_input_data: resetting u->buffer pos & last at ../ngx_lua-0.10.14/src/ngx_http_lua_socket_tcp.c line 5781. 2019/05/06 18:03:02 [debug] 7334#0: fetching key "route_conf" in shared dict "g_route_conf_fc_cache" 2019/05/06 18:03:02 [debug] 7334#0: shmtx lock lua ngx_http_lua_shdict_lookup: node expires: 1557136986818 at ../ngx_lua-0.10.14/src/ngx_http_lua_shdict.c line 220. lua ngx_http_lua_shdict_lookup: time to live: 4163 at ../ngx_lua-0.10.14/src/ngx_http_lua_shdict.c line 228. lua ngx_http_lua_ffi_shdict_get: shdict lookup returns 0 at ../ngx_lua-0.10.14/src/ngx_http_lua_shdict.c line 2532. lua ngx_http_lua_ffi_shdict_get: data: 0xffff96f8e0c4 at ../ngx_lua-0.10.14/src/ngx_http_lua_shdict.c line 2544. lua ngx_http_lua_ffi_shdict_get: key len: 10 at ../ngx_lua-0.10.14/src/ngx_http_lua_shdict.c line 2545. lua *** ngx_http_lua_ffi_shdict_get: user flags: 0 at ../ngx_lua-0.10.14/src/ngx_http_lua_shdict.c line 2619. 2019/05/06 18:03:02 [debug] 7334#0: shmtx unlock 2019/05/06 18:03:05 [notice] 24409#0: signal 17 (SIGCHLD) received from 7334 2019/05/06 18:03:05 [alert] 24409#0: worker process 7334 exited on signal 11 (core dumped) 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: shmtx forced unlock 2019/05/06 18:03:05 [debug] 24409#0: wake up, sigio 0 2019/05/06 18:03:05 [debug] 24409#0: reap children 2019/05/06 18:03:05 [debug] 24409#0: child: 0 -1 e:1 t:1 d:0 r:1 j:0 2019/05/06 18:03:05 [debug] 24409#0: child: 1 7334 e:0 t:1 d:0 r:1 j:0 2019/05/06 18:03:05 [debug] 24409#0: channel 4:10 2019/05/06 18:03:05 [notice] 24409#0: start worker process 27271 2019/05/06 18:03:05 [debug] 24409#0: sigsuspend 2019/05/06 18:03:05 [debug] 27271#0: add cleanup: 000000000BEC2880 2019/05/06 18:03:05 [debug] 27271#0: malloc: 000000000BBE4940:8 2019/05/06 18:03:05 [debug] 27271#0: notify eventfd: 12 2019/05/06 18:03:05 [debug] 27271#0: testing the EPOLLRDHUP flag: success 2019/05/06 18:03:05 [debug] 27271#0: malloc: 000000000BB71C10:8192

Function call logic:

nginx.conf
lua_shared_dict g_route_conf_fc_cache 10m;
--init.lua 
require "resty.core.shdict"
local ok, new_tab = pcall(require, "table.new")
if not ok or type(new_tab) ~= "function" then
    new_tab = function (narr, nrec) return {} end
end

ngx.shared.g_route_conf_fc_cache:flush_all()
local _M = {}

local g_fc_configs = {}
g_fc_configs.route_conf_time = 5
g_fc_configs.route_conf_limit = 1000  

local function traffic_control(in_conf)
    if type(in_conf) ~= "table" then
        return false
    end

    local cache = in_conf.cache
    local log_level = in_conf.log_level
    local key = in_conf.key
    local limit_access = in_conf.limit_access
    local period_time = in_conf.period_time
    local message = in_conf.message
    if cache == nil or log_level == nil or key == nil or limit_access == nil then
        return false
    end

    local req,_ = cache:get(key)
    --log stop here and a coredump happened
    if req then
        if req >= limit_access then
            return true,1
        else
            cache:incr(key,1)
        end
    else
        cache:set(key, 1, period_time)
    end

    return true,0
end

local g_route_conf_fc_cache = ngx.shared.g_route_conf_fc_cache
function _M.route_conf_fc()
    local conf = new_tab(0, 6)
    conf.cache = g_route_conf_fc_cache
    conf.key = "route_conf"
    conf.period_time = g_fc_configs.    
    conf.limit_access = g_fc_configs.route_conf_limit
    conf.log_level = ngx_WARN
    conf.message = "ROUTE_CONF_FC"
    return traffic_control(conf)
end
return _M
--route_conf_fc() will be called in other function
c1390528 commented 3 years ago

Hello, I have a similar problem. Is it solved?

`#0 ngx_http_lua_shdict_expire (n=2, n@entry=1, ctx=, ctx=) at ../ngx_lua-0.10.15/src/ngx_http_lua_shdict.c:282

1 0x000000000056c750 in ngx_http_lua_ffi_shdict_get (zone=0x3874cf38, key=0x69852c008438 "node", key_len=4, value_type=0x69852c0023d0, str_value_buf=0x69852c046e48, str_value_len=0x69852c052d08, num_value=0x69852bfcee10,

user_flags=0xfff9000000000000, get_stale=0, is_stale=0x69852c002430, err=0x69852bff4328) at ../ngx_lua-0.10.15/src/ngx_http_lua_shdict.c:2526

2 0x0000ffff93ad0334 in lj_vm_ffi_call () from /opt/huawei/openresty/luajit/lib/libluajit-5.1.so.2

3 0x000069852c002430 in ?? ()

Backtrace stopped: previous frame inner to this frame (corrupt stack?)`