openresty / lua-nginx-module

Embed the Power of Lua into NGINX HTTP servers
https://openresty.org/
11.3k stars 2.03k forks source link

ngx.exit with HTTP_NOT_FOUND causes worker process to exit #1436

Open Cibifang opened 5 years ago

Cibifang commented 5 years ago

Hi.

Recently I met some nginx carshes(not frequent, but appear). I checked the error log and coredump, found that it appeared when ngx.exit with HTTP_NOT_FOUND.

The coredump:

Program terminated with signal SIGSEGV, Segmentation fault.

(gdb) bt
#0  ngx_http_lua_socket_tcp_cleanup (data=0x41cec7c8) at ../ngx_lua-0.10.7/src/ngx_http_lua_socket_tcp.c:3303
#1  0x000000000046454e in ngx_http_free_request (r=r@entry=0x7f792af9a5b0, rc=0) at src/http/ngx_http_request.c:3456
#2  0x00000000004647c8 in ngx_http_close_request (rc=0, r=<optimized out>) at src/http/ngx_http_request.c:3427
#3  0x0000000000464c7e in ngx_http_lingering_close_handler (rev=0x7f7949fa10d0) at src/http/ngx_http_request.c:3294
#4  0x000000000044e287 in ngx_epoll_process_events (cycle=0x7f794c360b00, timer=3, flags=8209) at src/event/modules/ngx_epoll_module.c:900
#5  0x00000000004434e7 in ngx_process_events_and_timers (cycle=cycle@entry=0x7f794c360b00) at src/event/ngx_event.c:246
#6  0x000000000044afa5 in ngx_worker_process_cycle (cycle=cycle@entry=0x7f794c360b00, data=data@entry=0x11) at src/os/unix/ngx_process_cycle.c:1211
#7  0x0000000000449a2a in ngx_spawn_process (cycle=cycle@entry=0x7f794c360b00, proc=proc@entry=0x44af60 <ngx_worker_process_cycle>, data=data@entry=0x11, name=name@entry=0x5e51b0 "worker process", 
    respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:198
#8  0x000000000044b204 in ngx_start_worker_processes (cycle=0x7f794c360b00, n=30, type=-3) at src/os/unix/ngx_process_cycle.c:390
#9  0x000000000044c1ce in ngx_master_process_cycle (cycle=0x7f794c360b00) at src/os/unix/ngx_process_cycle.c:144
#10 0x0000000000422fd1 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:388

I checked the code, it's in below:

static void
ngx_http_lua_socket_tcp_cleanup(void *data)
{
    ngx_http_lua_socket_tcp_upstream_t  *u = data;

    ngx_http_request_t  *r;

    r = u->request;

    ngx_log_debug1(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                   "cleanup lua tcp socket request: \"%V\"", &r->uri);

    ngx_http_lua_socket_tcp_finalize(r, u);
}

Then I checked the u and r, found these are garbled as below.

(gdb) p r
$1 = (ngx_http_request_t *) 0xfffffffb413a11e8
(gdb) p u->reques
There is no member named reques.
(gdb) p u->request
$2 = (ngx_http_request_t *) 0xfffffffb413a11e8
(gdb) p u
$3 = (ngx_http_lua_socket_tcp_upstream_t *) 0x41cec7c8
(gdb) p *r
Cannot access memory at address 0xfffffffb413a11e8
(gdb) p (u
A syntax error in expression, near `'.
(gdb) p *u
$4 = {read_prepare_retvals = 0xffffffff00000000, write_prepare_retvals = 0xffffffff4094cc68, read_event_handler = 0x0, write_event_handler = 0xffffffff00507d50, socket_pool = 0xffffffff42568330, 
  conf = 0x7f7900000000, cleanup = 0x404e000000000000, request = 0xfffffffb413a11e8, peer = {connection = 0x0, sockaddr = 0x4045800000000000, socklen = 1080071512, name = 0x41cec858, 
    tries = 18446744069414584320, start_time = 18446744069414584320, get = 0x0, free = 0x411807a000000000, data = 0xfffffffb41395748, set_session = 0x0, save_session = 0x0, local = 0xfffffffb406095e8, 
    type = 0, rcvbuf = 0, log = 0x2b, cached = 0, transparent = 0, log_error = 0}, read_timeout = 4707460146995920896, send_timeout = 4741948734382800896, connect_timeout = 30064771072, resolved = 0xa1, 
  bufs_in = 0x412400f0, buf_in = 0x41cec550, buffer = {pos = 0x41cec8d000000000 <error: Cannot access memory at address 0x41cec8d000000000>, 
    last = 0x404d000000000000 <error: Cannot access memory at address 0x404d000000000000>, file_pos = -20380511768, file_last = 56, start = 0x6a <error: Cannot access memory at address 0x6a>, 
    end = 0x4044800000000000 <error: Cannot access memory at address 0x4044800000000000>, tag = 0xfffffffb40609558, file = 0x41cec8f041cec8f0, shadow = 0x0, temporary = 0, memory = 0, mmap = 0, recycled = 1, 
    in_file = 0, flush = 1, sync = 1, last_buf = 1, last_in_chain = 1, last_shadow = 0, temp_file = 1, num = -5}, length = 0, rest = 4690327780150214656, socket_errno = 1094276936, input_filter = 0x0, 
  input_filter_ctx = 0x404d000000000000, request_len = 18446744053329039848, request_bufs = 0xa0, read_co_ctx = 0xca, write_co_ctx = 0x4276779a66f52000, reused = 18446744053330650448, ssl_name = {
    len = 4741949455937306624, data = 0xffffffff00000000 <error: Cannot access memory at address 0xffffffff00000000>}, ft_type = 0, no_close = 0, conn_waiting = 0, read_waiting = 0, write_waiting = 0, eof = 1, 
  body_downstream = 0, raw_downstream = 0, read_closed = 1, write_closed = 1, ssl_verify = 0, ssl_session_reuse = 0}

I also went up and checked request in ngx_http_free_request, found the request is right. So the only problem seems that the ngx_http_lua_socket_tcp_upstream_t is wrong?

But the problem is that I can't reproduce it and it doesn't appear frequently(about one in a million). Some details: It only appears on specific machines and specific time. I think it is related to high concurrency.

Solutions and further debug methods are both helpful. Thanks!

openresty version: 1.11.2.2 operating system version: Linux version 3.16.0-4-amd64 (debian-kernel@lists.debian.org)

ps: I only found two related issues(#110 and #192). and these both seem not same with this case.

agentzh commented 5 years ago

@Cibifang Try removing any 3rd-party nginx C modules or 3rd-party Lua modules. The crash site is apparently not the culprit, but the consequence of earlier memory corruptions.

Another way to debug such things is to use valgrind to run your app as a whole. But make sure you've applied the "no-pool" patch for the nginx core and enforce the system allocator in your LuaJIT (if you use the OpenResty pre-built packages, then openresty-valgrind package can do the trick). See below for more details:

https://openresty.gitbooks.io/programming-openresty/content/testing/test-modes.html#_valgrind_mode

agentzh commented 5 years ago

Sorry for the late reply BTW.

BTW2: please upgrade to the latest OpenResty to avoid running into any known bugs that have already been fixed. OpenResty 1.11.2.2 is ancient.

The latest version is below:

https://openresty.org/download/openresty-1.15.8.1rc0.tar.gz

agentzh commented 5 years ago

We have very complex and large-scale deployment of OpenResty for long and we're not seeing even a single core dump file for quite a long time.