openresty / lua-nginx-module

Embed the Power of Lua into NGINX HTTP servers
https://openresty.org/
11.23k stars 2.02k forks source link

OpenResty keepalive connection appears to be hung #2318

Closed soulbird closed 1 week ago

soulbird commented 3 months ago

Description

After I upgraded the production environment's Tengine 2.3.2 (Nginx 1.17.3) to OpenResty 1.24.4.2, some requests occasionally received a 499 error. I know this is usually because the client thinks OpenResty takes too long to process the request and actively disconnects. These 499 exception requests are all based on the HTTP/1.1 protocol. I captured some packets, and their situations are basically the same. One of them is: image

request header: image

It can be seen that the client sent a POST request at 09:29:55. About 2 seconds later, the client sent Fin to disconnect. During this period, OpenResty did not respond with any data. The log of this request recorded on OpenResty: image

time_msec corresponds to $msec, which is the time when OpenResty starts to record logs. The converted time is 09:30:00, which is the same as the time when the OpenResty node sends the Fin packet. The time recorded by request_time is 0.258, which means that OpenResty starts processing the request at 09:29:59.

Question

After the OpenResty node receives the POST request, why does it take 4 seconds to process the request? What might have been sent in the meantime?

Key Configuration

worker_rlimit_nofile 200000;
worker_shutdown_timeout 600;

events {
    use epoll;
    worker_connections  65535;
}

sendfile        on;
tcp_nopush         on;
tcp_nodelay        on;
keepalive_timeout  65;
keepalive_requests 100;

client_max_body_size 120m;

proxy_connect_timeout    10;
proxy_read_timeout       120;
proxy_send_timeout       120;
proxy_buffer_size        16k;
proxy_buffers            8 512k;
proxy_busy_buffers_size 1M;
proxy_temp_file_write_size 1M;
proxy_max_temp_file_size 0;

server {
    listen 80 default backlog=24576;
}

Environment

# uname -r
5.4.54-2.0.4.std7c.el7.x86_64

# /home/work/nginx/sbin/nginx -V
nginx version: mife/openresty/1.21.4.2
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) 
built with OpenSSL 1.1.1k  25 Mar 2021
TLS SNI support enabled
configure arguments: --prefix=/home/work/nginx --with-cc-opt='-O2 -DNGX_LUA_ABORT_AT_PANIC -I/home/work/nginx/modules/zlib/include -I/home/work/nginx/modules/pcre/include -I/usr/local/openssl/include' --add-module=../ngx_devel_kit-0.3.2 --add-module=../echo-nginx-module-0.63 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.33 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.09 --add-module=../srcache-nginx-module-0.33 --add-module=../ngx_lua-0.10.25 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.34 --add-module=../array-var-nginx-module-0.06 --add-module=../memc-nginx-module-0.19 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.9 --add-module=../ngx_stream_lua-0.0.13 --with-ld-opt='-Wl,-rpath,/home/work/nginx/luajit/lib -Wl,-rpath, -L/home/work/nginx/modules/zlib/lib -L/home/work/nginx/modules/pcre/lib -L/usr/local/openssl/lib -Wl,-rpath,/home/work/nginx/modules/zlib/lib:/home/work/nginx/modules/pcre/lib:/usr/local/openssl/lib' --error-log-path=/home/work/log/nginx/error.log --http-log-path=/home/work/log/nginx/access.log --http-client-body-temp-path=/dev/shm/nginx_temp/client_body --http-proxy-temp-path=/dev/shm/nginx_temp/proxy --with-poll_module --with-pcre-jit --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_v2_module --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --with-http_stub_status_module --with-http_realip_module --with-http_addition_module --with-http_auth_request_module --with-http_secure_link_module --with-http_random_index_module --with-http_gzip_static_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-threads --with-compat --with-stream --with-http_ssl_module

Error Log

There are no obvious error logs. There are no logs related to insufficient number of connections and insufficient keepalive requests.

monkeyDluffy6017 commented 3 months ago

From your description, the upstream_response_time is just 0.001 secs, so the time is almost spent on the openresty itself, could you reproduce this issue in an offline environment?

soulbird commented 3 months ago

From your description, the upstream_response_time is just 0.001 secs, so the time is almost spent on the openresty itself, could you reproduce this issue in an offline environment?

It is not reproducible offline at present. In fact, there is not much traffic online, and the load of the host itself is not high. Maybe I have not found the key point to reproduce it.

zhuizhuhaomeng commented 3 months ago

You can use OpenResty XRay and check the off-CPU operation. Are there any off-CPU operation in your code ?

zhuizhuhaomeng commented 3 months ago

You can use OpenResty XRay and check the off-CPU operation. Are there any off-CPU operations in your code?

soulbird commented 3 months ago

You can use OpenResty XRay and check the off-CPU operation. Are there any off-CPU operations in your code?

I didn't find any blocking in the code. I tried to use xray to see if off-CPU exists.

soulbird commented 3 months ago

hi @zhuizhuhaomeng I tried to submit an application for xray, but it didn't seem to work, help pls