openresty / lua-nginx-module

Embed the Power of Lua into NGINX HTTP servers
https://openresty.org/

Why does cosocket report a read timeout when the response is received on time? #1904

Open AlexAUM opened 3 years ago

AlexAUM commented 3 years ago

I would like to send data to a server and receive the processing result, but I get some lua tcp socket read timed out errors. The errors are as follows:

2021/07/19 04:05:33 [error] 134162#0: *383692285 lua tcp socket read timed out, client: 183.84.4.102, server: _, request: "GET /ping HTTP/1.1", 
2021/07/19 04:05:56 [error] 134166#0: *383066134 lua tcp socket read timed out, client: 111.199.189.186, server: _, request: "POST /v1/client/heartbeat HTTP/1.1", 
2021/07/19 04:06:57 [error] 134166#0: *383767683 lua tcp socket read timed out, client: 124.126.141.88, server: _, request: "POST /desktop/v1/client/heartbeat HTTP/1.1", 
2021/07/19 04:08:45 [error] 134166#0: *383850419 lua tcp socket read timed out, client: 122.224.227.210, server: _, request: "GET /cs/message/latest HTTP/1.1", 
--More--

My code, which is called from access_by_lua_block, is as follows:

function Connector._query_tcp(self, packed_data)
    local sock = ngx.socket.tcp()
    sock:settimeouts(15, 5, 20)

    local ok, err = sock:connect(self.server, self.port)
    if not ok then
        return nil, err, "connect"
    end

    local ok, err = sock:send(packed_data..self.boundary)
    if not ok then
        return nil, err, "send"
    end

    local resp_data, err = sock:receiveuntil(self.boundary)()
    if not resp_data then
        return nil, err, "read"
    end
    ok, err = sock:setkeepalive(60000, 5000)
    return resp_data, nil, ""
end

Other nginx setting: tcp_nodelay on;

I used tcpdump to capture the packets and Wireshark to find the timed-out requests by their request ID. The capture shows that my network card receives the processing result from the server almost immediately: the time cost is less than 2 milliseconds, while my read timeout is set to 20 milliseconds. This confuses me a lot.

Any help is precious!

zhuizhuhaomeng commented 3 years ago

Does the reply data match the boundary? Right-click the timed-out packet in Wireshark, then select Follow and then TCP Stream. You will see the request and reply data.

AlexAUM commented 3 years ago

Does the reply data match the boundary? Right-click the timed-out packet in Wireshark, then select Follow and then TCP Stream. You will see the request and reply data.

I am sure that the reply data matches the boundary, which is 3 ETX characters (byte value 0x03); the attached screenshot of the capture shows them at the end of the reply.

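A capture like this can also be checked byte for byte in Lua rather than by eye. The following is only a minimal sketch: the helper names `ends_with` and `contains` are hypothetical, and it assumes the boundary is exactly three ETX bytes as described above.

```lua
-- ETX is the ASCII control character with byte value 3.
local ETX = string.char(3)
local boundary = string.rep(ETX, 3)

-- Hypothetical helper: true when data ends with the given suffix.
local function ends_with(data, suffix)
    return #data >= #suffix and data:sub(-#suffix) == suffix
end

-- Hypothetical helper: true when needle occurs anywhere in data.
-- The fourth argument to find disables Lua pattern matching (plain search).
local function contains(data, needle)
    return data:find(needle, 1, true) ~= nil
end
```

If the payload itself can contain the boundary bytes, `receiveuntil` would stop at the first occurrence, so running `contains` on the payload before framing it may be worth a look.
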
doujiang24 commented 3 years ago

@AlexAUM tcpsock:receiveuntil only returns an iterator Lua function that can be called to read the data stream. I don't see the iterator function being called in your code.

Maybe you misunderstand the receiveuntil function?

https://github.com/openresty/lua-nginx-module/#tcpsockreceiveuntil
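
For reference, a minimal sketch of the two-step usage. The helper name `read_frame` is hypothetical, and `sock` is assumed to be a cosocket from `ngx.socket.tcp()` that is already connected. Writing `sock:receiveuntil(boundary)()` is the same thing in chained form, since the iterator is called immediately.

```lua
-- Hypothetical helper: read one boundary-terminated frame from a connected cosocket.
local function read_frame(sock, boundary)
    -- Step 1: build the iterator. No data is read from the socket yet.
    local reader, err = sock:receiveuntil(boundary)
    if not reader then
        return nil, err
    end

    -- Step 2: calling the iterator performs the read. On failure it
    -- returns nil, an error string, and whatever partial data arrived.
    local data, read_err, partial = reader()
    if not data then
        return nil, read_err, partial
    end

    return data
end
```

Keeping the partial data on the error path is useful here: on a timeout it shows how much of the reply had been received before the boundary was expected.
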

AlexAUM commented 3 years ago

@doujiang24 I do call the iterator, with the trailing parentheses, like this: sock:receiveuntil(self.boundary)(). In my tests I get the correct response in time, but in the production environment I get some read timeout errors.

doujiang24 commented 3 years ago

okay, got it.

you can try to debug it by adding more logs, like:

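-- receiveuntil returns the iterator; do not call it on this line
local reader = sock:receiveuntil(self.boundary)
-- calling the iterator performs the actual read
local data, err, partial = reader()
if not data then
    ngx.log(ngx.ERR, "failed to read boundary: ", self.boundary, ", err: ", err, ", partial: ", partial)
end
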
AlexAUM commented 3 years ago

@doujiang24 Thanks, I will do more tests. What confuses me is that the error lua tcp socket read timed out is raised by sock:receiveuntil(self.boundary)(), which means the call did not return within 20 ms and then returned the timeout error. If those 20 ms include the time needed to send the data from the send buffer to the server, is it possible that the data is blocked in the send buffer?

My nginx settings are as follows: sendfile on; tcp_nopush on; tcp_nodelay on; (default setting) lua_socket_send_lowat 0; (default setting)
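
One way to see where the 20 ms goes is to time the send and the read separately. This is only a sketch: the helper name `timed` is hypothetical, and the clock is passed in by the caller. Inside OpenResty the clock could be `function() ngx.update_time(); return ngx.now() end`, because `ngx.now()` returns a cached timestamp unless `ngx.update_time()` is called first.

```lua
-- Hypothetical helper: call fn(...) and return the elapsed time in
-- milliseconds followed by up to three of fn's return values.
-- `now` must return the current time in seconds as a number.
local function timed(now, fn, ...)
    local started = now()
    local a, b, c = fn(...)
    local elapsed_ms = (now() - started) * 1000
    return elapsed_ms, a, b, c
end

-- Possible usage inside OpenResty (assumed names, not executed here):
--   local function clock() ngx.update_time(); return ngx.now() end
--   local send_ms, bytes, send_err = timed(clock, function()
--       return sock:send(payload)
--   end)
--   local reader = sock:receiveuntil(boundary)
--   local read_ms, data, read_err, partial = timed(clock, reader)
```

Logging `send_ms` and `read_ms` next to the request ID would let the timings be matched against the tcpdump capture.
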

doujiang24 commented 3 years ago

is it possible that the data is blocked in sending buffer

What kind of buffer? Do you mean the data is blocked somewhere between the ring buffer in the kernel and nginx in user land? That is almost impossible unless your server is overloaded.

You had better ask for help from OpenResty XRay (which is a commercial product) if it only reproduces in your production environment: https://openresty.com/en/xray/