openresty / lua-resty-websocket

WebSocket support for the ngx_lua module (and OpenResty)
507 stars 106 forks source link

Socket Busy Errors (was: Is this a correct way to keep connection alive?) #1

Closed bungle closed 10 years ago

bungle commented 11 years ago

Hi,

I got this working on my mac using latest OpenResty bundle, and websockets version of Lua nginx-module. This is really great stuff.

I have written small demos on my own, and I have got the basics working. What's the correct way to keep connection alive? Should I write while true do ... end loop on the websocket server or what's the preferred way to keep connection alive.

I.e. a small example with say this kind of client side js-code:

var connection = new WebSocket('ws://127.0.0.1/s');
connection.onopen = function () {
  console.log('connected');
  connection.send('text');
};
connection.onerror = function (error) {
  console.log('error ' + error);
};
connection.onmessage = function (e) {
  console.log('message ' + e.data);
};
connection.onclose = function () {
  console.log('closed');
};

Server code (location /s -> content_by_lua):

local server = require "resty.websocket.server"
local wb, err = server:new{
  timeout = 5000,
  max_payload_len = 65535,
}
if not wb then
  ngx.log(ngx.ERR, "failed to new websocket: ", err)
  return ngx.exit(444)
end
wb:set_timeout(10000)
while true do
  local data, typ, err = wb:recv_frame()
  if typ == "close" then
    wb:send_close()
    return
  elseif typ == "text" then
    local bytes, err = wb:send_text(data)
    if not bytes then
      wb:send_close()
      return
    end
  end
end
bungle commented 11 years ago

This behaves more correctly on a server if the client exits without notifying (but still, am I using this correctly?):

while true do
  local data, typ, err = wb:recv_frame()
  if not data then
    bytes, err = wb:send_ping()
    if not bytes then break end
    data, typ, err = wb:recv_frame()
    if not typ == "pong" then
      ngx.log(ngx.ERR, "client not answering with pong: ", err)
      break
    end
  elseif typ == "close" then break
  elseif typ == "text" then
    local bytes, err = wb:send_text(data)
    if not bytes then break end
  end
end
wb:send_close()

Or maybe I should ping the client until it's not answering instead of trying to receive (and probably timeouting) from it, but that's a little bit more chatty?

agentzh commented 11 years ago

@bungle First of all, thank you for trying out lua-resty-websocket!

To answer your questions:

  1. yes, you can use an infinite Lua loop to implement a keepalive WebSocket conneciton.
  2. To detect premature client connection aborts, just enable ngx_lua's lua_check_client_abort directive in your nginx.conf file: http://wiki.nginx.org/HttpLuaModule#lua_check_client_abort
  3. To reliably detect TCP half-open connections, you also need to either enable the TCP keepalive feature in your network stack or just periodically send ping frames on the WebSocket protocol level (which is a bit more expensive than TCP keepalive).
  4. In your infinite loop, you should check the return values of recv_frame(). In case of errors, you should break your loop unless the error is a reading timeout error.

If you have any further questions, just let me know :)

bungle commented 11 years ago

I'm not totally sure if I understood everything correctly (still at early stages learning all this openresty stuff), but I blogged about the installation procedure here: https://medium.com/p/1778601c9e05

agentzh commented 11 years ago

@bungle Thank you for blogging about OpenResty's websocket support!

I just noted some minor things:

  1. You don't need to install LuaJIT yourself because openresty bundles it automatically. Passing the --with-luajit option to ./configure when building OpenResty will automatically enable the bundled LuaJIT :)
  2. Whenever you receive a "close" frame, you should call wb:send_close() before quitting, because it is required by the WebSocket protocol :)
  3. Maybe you should drop a note in your sample code or text regarding fragmented WebSocket messages. Because existing browsers never send fragmented messages AFAIK, it's fine for your sample to simply discard the fragmented message but it's better to add a comment or something like that.
  4. It also makes sense to drop a note about enabling TCP keepalive in the kernel TCP stack to guard against half-open TCP connections.
bungle commented 11 years ago

@agentzh thanks again, I made corrections, and notes to the article.

Do you mean that by enabling TCP keepalive I should set this thing on (or similar in other systems):

$ sudo sysctl -w net.inet.tcp.always_keepalive=1

One problematic thing I have found with my code. I keep constantly getting these in the error.log:

2013/09/17 15:00:07 [error] 10668#0: *2 lua tcp socket read timed out, ..."

Should I set this:

lua_socket_log_errors off;

And do error handling in code (at least this fixes the problem, but are there other ways to do this)?

agentzh commented 11 years ago

Hello!

On Tue, Sep 17, 2013 at 5:05 AM, Aapo Talvensaari wrote:

@agentzh thanks again, I made corrections, and notes to article.

Thanks!

Do you mean that by enabling TCP keepalive I should set this thing on (or similar in other systems): $ sudo sysctl -w net.inet.tcp.always_keepalive=1

Nope. You just need to configure it in your nginx.conf if you're on Linux. To quote the documentation for lua_check_client_abort:

"For example, on Linux, you can configure the standard listen directive in your nginx.conf file like this:

listen 80 so_keepalive=2s:2s:8;

"

To verify that the TCP keepalive thing indeed works you can use tools like tcpdump or wireshark.

But this will affects the whole nginx server. If you only want to detect half-open connections for websockets, then you should use websocket's own ping/pong frames instead :)

One problematic thing I have found with my code. I keep contantly getting these in the error.log: 2013/09/17 15:00:07 [error] 10668#0: *2 lua tcp socket read timed out, client: 127.0.0.1, server: localhost, request: "GET /s HTTP/1.1", host: "127.0.0.1"

Should I set this: lua_socket_log_errors off;

And do error handling in code?

Yes. This is the recommended approach for serious apps based on ngx_lua :)

Best regards, -agentzh

bungle commented 11 years ago

Another question regarding this (chat server with redis). Say I have one redis subscriber, and one redis publisher, websocket, and the code like this (error handling removed for readability reasons):

local server = require "resty.websocket.server"
local redis  = require "resty.redis"

local function subscribe(ws)
  local sub = redis:new()
  sub:connect("127.0.0.1", 6379)
  sub:subscribe("chat.messages")
  while true do
    local bytes, err = sub:read_reply()
    if bytes then
        ws:send_text(bytes[3])
    end
  end
end

local ws, err = server:new{ timeout = 30000, max_payload_len = 65535 }

ngx.thread.spawn(subscribe, ws)

local pub = redis:new()
pub:connect("127.0.0.1", 6379)

while true do
  local bytes, typ, err = ws:recv_frame()
  if ws.fatal then return
  elseif not bytes then
      ws:send_ping()
  elseif typ == "close" then break
  elseif typ == "text"  then
      pub:publish("chat.messages", bytes)
  end
end

ws:send_close()

The problem here, as you can see, is the ws variable that is used in a two different contexts (websocket loop):

ws:recv_frame()

and (redis loop in subscribe function):

ws:send_text(bytes[3])

This leads to "socket busy" errors. Is there any way to get around this (I don't think a busyloop is a way to go, i.e. short timeout)? What I mean is that there are two synchronous waiting points: ws:recv_frame, and sub:read_reply. So I cannot really write a chat server that pushes messages ASAP to all clients connected, unless I create two websocket connections for each client, or set timeout values really short (and do not use a thread).

This seems to be related: https://github.com/chaoslawful/lua-nginx-module/issues/241

agentzh commented 11 years ago

@bungle Yes, you cannot read and write the same cosocket object at the same time (from two "light threads") at the moment. It is not fully duplex yet.

I'll remove this limitation soon by separating the reading state and writing state in the cosocket object's implementation.

hackers365 commented 11 years ago

@agentzh I want to forward tcp traffic via websocket ,also need read and write same cosocket object from two "light threads". Waiting for your good news

agentzh commented 11 years ago

@hackers365 I'm waiting for the patch for this from @AdallomRoy :)

agentzh commented 11 years ago

@bungle Could you please update your blog post (https://medium.com/p/1778601c9e05) to use ngx_openresty 1.4.2.9 directly? The lua-resty-websocket library is now installed and enabled by default in ngx_openresty and I'm going to remove the "websocket" branch from ngx_lua's git repos soon (because it has already been merged into "master"). Thank you!

bungle commented 11 years ago

@agentzh thanks for reminding me about that. The blog post is now updated for 1.4.2.9. Now if we could get the patch for concurrent read/write problem ;-), I would like to move on with another blog post that shows some real-world app with this stack. You, and others who have contributed to OpenResty have done fantastic work.

agentzh commented 11 years ago

@bungle Thank you! I'm looking forward to your upcoming blog posts! :) And yeah making the cosocket full duplex is high on my priority list. I hope I can reuse the work done by @AdallomRoy.

@AdallomRoy could you just provide the patch that you already have? I can make necessary changes to adapt it to the latest ngx_lua code base.

AdallomRoy commented 11 years ago

@agentzh @bungle Added in the lua-nginx-module repo. Enjoy!

agentzh commented 10 years ago

Now that the full-duplex cosocket feature has alerady landed in recent releases of ngx_lua, I'm closing this.