luvit / lit

Toolkit for developing, sharing, and running luvit/lua programs and libraries.
http://lit.luvit.io/
Apache License 2.0
245 stars 58 forks source link

coro-net/coro-http server leaks connections? #274

Closed u3shit closed 3 years ago

u3shit commented 4 years ago

I have a small app that uses coro-http directly to create a small http server. After some time I noticed in netstat that there are a large number of connections stuck in CLOSE_WAIT state, indicating that the server never actually closed the connection. I'm not exactly sure whose responsibility is to close the server socket, but based on how coro-http uses the callback, I can't really close it from there: https://github.com/luvit/lit/blob/55f47c6c657127d80fd8db19e86990bde72b7de4/deps/coro-http.lua#L35-L39 So I think either coro-net or coro-http should close the socket, probably coro-net somewhere around here https://github.com/luvit/lit/blob/55f47c6c657127d80fd8db19e86990bde72b7de4/deps/coro-net.lua#L169

PatrickDahlin commented 4 years ago

Looking closer at it there should be some timeout for disconnecting, if not default then at least as Keep-Alive header defines. I can also confirm there are connections left in the CLOSE_WAIT state on my machine.

Bilal2453 commented 3 years ago

Looking back at this, nope, there are no closing of any kind explicitly happening in the code. Even if the peer aborts the connection the handle should still be bound as far as I can tell. The break statement you see is only to stop reading from the socket, it doesn't necessarily mean the handle is closed. Since coro-http expose the socket handle that should help a bit until a proper closing is done.

This most likely should be implemented on coro-net side, since I can't really see a reason that would let you keep a dead connection bound, though it is as well easily implemented on coro-http side just before that break.

Though I don't know if any anything depends on that behavior, some library that uses coro-net might assume that the stream is never automatically closed on connection death so can't make this change on coro-net end until a confirmation.

Good to mention that libuv... technically should timeout after undefined amount of time. Though I have waited some of the CLOSE_WAIT for 1.5 hour without them being closed.

creationix commented 3 years ago

If it's not closing the sockets, then it's a bug. I thought I had logic to auto-close in coro-net, but maybe that broke during one of the past refactors.

Bilal2453 commented 3 years ago

net.connect does auto-close connection properly. net.createServer doesn't as far as I can tell, there is just no logic for it (or I am blind).

creationix commented 3 years ago

I believe this is what should close it when there is an error in reader or writer or both the reader and writer have ended.

https://github.com/luvit/lit/blob/master/deps/coro-channel.lua#L47

  function closer.check()
    if closer.errored or (closer.read and closer.written) then
      return close()
    end
  end
Bilal2453 commented 3 years ago

does that get triggered on connection death? that's keepAlive state change

creationix commented 3 years ago

It should check in all the cases that could trigger a close.

The first two are during the libuv read callback. If there is an error or EOF, then it triggers closer.check()

We should get these when a client does either a graceful disconnect or a hard disconnect while we have the onRead callback set in libuv.

  local function onRead(err, chunk)
    if err then
      closer.errored = err
      return closer.check()
    end
    if not chunk then
      if closer.read then return end
      closer.read = true
      dispatch {}
      return closer.check()
    end

The other two are when writing. If we explicitly write nil which means to send EOF, it will mark the write side as done and check. Also if there is an error when trying to write (for example, writing to a closed socket) it will close.

    if chunk == nil then
      closer.written = true
      closer.check()
      local success, err = socket:shutdown(wait())
      if not success then
        return nil, err
      end
      err = coroutine.yield()
      return not err, err
    end

    local success, err = socket:write(chunk, wait())
    if not success then
      closer.errored = err
      closer.check()
      return nil, err
    end
creationix commented 3 years ago

This is intended behavior at least. Obviously there is a bug somewhere.

creationix commented 3 years ago

I just noticed, we should call closer.errored = err; closer.check() in the case of shutdown errors being reported from libuv. I wonder if that is the problem?

Bilal2453 commented 3 years ago

Will do a quick test

creationix commented 3 years ago

https://github.com/luvit/lit/pull/287

Bilal2453 commented 3 years ago

Doesn't sound like related: image

(this is with #287 applied)

Bilal2453 commented 3 years ago

So yeah pretty interesting, when reloading a webpage with this

  local function onRead(err, chunk)
    if err then
      print(1)
      closer.errored = err
      return closer.check()
    end
    if not chunk then
      print(2)
      if closer.read then return end
      print(3)
      closer.read = true
      dispatch {}
      return closer.check()
    end
    print(err, chunk)
    return dispatch {chunk}
  end

I am getting the request chunk, no EOF, no errors, yet checking the network, all of the previous stream handles are indeed still WAIT_CLOSE.

Bilal2453 commented 3 years ago

Not sure about that EOF not getting received part, but using curl I was able to constantly get it trigger. Anyways that's a bit unrelated looks like, the problem that is happening is basically: closer.read = true when closer.written = false, and the statement is closer.read and closer.written.

The reason closer.written is always false.. is that EOF is never actually passed, in coro-http it is doing:

write("")

which isn't considered an EOF (on coro-channel side at least) since:

if chunk == nil then

This can be either fixed by write(nil) or if #chunk == 0 then and prefer the first.

I think. Seems to behave when I did write(nil) on my end at least. Not sure if that would behave on chunked writings, nor I am sure why that write("") exists on coro-http side (was it actually meant to act as an EOF? if so this definitely should've been nil).

Bilal2453 commented 3 years ago

Oh, and writing an empty string doesn't seem to be triggering:

    local success, err = socket:write(chunk, wait())
    if not success then
      closer.errored = err
      closer.check()
      return nil, err
    end

So I assume that's just how it works.

Bilal2453 commented 3 years ago

This should do I guess #286

creationix commented 3 years ago

I believe this is now fixed. Feel free to reopen if the issue comes back after updating to latest coro-http.

Also I realized whole working on this that the server half of coro-http is very poorly tested. Even lit itself uses weblit-server instead.