luvit / lit

Toolkit for developing, sharing, and running luvit/lua programs and libraries.
http://lit.luvit.io/
Apache License 2.0
245 stars 58 forks source link

coro-http.request hangs on closed connection #224

Closed SinisterRectus closed 7 years ago

SinisterRectus commented 7 years ago

On rare occasions, requests infinitely hang. It's somewhat difficult to debug the coro libs because a lot of functionality is hidden in closures, but I think I've narrowed down the issue. I'm documenting what I have here until I or someone else can discover more about this.

In http-codec:

      elseif lowerKey == "connection" then
        head.keepAlive = lower(value) == "keep-alive"

In coro-http:

SinisterRectus commented 7 years ago

The following reproduces the issue:

local weblit = require('weblit')

weblit.app.bind {host = '127.0.0.1', port = 1337}

.use(require('weblit-auto-headers'))

.route({
    method = "DELETE",
    path = "/foo",
}, function (req, res, go)
    if req.query and req.query.q == '1' then
        res.headers['Connection'] = 'Close'
    end
    res.code = 204
    res.body = nil
end)

.start()

local http = require('coro-http')

coroutine.wrap(function()
    p('req1', http.request("DELETE", "http://127.0.0.1:1337/foo"))
    p('req2', http.request("DELETE", "http://127.0.0.1:1337/foo?q=1"))
end)()

The same issue occurs; write never returns here:

  if res.keepAlive then
    saveConnection(connection)
  else
    write()
  end

The trace after calling write is:

coro-wrapper.lua:137
coro-wrapper.lua:138
coro-channel.lua:124
coro-channel.lua:130
coro-channel.lua:131
coro-channel.lua:132
coro-channel.lua:39
coro-channel.lua:40
coro-channel.lua:23
coro-channel.lua:24
coro-channel.lua:25
coro-channel.lua:26
coro-channel.lua:27
coro-channel.lua:74
coro-channel.lua:78
coro-channel.lua:27
coro-channel.lua:31
coro-channel.lua:32
coro-channel.lua:34
coro-channel.lua:133
coro-channel.lua:117
coro-channel.lua:120
coro-channel.lua:133
coro-channel.lua:134
coro-channel.lua:71
coro-channel.lua:89
coro-channel.lua:39
coro-channel.lua:40
coro-channel.lua:23
coro-channel.lua:81
coro-channel.lua:85
coro-channel.lua:86
coro-channel.lua:87
coro-channel.lua:88
coro-channel.lua:58
coro-channel.lua:64
coro-channel.lua:65
coro-channel.lua:66
coro-channel.lua:67
coro-channel.lua:68
coro-channel.lua:71
coro-channel.lua:89
coro-channel.lua:39
coro-channel.lua:40
coro-channel.lua:23
coro-channel.lua:24
coro-channel.lua:25
coro-channel.lua:26
coro-channel.lua:27
coro-channel.lua:74
coro-channel.lua:78
coro-channel.lua:27
coro-channel.lua:31
coro-channel.lua:32
coro-channel.lua:34

Looks like it craps out after trying to close the socket.

A quick fix for this is to change

  function decodeRaw(chunk, index)
    if #chunk < index then return end
    return sub(chunk, index)
  end

to

  function decodeRaw(chunk, index)
    if #chunk < index then return "", index end
    return sub(chunk, index)
  end

I don't know if that's the correct fix, though.

SinisterRectus commented 7 years ago

Here's the full strack trace for the request.

http.txt

SinisterRectus commented 7 years ago

Additionally, I'm not sure why GET is included in (head.method == "GET" or head.method == "HEAD"). Only HEAD is guaranteed to not contain a body.

creationix commented 7 years ago

Any idea what situation causes the chunk to be smaller than the index? I'm sure I put that check in there because I ran into it somewhere and it was breaking stuff.

Returning an empty string signals the end of the body. We need to make sure we don't do this if there is more body coming. That's why I'd like to understand what causes this situation in the first place.

SinisterRectus commented 7 years ago

Good question. The index is the chunk length + 1. The coro-channel dispatcher doesn't show anything unusual.

'<-'    'HTTP/1.1 204 No Content\r\nServer: false\r\nDate: Thu, 14 Sep 2017 14:19:05 GMT\r\nConnection: Keep-Alive\r\n\r\n'
'<-'    nil
'<-'    'HTTP/1.1 204 No Content\r\nConnection: Close\r\nServer: false\r\nDate: Thu, 14 Sep 2017 14:19:05 GMT\r\n\r\n'
'<-'    nil
SinisterRectus commented 7 years ago

The coro-wrapper goes into the decoder with

'HTTP/1.1 204 No Content\r\nServer: false\r\nDate: Thu, 14 Sep 2017 14:26:04 GMT\r\nConnection: Keep-Alive\r\n\r\n'     1
'HTTP/1.1 204 No Content\r\nServer: false\r\nDate: Thu, 14 Sep 2017 14:26:04 GMT\r\nConnection: Keep-Alive\r\n\r\n'     104
'HTTP/1.1 204 No Content\r\nServer: false\r\nDate: Thu, 14 Sep 2017 14:26:04 GMT\r\nConnection: Keep-Alive\r\n\r\n'     104
'HTTP/1.1 204 No Content\r\nConnection: Close\r\nServer: false\r\nDate: Thu, 14 Sep 2017 14:26:04 GMT\r\n\r\n'  1
'HTTP/1.1 204 No Content\r\nConnection: Close\r\nServer: false\r\nDate: Thu, 14 Sep 2017 14:26:04 GMT\r\n\r\n'  99

and comes out with

{ { 'Server', 'false' }, { 'Date', 'Thu, 14 Sep 2017 14:27:10 GMT' }, { 'Connection', 'Keep-Alive' }, code = 204, reason = 'No Content', version = 1.1, keepAlive = true }      104
''      104
nil     nil
{ { 'Connection', 'Close' }, { 'Server', 'false' }, { 'Date', 'Thu, 14 Sep 2017 14:27:10 GMT' }, code = 204, reason = 'No Content', version = 1.1, keepAlive = false }      99
nil     nil

I'm not sure why there are 3 reads in the first request and 2 in the second.

SinisterRectus commented 7 years ago

It looks like the buffer is small enough to be decoded in one pass. What happens comes down to

    if head.keepAlive and (not (chunkedEncoding or (contentLength and contentLength > 0)))
       or (head.method == "GET" or head.method == "HEAD") then
      mode = decodeEmpty
    elseif chunkedEncoding then
      mode = decodeChunked
    elseif contentLength then
      bytesLeft = contentLength
      mode = decodeCounted
    elseif not head.keepAlive then
      mode = decodeRaw
    end

When keepAlive is true (and the other conditions are met), an empty string is returned by decodeEmpty. Otherwise, in this case, nil is returned by decodeRaw since the end of buffer is reached; there is no body after the headers on the DELETE request.

creationix commented 7 years ago

Hmm, I guess it's length + 1 because the end index is inclusive and equals length in lua's 1-based indexes.

SinisterRectus commented 7 years ago

It seems to me that that the client is trying to read a body that does not exist. Interestingly, if I change res.body = nil to res.body = '' in the weblit server example, the client request is successful. I suppose the empty string is made meaningful by the extra \r\n.

Should a special case be added so that decodeEmpty is called for a 204 response? Should http-codec be handling contentLength == 0?

creationix commented 7 years ago

So the idea is that the decoder will always emit a table for the head, followed by zero or more non-empty body chunks, followed by exactly one empty string to signify the end of body. After that the cycle repeats so that a single TCP connection can be reused for multiple http requests.

Of course, if the connection gets upgraded (via websockets), then the http-codec gets disconnected entirely and it's internal state GCed.

SinisterRectus commented 7 years ago

Right... but that doesn't happen. Again, coro-http.request never returns.

SinisterRectus commented 7 years ago

Any thoughts on this? I cannot figure out how this is breaking. Debugging shows that the socket closes, but instead of bubbling back up to coro-http, it just jumps right to the Luvit exit sequence.

...
coro-http.lua,158
coro-http.lua,159
coro-http.lua,160
coro-http.lua,169
coro-http.lua,172 -- last call in coro-http is `write()`
coro-wrapper.lua,137
coro-wrapper.lua,138
coro-channel.lua,124
coro-channel.lua,130
coro-channel.lua,131
coro-channel.lua,132
coro-channel.lua,39
coro-channel.lua,40
coro-channel.lua,23
coro-channel.lua,24
coro-channel.lua,25
coro-channel.lua,26
coro-channel.lua,27
coro-channel.lua,74
coro-channel.lua,78
coro-channel.lua,27
coro-channel.lua,31
coro-channel.lua,32
coro-channel.lua,34
coro-channel.lua,133
coro-channel.lua,117
coro-channel.lua,120
coro-channel.lua,133
coro-channel.lua,134
coro-channel.lua,71
coro-channel.lua,89
coro-channel.lua,39
coro-channel.lua,40
coro-channel.lua,23
bundle:init.lua,53
bundle:init.lua,55
bundle:init.lua,57
...

I suspect some coroutine issue, but I do not know enough about the expected behavior of the libraries to debug this.

Again, here is an example that reproduces the issue:

local weblit = require('weblit')

weblit.app.bind {host = '127.0.0.1', port = 1337}

.use(require('weblit-auto-headers'))

.route({
    method = "DELETE",
    path = "/foo",
}, function (req, res)
    if req.query and req.query.q == '1' then
        res.headers['Connection'] = 'Close'
    end
    res.code = 204
    res.body = nil
end)

.start()
local http = require('coro-http')

coroutine.wrap(function()

    debug.sethook(function(_, line)
        local source = debug.getinfo(2, 'S').source:match('([^/\\]+)$')
        io.write(source, ',', line, '\n') -- luvit print fails after a certain point
    end, 'l')

    -- p(http.request("DELETE", "http://127.0.0.1:1337/foo")) -- good request
    p(http.request("DELETE", "http://127.0.0.1:1337/foo?q=1")) -- bad request

end)()

Also interesting: If you try to print debug info with print, it fails after a certain point. io.write continues to work.

SinisterRectus commented 7 years ago

Okay, I've just managed to isolate the issue to an extent.

In coro-channel:

local function wait()
  local thread = coroutine.running()
  return function (err)
    assert(coroutine.resume(thread, err))
  end
end

if chunk == nil then
  closer.written = true
  closer.check()
  socket:shutdown(wait())
  return coroutine.yield()
end

The coroutine seems to never be resumed on socket shutdown; the wait closure is never called.

SinisterRectus commented 7 years ago

I made a PR to address the handling of the shutdown. In this case, the specific error is EPIPE: broken pipe.