Closed Bilal2453 closed 3 years ago
Good to mention, this basically affect all libraries that make use of http-codec
This looks promising.
I'm wondering how this caused the problem in the first place. I thought coro-http instances were only used for a single TCP stream. As far as I understand the spec, you can't pipeline HTTP requests over a single TCP connection when sending raw bodies because there is no way to know when the body ends. Does it freeze for only that TCP socket or for all requests in the server?
Does it freeze for only that TCP socket or for all requests in the server?
It does indeed freeze for all TCP sockets in the said server, in fact, if right this moment, I make a request to luvit.io similar to this it will have a similar issue before the corn-job restart it. Easily replicated even on a localserver.
Thanks, that's super weird. I'll see if I can understand this more. I'm glad this patch seems to fix it, but I don't understand why and I'm worried there are other problems still there.
Well, as far as I understand it, it goes like this:
one single decoder instance gets initialized for parsing all HTTP requests on said server:
net.createServer({
host = host,
port = port,
encode = httpCodec.encoder(),
decode = httpCodec.decoder(),
}
local function decoder()
-- This decoder is somewhat stateful with 5 different parsing states. local decodeHead, decodeEmpty, decodeRaw, decodeChunked, decodeCounted local mode -- state variable that points to various decoders local bytesLeft -- For counted decoder
- One of the requests may forces the decoder into going into decodeRaw mode:
```lua
if head.keepAlive and (not (chunkedEncoding or (contentLength and contentLength > 0)))
or (head.method == "GET" or head.method == "HEAD") then
mode = decodeEmpty
elseif chunkedEncoding then
mode = decodeChunked
elseif contentLength then
bytesLeft = contentLength
mode = decodeCounted
elseif not head.keepAlive then
mode = decodeRaw
end
end
Since the decoder state is shared among all TCP sockets, this mode will be used for all of them as long as it doesn't get switched back to decodeHead, which doesn't happen when decodeRaw is called:
function decodeRaw(chunk, index)
if #chunk < index then
return
end
return sub(chunk, index)
end
So pretty much, this only happens because the mode
state is an upvalue among all requests, so it affect everything. Not sure why it is like this, but I assume it is to be able to precisely select how the next request will be handled before you even do? such as handling chunked requests maybe
Yep, I just independently verified (through logging) that the same decode function is used for all connections and http-codec assumes it does not do that! Thanks for finding the code that reuses the same decoder. I'll fix this now!
I honestly think it is fine, reusing the same decoder with its state for all other connections. As long as the states gets updates correctly on each request.
Either ways, decodeRaw
should be switching back to decodeHead
whether it's for a separate instance or not.
No, it's not fine to reuse the same decoder state for multiple connections. This is likely the cause of many bugs that I've only seen in production because of concurrent streams reusing the same decoder. The decoder state is very much tied to a single TCP stream and should not be mixed between streams.
Ah I see, couldn't honestly tell if that was the intended implementation (to use same decoder) or it was a bug until you confirmed that.
Thanks for the help in finally finding the root cause of this issue. I've been meaning to fix it for years!
Please review and test this PR. I believe it fixes the root cause.
Great! Now since this is pretty much solved from its roots, I guess no needs for workarounds!
... I make a request to luvit.io similar to this it will have a similar issue before the cron-job restart it.
I verified this was the case. Then I deployed the updated weblit-server version to luvit.io and now it no longer gets stuck!
That's cool! one more bug down!
One of my friends have been running into this problem for a while now, randomly their Weblit server would just freeze giving random errors, I have traced this all the way to http-codec and noticed that when decodeRaw is used, it never switches back, making all requests after the said request return with a nil, providing
nil
to Weblit'sreq.path
(using the wrapped reader that make use of http-codec decoder), which is expected to always exists, causing a nil value to be indexed.To replicate the above bug you can use the following setup: server.lua:
A server that always responds with 200 - hello. Testing this alone would work just fine as expected. Now send an HTTP request similar to this
Any HTTP request that force the use of decodeRaw mode will do.
Now send any request you like, GET, POST, doesn't matter: Expected behavior: The server returning 200 - hello response. Current Behavior: The server completely freezes not responding to any other request (since it always use decodeRaw now)
This PR should fix this by changing to decodeHead mode when the chunk is consumed.