luvit / lit

Toolkit for developing, sharing, and running luvit/lua programs and libraries.
http://lit.luvit.io/
Apache License 2.0
245 stars 58 forks source link

http-codec: decoder may use decodeRaw and never switch back #285

Closed Bilal2453 closed 3 years ago

Bilal2453 commented 3 years ago

One of my friends have been running into this problem for a while now, randomly their Weblit server would just freeze giving random errors, I have traced this all the way to http-codec and noticed that when decodeRaw is used, it never switches back, making all requests after the said request return with a nil, providing nil to Weblit's req.path (using the wrapped reader that make use of http-codec decoder), which is expected to always exists, causing a nil value to be indexed.

To replicate the above bug you can use the following setup: server.lua:

require('coro-http').createServer("127.0.0.1", 8080, function()
  return {code = 200, reason = "OK", {"Content-Length", "5"}}, "hello"
end)

A server that always responds with 200 - hello. Testing this alone would work just fine as expected. Now send an HTTP request similar to this

OPTIONS / HTTP/1.1
Connection: upgrade
accept-encoding: gzip
user-agent: curl/7.77.0
accept: */*

Any HTTP request that force the use of decodeRaw mode will do.

Now send any request you like, GET, POST, doesn't matter: Expected behavior: The server returning 200 - hello response. Current Behavior: The server completely freezes not responding to any other request (since it always use decodeRaw now)

This PR should fix this by changing to decodeHead mode when the chunk is consumed.

Bilal2453 commented 3 years ago

Good to mention, this basically affect all libraries that make use of http-codec

creationix commented 3 years ago

This looks promising.

creationix commented 3 years ago

I'm wondering how this caused the problem in the first place. I thought coro-http instances were only used for a single TCP stream. As far as I understand the spec, you can't pipeline HTTP requests over a single TCP connection when sending raw bodies because there is no way to know when the body ends. Does it freeze for only that TCP socket or for all requests in the server?

Bilal2453 commented 3 years ago

Does it freeze for only that TCP socket or for all requests in the server?

It does indeed freeze for all TCP sockets in the said server, in fact, if right this moment, I make a request to luvit.io similar to this it will have a similar issue before the corn-job restart it. Easily replicated even on a localserver.

creationix commented 3 years ago

Thanks, that's super weird. I'll see if I can understand this more. I'm glad this patch seems to fix it, but I don't understand why and I'm worried there are other problems still there.

Bilal2453 commented 3 years ago

Well, as far as I understand it, it goes like this:

      function decodeRaw(chunk, index)
    if #chunk < index then
      return
    end
    return sub(chunk, index)
  end

So pretty much, this only happens because the mode state is an upvalue among all requests, so it affect everything. Not sure why it is like this, but I assume it is to be able to precisely select how the next request will be handled before you even do? such as handling chunked requests maybe

creationix commented 3 years ago

Yep, I just independently verified (through logging) that the same decode function is used for all connections and http-codec assumes it does not do that! Thanks for finding the code that reuses the same decoder. I'll fix this now!

Bilal2453 commented 3 years ago

I honestly think it is fine, reusing the same decoder with its state for all other connections. As long as the states gets updates correctly on each request.

Bilal2453 commented 3 years ago

Either ways, decodeRaw should be switching back to decodeHead whether it's for a separate instance or not.

creationix commented 3 years ago

No, it's not fine to reuse the same decoder state for multiple connections. This is likely the cause of many bugs that I've only seen in production because of concurrent streams reusing the same decoder. The decoder state is very much tied to a single TCP stream and should not be mixed between streams.

Bilal2453 commented 3 years ago

Ah I see, couldn't honestly tell if that was the intended implementation (to use same decoder) or it was a bug until you confirmed that.

creationix commented 3 years ago

Thanks for the help in finally finding the root cause of this issue. I've been meaning to fix it for years!

Please review and test this PR. I believe it fixes the root cause.

https://github.com/luvit/lit/pull/288

Bilal2453 commented 3 years ago

Great! Now since this is pretty much solved from its roots, I guess no needs for workarounds!

creationix commented 3 years ago

... I make a request to luvit.io similar to this it will have a similar issue before the cron-job restart it.

I verified this was the case. Then I deployed the updated weblit-server version to luvit.io and now it no longer gets stuck!

Bilal2453 commented 3 years ago

That's cool! one more bug down!