mtrudel / bandit

Bandit is a pure Elixir HTTP server for Plug & WebSock applications
MIT License
1.7k stars 85 forks source link

Request keeps running after client hangs up #408

Closed joladev closed 1 week ago

joladev commented 1 month ago

We're experimenting with switching over to Bandit but we're struggling with this issue where if a long running request is closed by the GCP load balancer, and the client gets a 502 response, the Bandit/Thousand Island request process is still running and executing the code. Once it reaches the end and tries to send the response, it errors with Bandit.HTTPTransport.Bandit.HTTP1.Socket.request_error!/2: closed.

The expected behavior would be for the code to stop running the moment the load balancer or the client hang up, and that's also the behavior we're seeing from cowboy

I am not able to reproduce it locally without the load balancer, so I'm guessing it's related to keep alive. I'd be happy to provide more information or try out a patch

joladev commented 1 month ago

To give an update on this, I can replicate this running locally so I guess the load balancer was a red herring. Made a repo to reproduce it, just a phoenix template and some logs and Process.sleep. Instructions in the readme but basically start a request and then give up on the client side, you'll see the server side continue working even though there's no one there to get the response anymore.

https://github.com/joladev/hangups

mtrudel commented 1 week ago

For better or worse, this is endemic to Bandit's design. Because Bandit uses a single process to handle both the network-facing and Plug-facing aspects of a connection in HTTP/1.1, the process is unable to be notified asynchronously about a socket closure for the duration of a Plug call since that singular process is busy in Plug.call/2. As a result, your code will only be 'notified' of a client disconnection whenever you make calls to read or write to the client via thePlug.Conn family of functions. These will end up making calls to ThousandIsland.Socket functions eventually, which will return {:error, :closed} tuples that will bubble out as Bandit.HTTPError exceptions. Taken as a whole this is a reasonable approach for cases where you're frequently interacting with the client. Process.sleep/1 calls as in your example are the polar opposite; there's no way for Bandit to interrupt those because the only process involved in that connection is the one that's sleeping.

It's also worth noting that client hangups during the parts of the connection that Bandit controls (header reading & keepalive waits for example) do terminate the process immediately. The gap here is strictly limited to the time that the handler process' call stack is 'inside' your Plug.call/2 function.

Cowboy gets around this by having the network facing process spawn a separate process for every request (and thus Plug call) within the connection. As a result, the network facing process is free to receive and act on the :tcp_closed message that gets sent via the TCP stack at any point during a Plug's execution, and so can Process.exit the Plug process at anytime.

Note that all of the above applies only to HTTP/1.1. The HTTP/2 connection model inherently admits having separate network-facing and Plug-facing processes by virtue of its multiplexed nature. Bandit and Cowboy's process models are much closer to one another for HTTP/2 as a result.

Again, all of this is 'by design'. Nothing in the Plug abstraction mandates anything at all about the process models underlying Plug.call/2 calls, so it's hard to argue that either approach is 'more correct' than the other. Bandit is/was deliberately designed this way for simplicity and speed.

All this being said, it's a reasonable feature to be able to immediately terminate the handler process on client hangup. As it turns out, however, it's not possible to realize it without the '1 process per request' model (I've spent a TON of time with socket monitors exit trapping and the like and none of them are quite right for purpose here). I'm exploring the ability to make this an option in Bandit but it's likely going to be part of a much larger refactor within Bandit and Thousand Island. When this lands is still an open question.

I hope this clears things up. Thanks for the issue! In the meantime I'll close this for hygiene. If you have any further ideas or issues, please feel free to (re)open!