mtrudel / bandit

Bandit is a pure Elixir HTTP server for Plug & WebSock applications
MIT License
1.67k stars 80 forks source link

Build iolist for WebSocket frame data #390

Closed liamwhite closed 1 month ago

liamwhite commented 1 month ago

Fixes #389

This is a significant refactor of the initial proposed solution, but the core idea is the same: when parsing a frame, gather enough bytes to parse the header, and then start receiving the payload data into an iolist until the number of payload bytes is satisfied.

liamwhite commented 1 month ago

Note I am currently getting a spurious test failure which I haven't been able to narrow down yet:

  1) test s/m/l frame sizes over-sized frames are rejected (WebSocketProtocolTest)
     test/bandit/websocket/protocol_test.exs:110
     ** (MatchError) no match of right hand side value: {:error, :closed}
     code: capture_log(fn ->
     stacktrace:
       (bandit 1.5.7) test/support/simple_websocket_client.ex:94: SimpleWebSocketClient.recv_frame/1
       (bandit 1.5.7) test/support/simple_websocket_client.ex:79: SimpleWebSocketClient.recv_connection_close_frame/1
       test/bandit/websocket/protocol_test.exs:119: anonymous fn/1 in WebSocketProtocolTest."test s/m/l frame sizes over-sized frames are rejected"/1
       (ex_unit 1.15.7) lib/ex_unit/capture_log.ex:113: ExUnit.CaptureLog.with_log/2
       (ex_unit 1.15.7) lib/ex_unit/capture_log.ex:75: ExUnit.CaptureLog.capture_log/2
       test/bandit/websocket/protocol_test.exs:112: (test)
mtrudel commented 1 month ago

I love this! Do you think it would be possible / sensible to keep the extractor logic outside of WebSocket (ie: Bandit.Extractor) so that we could reuse it for HTTP/2 frame parsing as well? You've already done most of the work here to keep the WS stuff specific in Bandit.WebSocket.Frame, so we'd just need to pass in that module as a behaviour to extractor. We wouldn't need to do the work to get HTTP/2 using it in this PR (though we could if you're up for it!); the overall idea is identical although the header formats are obviously a bit different

liamwhite commented 1 month ago

I cannot figure out what is causing the tests to fail. When I add a warning to one of the areas I suspected the test failure completely stops and I am unable to get it to happen unless I remove it, but sometimes it doesn't happen even without this.

--- a/lib/bandit/websocket/handler.ex
+++ b/lib/bandit/websocket/handler.ex
@@ -56,6 +56,7 @@ defmodule Bandit.WebSocket.Handler do
         end

       {extractor, {:error, reason}} ->
+        IO.warn inspect(reason)
         {:error, {:deserializing, reason}, %{state | extractor: extractor}}

       {extractor, :more} ->

I also tried disabling the async runs and just running the one failing test but cannot get anything to behave consistently. Could you have a look at this?

liamwhite commented 1 month ago

I guess it shouldn't really have been a surprise that the test that broke depended on certain performance characteristics surrounding sending massive frames to the server. The test is broken in two ways, and both are race conditions:

  1. It needs to wait for the logging backend to emit the log message
  2. It expects a connection close frame to be delivered by the transport

It tries to ensure 1 happens by adding a Process.sleep(100) to try to let the logging backend catch up, but this is pretty unreliable. And then the major speedup created by this PR ensures that 2 will almost never happen, because a gen_tcp socket in passive mode will not return unread bytes on a socket which has been closed for reading; instead, it will return {error, closed} once the socket is closed and discard the unread bytes.

defmodule Test do
  def listen do
    {:ok, listener} = :gen_tcp.listen(5678, [:binary])
    {:ok, socket} = :gen_tcp.accept(listener)
    :gen_tcp.send(socket, "hello world")
    :gen_tcp.shutdown(socket, :write)
    Process.sleep(2000)
  end

  def connect do
    {:ok, socket} = :gen_tcp.connect(~c"localhost", 5678, [:binary])
    Process.sleep(1000)
    {:ok, "hello world"} = :gen_tcp.recv(socket, 1460)
  end
end

_p1 = spawn(fn -> Test.listen() end)
Process.sleep(100)
_p2 = spawn(fn -> Test.connect() end)
Process.sleep(2000)

This is arguably an OTP bug?

Anyway, please make a separate fix for these issues or push to my branch because I can't justify spending any more time on this test issue.

mtrudel commented 1 month ago

Agreed that it's pretty racy (a good chunk of the test suite ~unavoidably is). We can skip those tests for now to get this green

mtrudel commented 1 month ago

Also just a heads up that I'll be pretty unresponsive until early next week (on vacation). I'll be able to spend some proper time with this issue early next week.

mtrudel commented 1 month ago

Merged! Thanks for the PR!