mtrudel / bandit

Bandit is a pure Elixir HTTP server for Plug & WebSock applications
MIT License
1.7k stars 85 forks source link

Basic Bandit example in Documentation does not work - gives error of unclear significance. Solution? #404

Closed jonmdev closed 1 week ago

jonmdev commented 1 month ago

If you make a project as per:

https://hexdocs.pm/plug/readme.html https://hexdocs.pm/bandit/Bandit.html

You can get:

#====================
#PLUG MODULE FOR BANDIT 
#====================
defmodule MyPlug do
  import Plug.Conn

  def init(options) do
    # initialize options
    options
  end

  def call(conn, _opts) do
    conn
    |> put_resp_content_type("text/plain")
    |> send_resp(200, "Hello world")
  end
end

#====================
#MAIN MODULE
#====================
defmodule ServerTest do

  def start_http_server do
    IO.puts("Try Start HTTP Server");
    webserver = {Bandit, plug: MyPlug, scheme: :http, port: 4000 }
    supervisorOpts = [strategy: :one_for_one ]
    {:ok, _} = Supervisor.start_link([webserver], supervisorOpts)
    IO.puts("Plug now running on localhost:4000");
    #Process.sleep(:infinity)
  end

end

If you add this to a project and then run ServerTest.start_http_server(), then go localhost:4000 you will get a “Hello world” back. This is fine. However, you will also get this:

21:11:37.278 [error] GenServer #PID<0.554.0> terminating ** (stop) :econnaborted Last message: {:tcp_error, #Port<0.3>, :econnaborted} State: {%ThousandIsland.Socket{socket: #Port<0.3>, transport_module: ThousandIsland.Transports.TCP, read_timeout: 60000, silent_terminate_on_error: false, span: %ThousandIsland.Telemetry{span_name: :connection, telemetry_span_context: #Reference<0.267163659.1006895106.157280>, start_time: -5764607406509056, start_metadata: %{telemetry_span_context: #Reference<0.267163659.1006895106.157280>, parent_telemetry_span_context: #Reference<0.267163659.1006895106.157220>, remote_address: {127, 0, 0, 1}, remote_port: 63311}}}, %{opts: %{http: [], http_1: [], http_2: [], websocket: []}, plug: {MyPlug, []}, handler_module: Bandit.HTTP1.Handler, http_1_enabled: true, http_2_enabled: true, requests_processed: 2}}

21:13:32.629 [error] ** (Bandit.HTTPError) Header read socket error: :closed

I can find no information on what this means. With such simple code I am not creating any GenServer myself so this is obviously a GenServer that is running inside the Bandit code.

Is this a bug or is there something missing in the documentation to avoid this? What is the fix?

mtrudel commented 1 month ago

There's nothing wrong with how you've set up Bandit; it's running and serving your plug just fine. The example is correct.

The issue you're seeing is an error indicating that Bandit errored attempting to read headers from your client. Typically this is a client-side or network issue, where the complete client request does not make it to the server (hence the error). The reality of network facing logic is that there's always remote issues of one type or another; you'll never be rid of them entirely.

That being said, this does suggest a misbehaving client of one type or another. What client were you using?

jonmdev commented 1 month ago

I have run this as follows:

1) Run iex.bat -S mix from powershell in Windows 11 2) In Firefox on same computer/OS running, go to localhost:4000.

I note the errors are not related. There are two:

EConnAborted

17:26:22.812 [error] GenServer #PID<0.545.0> terminating ** (stop) :econnaborted Last message: {:tcp_error, #Port<0.5>, :econnaborted} State: {%ThousandIsland.Socket{socket: #Port<0.5>, transport_module: ThousandIsland.Transports.TCP, read_timeout: 60000, silent_terminate_on_error: false, span: %ThousandIsland.Telemetry{span_name: :connection, telemetry_span_context: #Reference<0.2506072894.74711041.114195>, start_time: -5764604723200000, start_metadata: %{telemetry_span_context: #Reference<0.2506072894.74711041.114195>, parent_telemetry_span_context: #Reference<0.2506072894.74711045.114199>, remote_address: {127, 0, 0, 1}, remote_port: 51484}}}, %{opts: %{http: [], http_1: [], http_2: [], websocket: []}, plug: {MyPlug, []}, handler_module: Bandit.HTTP1.Handler, http_1_enabled: true, http_2_enabled: true, requests_processed: 2}}

Header Timeout

17:10:21.466 [error] ** (Bandit.HTTPError) Header read timeout

These do not occur at the same time, but randomly. I so far cannot identify what is causing them or how to trigger them reliably. Sometimes I get them, sometimes I don't. But they keep happening.

Is there any way to debug further the cause or handle these errors in some way?

I can appreciate that perhaps things may never be perfect, but this is the simplest proof of concept for the Bandit server system I can imagine and I am already being spammed with errors, which is not ideal.

I have also tested as per the documents WebSocket system which provides the same errors:

#====================
#HTTP SOCKET ROUTER
#====================
defmodule Router do
  use Plug.Router

  plug Plug.Logger
  plug :match
  plug :dispatch

  #===============================
  # instructions for html view
  #===============================
  get "/" do
    send_resp(conn, 200, """

    Use the JavaScript console to interact using websockets (shift F12)

    sock  = new WebSocket("ws://localhost:4001/websocket");
    sock.addEventListener("message", (event) => console.log(event)); 
    sock.addEventListener("open", () => sock.send("ping"));
    sock.send("ping")

    """)
  end

  #===============================
  # websocket request
  #===============================
  get "/websocket" do
    conn
    |> WebSockAdapter.upgrade(EchoServer, [], timeout: 60_000)
    |> halt()
  end

  match _ do
    send_resp(conn, 404, "not found")
  end
end

#//=====================
#//HTTP SOCKET SERVER //https://hexdocs.pm/websock/0.5.3/WebSock.html
#//=====================
defmodule EchoServer do
  def init(options) do
    {:ok, options}
  end

  #handle_in function //WebSock will call the configured handler's WebSock.handle_in/2 callback whenever data is received from the client
  def handle_in({message_in, [opcode: :text]}, state) do

    IO.puts("RECEIVED SOMETHING");
    case message_in do

      "ping" ->
        IO.puts("PING IN REQUEST RECEIVED");
        {:reply, :ok, {:text, "pong"}, state}

      _ ->
        IO.puts("RECEIVED: " <> to_string(message_in));
        {:reply, :ok, {:text, to_string(message_in)}, state}
    end

  end
end

#===============
#//MAIN FUNCTION
#===============
defmodule ServerTest do

  def start_web_socket do

    webserver = {Bandit, plug: Router, scheme: :http, port: 4001}
    supervisorOpts = [strategy: :one_for_one ]
    {:ok, _} = Supervisor.start_link([webserver], supervisorOpts)

  end
end

Is there some way to gather more information about these errors? It seems they are all being processed and outputted in your code, not anything I have access to. Is that correct? Or can I do something with handle_info or something to catch them or see what is happening or suppress them or what?

Thanks for any help.

jonmdev commented 1 month ago

I tested the code with Cowboy instead. So far, since switching to Cowboy I have seen neither of these two errors again.

Thus I do believe this is Bandit bug. I would prefer to use Bandit, because as you say it is more efficient. But error free basic function is necessary as well. If I am getting errors from the most ideal simple basic test on a single computer, this does not bode well for a large scale application.

I would be very interested in any fix if you can think of anything to try.

Bug Project

I posted a bug project here:

https://github.com/jonmdev/bandit_bug

In Application.ex you can comment in or out the Bandit or Cowboy version:

        #httpServer = {Plug.Cowboy, plug: MyPlug, scheme: :http, port: 4000 }
        httpServer = {Bandit, plug: MyPlug, scheme: :http, port: 4000 }
        #webSocket = {Plug.Cowboy, plug: Router, scheme: :http, port: 4001}
        webSocket = {Bandit, plug: Router, scheme: :http, port: 4001}

Seems to work smoothly and no errors for either with Cowboy. While Bandit recurrently (though unpredictably) provides the two errors in this thread.

Reproduction Steps

1) In Windows 11, open powershell to project directory with elixir/erlang installed 2) Run iex.bat -S mix 3) Navigate to localhost:4000 and/or localhost:4001 in Firefox 4) Refresh these web browser pages a few times, leave idle for a while, repeat 5) Randomly get errors in Powershell as shown in this thread when using Bandit, not Cowboy 6) Control+C from Powershell to end program run, then run again if no errors (unpredictable and unclear what trigger is, comes and goes)

Thanks for any help or thoughts.

mtrudel commented 1 month ago

You may also be experiencing the same problem as https://github.com/mtrudel/bandit/issues/407, insofar as you may not be rebinding conn everywhere. This will fail silently in Cowboy but produce errors similar to what you're seeing

jonmdev commented 1 month ago

You may also be experiencing the same problem as #407, insofar as you may not be rebinding conn everywhere. This will fail silently in Cowboy but produce errors similar to what you're seeing

Can you explain more what you mean by this? I have reviewed the linked threads including:

https://github.com/mtrudel/bandit/issues/361

However, I do not know what you mean when you say "rebind conn" or "everywhere".

I posted a bug project here:

https://github.com/jonmdev/bandit_bug

The entire project is just these two files which are based on the documentation code:

This is directly copied as you can see from the official documentation:

Is there a problem with the documentation code or my copy/paste from it? Can you explain what? Thanks for any further clarification.

If the documentation is wrong, presumably this should then also be fixed? Thanks.

mtrudel commented 1 month ago

Thanks for the repro - it's not the same issue at #407, so I guess it stands.

mtrudel commented 1 week ago

I'm unable to reproduce this on macOS or Linux, with a variety of clients. Given how transparently simple and correct your example code is, if this was a general bug we'd be hearing about it constantly.

I'm going to close this until / unless we can get a more reproducible repro case