mtrudel / bandit

Bandit is a pure Elixir HTTP server for Plug & WebSock applications
MIT License

WebSocket Connection Limit #393

Closed: michaelst closed this issue 3 months ago

michaelst commented 3 months ago

I appear to be running into a limit of about 130k connections. Once the connection count gets into that range, the server stops responding to requests and nothing shows up in the logs.

Here are some metrics showing two nodes running. The VM args set both the port and process limits to 1M. Is there potentially another limit we are hitting, or a limitation in Bandit?

[image: metrics for the two running nodes]
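
For reference, the port and process limits mentioned above can be compared against live usage from an IEx session on the running node. This is a minimal sketch, assuming shell access to the node; it only reads values the VM already exposes through `:erlang.system_info/1` and says nothing about limits outside the VM (OS or network).

```elixir
# Compare the VM's configured limits with current usage.
# Assumption: run from an IEx session attached to the node under load.
limits = %{
  port_limit: :erlang.system_info(:port_limit),
  port_count: :erlang.system_info(:port_count),
  process_limit: :erlang.system_info(:process_limit),
  process_count: :erlang.system_info(:process_count)
}

IO.inspect(limits, label: "VM limits")
```

If the counts sit well below the limits when the server stops responding, the VM settings are unlikely to be the bottleneck.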

application.ex

{Bandit, plug: SocketWeb.Router, scheme: :http, port: Application.fetch_env!(:socket, :port)},

router

defmodule SocketWeb.Router do
  use Plug.Router

  plug :match
  plug :dispatch

  get "/_health" do
    send_resp(conn, 200, "ok")
  end

  get "/v2/device" do
    with ["Bearer " <> token] <- get_req_header(conn, "authorization"),
         {:ok, claims} <- Guardian.decode_and_verify(token) do
      WebSockAdapter.upgrade(conn, SocketWeb.Connection, {conn, claims}, timeout: 60_000)
    else
      _error ->
        send_resp(conn, 401, "Unauthorized")
    end
  end

  match _ do
    send_resp(conn, 404, "not found")
  end
end

mix.lock


  "bandit": {:hex, :bandit, "1.5.7", "6856b1e1df4f2b0cb3df1377eab7891bec2da6a7fd69dc78594ad3e152363a50", [:mix], [{:hpax, "~> 1.0.0", [hex: :hpax, repo: "hexpm", optional: false]}, {:plug, "~> 1.14", [hex: :plug, repo: "hexpm", optional: false]}, {:telemetry, "~> 0.4 or ~> 1.0", [hex: :telemetry, repo: "hexpm", optional: false]}, {:thousand_island, "~> 1.0", [hex: :thousand_island, repo: "hexpm", optional: false]}, {:websock, "~> 0.5", [hex: :websock, repo: "hexpm", optional: false]}], "hexpm", "f2dd92ae87d2cbea2fa9aa1652db157b6cba6c405cb44d4f6dd87abba41371cd"},
  "thousand_island": {:hex, :thousand_island, "1.3.5", "6022b6338f1635b3d32406ff98d68b843ba73b3aa95cfc27154223244f3a6ca5", [:mix], [{:telemetry, "~> 0.4 or ~> 1.0", [hex: :telemetry, repo: "hexpm", optional: false]}], "hexpm", "2be6954916fdfe4756af3239fb6b6d75d0b8063b5df03ba76fd8a4c87849e180"},
  "websock": {:hex, :websock, "0.5.3", "2f69a6ebe810328555b6fe5c831a851f485e303a7c8ce6c5f675abeb20ebdadc", [:mix], [], "hexpm", "6105453d7fac22c712ad66fab1d45abdf049868f253cf719b625151460b8b453"},
  "websock_adapter": {:hex, :websock_adapter, "0.5.7", "65fa74042530064ef0570b75b43f5c49bb8b235d6515671b3d250022cb8a1f9e", [:mix], [{:bandit, ">= 0.6.0", [hex: :bandit, repo: "hexpm", optional: true]}, {:plug, "~> 1.14", [hex: :plug, repo: "hexpm", optional: false]}, {:plug_cowboy, "~> 2.6", [hex: :plug_cowboy, repo: "hexpm", optional: true]}, {:websock, "~> 0.5", [hex: :websock, repo: "hexpm", optional: false]}], "hexpm", "d0f478ee64deddfec64b800673fd6e0c8888b079d9f3444dd96d2a98383bdbd1"},
michaelst commented 3 months ago

I did some initial testing with Cowboy and did not run into the same issue. Happy to provide whatever other info would help look into this.

michaelst commented 3 months ago

Actually, this appears to have been a networking issue; somehow I didn't keep the network configuration consistent between the tests. I apologize for the false issue.

mtrudel commented 3 months ago

No worries! Thanks for the note regardless!

mtrudel commented 3 months ago

Out of curiosity, what sort of numbers are you seeing out of Bandit once you made things consistent?

michaelst commented 3 months ago

I was seeing 9.95 GB of memory (as reported by the BEAM) for 260k connections. We are running in k8s, and 260k is the same limit I ended up running into with Cowboy as well; with network policies enabled, this is cut to 130k for some reason. However, the k8s pod was reporting something around 14 GB, I think; I don't have historical data captured on that.

Are there any specific numbers that would be useful? I can capture more details during the next load test I run.
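
As a rough illustration of what those figures imply, the per-connection cost can be worked out directly, and the BEAM's own memory breakdown is one candidate set of numbers to record next time. This is a sketch only: it assumes the 9.95 GB figure is in decimal gigabytes, and the choice of memory categories is a suggestion, not something requested in the thread.

```elixir
# Figures reported above; treating GB as 10^9 bytes is an assumption.
beam_bytes = 9.95 * 1_000_000_000
connections = 260_000

per_connection_kb = beam_bytes / connections / 1_000
IO.puts("~#{Float.round(per_connection_kb, 1)} kB per connection")
# prints "~38.3 kB per connection"

# Memory breakdown (in bytes) that could be captured during a load test.
:erlang.memory()
|> Keyword.take([:total, :processes, :binary, :ets])
|> IO.inspect(label: "BEAM memory (bytes)")
```

Comparing `:total` with the pod-level figure would show how much of the gap between 9.95 GB and roughly 14 GB lies outside what the BEAM reports.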