mirage / ocaml-cohttp

An OCaml library for HTTP clients and servers using Lwt or Async

cohttp-eio hangs #953

Open craff opened 1 year ago

craff commented 1 year ago

The simplest example hangs if I do more requests than the number of domains. The bug is against the latest opam version, with OCaml 5.0.0-beta2.

Steps to reproduce:

dune build
dune exec ./bug.exe
./bench.sh http://localhost:8080/ N 0

N should be twice the number of cores (Domain.recommended_domain_count ()).
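Not part of the report, just for reference: a one-liner to compute that N on a given machine, using the same function as bug.ml below.

(* Prints 2 × Domain.recommended_domain_count (), i.e. the N passed to bench.sh. *)
let () = Printf.printf "%d\n" (2 * Domain.recommended_domain_count ())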

Attaching a file keeps failing on gitbug (sorry, GitHub ;-), so I put the files inline:

(*--------------------------  bug.ml -----------------*)
module S = Cohttp_eio.Server

let () =
  let handler (_request,_buf,_addr) =
    let data = String.make (1024*1024) 'X' in
    S.text_response data
  in
  let port = 8080 in
  let domains = Domain.recommended_domain_count () in
  Eio_main.run @@ fun env -> S.run ~domains ~port env handler
;------------------------------ dune -------------------------
(executable
 (name bug)
 (modules :standard)
 (libraries eio eio_main cohttp-eio))
#------------------------ bench.sh --------------------------
#!/bin/bash

url=$1
nb=$2
sleep_time=$3

for (( c=1; c<=$nb; c++ )); do
  f=$(mktemp)
  (curl -s $url > $f; stat -c %s $f; rm $f) &
  sleep $sleep_time
done

wait $(jobs -p)

mseri commented 1 year ago

Thanks for the report and the reproduction material. Ping @bikallem

bikallem commented 1 year ago

Ack.

bikallem commented 1 year ago

@craff I am unable to reproduce the error (the hang) on my 2-core PC.

I modified bench.sh so it runs on NixOS (/usr/bin/env bash) and prints the temp file names.

#!/usr/bin/env bash

url=$1
nb=$2
sleep_time=$3

for (( c=1; c<=$nb; c++ )); do
  f=$(mktemp)
  echo $f
  (curl -s $url > $f; stat -c %s $f)
  sleep $sleep_time
done

wait $(jobs -p)

$ ./bench.sh localhost:8080 3 0
/tmp/nix-shell.0dwbKW/tmp.y7Ek4nEStT
1048576
/tmp/nix-shell.0dwbKW/tmp.prkvJwpBu4
1048576
/tmp/nix-shell.0dwbKW/tmp.PCpqEqLPzW
1048576

I tried on another machine with 12 cores.

./bench.sh localhost:8080 24 0
/tmp/nix-shell.0dwbKW/tmp.Sx1uglvlBG
1048576
/tmp/nix-shell.0dwbKW/tmp.ckThvk9Z17
1048576
/tmp/nix-shell.0dwbKW/tmp.laFvnUznHj
1048576
/tmp/nix-shell.0dwbKW/tmp.OHUZTJcTu0
1048576
/tmp/nix-shell.0dwbKW/tmp.foARSG1jAM
1048576
/tmp/nix-shell.0dwbKW/tmp.2gaXXc7NIH
1048576
/tmp/nix-shell.0dwbKW/tmp.lkEiVHbBX3
1048576
/tmp/nix-shell.0dwbKW/tmp.fj5FpWwCkF
1048576
/tmp/nix-shell.0dwbKW/tmp.wIuRCuvufU
1048576
/tmp/nix-shell.0dwbKW/tmp.sPSuXHVdDU
1048576
/tmp/nix-shell.0dwbKW/tmp.mjh359gBud
1048576
/tmp/nix-shell.0dwbKW/tmp.q4KmcAWxbR
1048576
/tmp/nix-shell.0dwbKW/tmp.WpE8JHhE99
1048576
/tmp/nix-shell.0dwbKW/tmp.tZpQlbXQ7v
1048576
/tmp/nix-shell.0dwbKW/tmp.nkJcJQCgM4
1048576
/tmp/nix-shell.0dwbKW/tmp.xx5sW3XxF6
1048576
/tmp/nix-shell.0dwbKW/tmp.W5nOkfUBvV
1048576
/tmp/nix-shell.0dwbKW/tmp.Hh7iSlCT9Y
1048576
/tmp/nix-shell.0dwbKW/tmp.TSpPy18pOH
1048576
/tmp/nix-shell.0dwbKW/tmp.0YDZn0e8zl
1048576
/tmp/nix-shell.0dwbKW/tmp.lkNG24gbyb
1048576
/tmp/nix-shell.0dwbKW/tmp.bbsxRsdZY1
1048576
/tmp/nix-shell.0dwbKW/tmp.BbWdcoLUSL
1048576
/tmp/nix-shell.0dwbKW/tmp.Iv2Na4sh93
1048576

talex5 commented 1 year ago

I see this with EIO_BACKEND=luv. I suspect this is because we use a separate Luv event loop for each domain, but cohttp is sharing the socket between domains. Sockets get attached to an event loop when they are created, not when you wait on them. Possibly we should run a single Luv event loop in a sys-thread and dispatch events to domains from that.
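
To make that suggested shape concrete, here is a minimal plain-Unix/Domain sketch (hypothetical code, not Eio, luv, or cohttp-eio internals): a single loop owns the listening socket and is the only thing that ever waits on it, and it hands accepted connections to worker domains over a small blocking queue.

(* Hypothetical sketch: one accept loop, worker domains consume a blocking queue. *)
module Blocking_queue = struct
  type 'a t = { q : 'a Queue.t; m : Mutex.t; c : Condition.t }

  let create () =
    { q = Queue.create (); m = Mutex.create (); c = Condition.create () }

  let push t x =
    Mutex.lock t.m;
    Queue.push x t.q;
    Condition.signal t.c;
    Mutex.unlock t.m

  let pop t =
    Mutex.lock t.m;
    while Queue.is_empty t.q do Condition.wait t.c t.m done;
    let x = Queue.pop t.q in
    Mutex.unlock t.m;
    x
end

let () =
  let port = 8080 in
  let queue = Blocking_queue.create () in
  (* Worker domains: take accepted connections and answer them (request ignored for brevity). *)
  let worker () =
    while true do
      let fd, _addr = Blocking_queue.pop queue in
      let reply = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok" in
      ignore (Unix.write_substring fd reply 0 (String.length reply));
      Unix.close fd
    done
  in
  let workers =
    List.init (Domain.recommended_domain_count ()) (fun _ -> Domain.spawn worker)
  in
  (* The accept loop is the only place that waits on the listening socket. *)
  let listener = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.setsockopt listener Unix.SO_REUSEADDR true;
  Unix.bind listener (Unix.ADDR_INET (Unix.inet_addr_loopback, port));
  Unix.listen listener 128;
  while true do
    Blocking_queue.push queue (Unix.accept listener)
  done;
  List.iter Domain.join workers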

haesbaert commented 1 year ago

I see this with EIO_BACKEND=luv. I suspect this is because we use a separate Luv event loop for each domain, but cohttp is sharing the socket between domains. Sockets get attached to an event loop when they are created, not when you wait on them. Possibly we should run a single Luv event loop in a sys-thread and dispatch events to domains from that.

I assume you mean the listener socket, because if it's the actual connection socket, then everything is wrong :).

Ideally each domain would have its own listener socket, bound to the same address and port using SO_REUSEADDR and SO_REUSEPORT, this way they can all call accept on their own descriptors and the kernel should round-robin the connections. Decades ago support for this was fiddly, so the preferred form was each thread blocking on the same accept(listenfd), which is what I assume is being done.
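
For illustration only, a sketch of that per-domain-listener pattern in plain OCaml/Unix (hypothetical code, not what cohttp-eio does; Unix.SO_REUSEPORT needs OCaml >= 4.12 plus kernel support):

(* Hypothetical sketch: each domain binds its own listening socket to the same port. *)
let serve_on_own_listener port =
  let sock = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.setsockopt sock Unix.SO_REUSEADDR true;
  Unix.setsockopt sock Unix.SO_REUSEPORT true;
  Unix.bind sock (Unix.ADDR_INET (Unix.inet_addr_loopback, port));
  Unix.listen sock 128;
  while true do
    (* Each domain accepts on its own descriptor; the kernel spreads connections. *)
    let client, _addr = Unix.accept sock in
    let reply = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok" in
    ignore (Unix.write_substring client reply 0 (String.length reply));
    Unix.close client
  done

let () =
  let domains =
    List.init (Domain.recommended_domain_count ())
      (fun _ -> Domain.spawn (fun () -> serve_on_own_listener 8080))
  in
  List.iter Domain.join domains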

talex5 commented 1 year ago

Ideally each domain would have its own listener socket, bound to the same address and port using SO_REUSEADDR and SO_REUSEPORT

libuv doesn't support SO_REUSEPORT however :-(

Looking a bit closer, I think this is what's happening:

  1. We get notified of a new connection: https://github.com/ocaml-multicore/eio/blob/50abf52dd3fab1565114801d9edd8d7e13dabab0/lib_eio_luv/eio_luv.ml#L796-L800
  2. We signal that a new FD is ready and return from the accept handler.
  3. libuv notices that we didn't take the FD yet and decides we don't want any more. It disables watching the listening socket: https://github.com/libuv/libuv/blob/988f2bfc4defb9a85a536a3e645834c161143ee0/src/unix/stream.c#L569-L576
  4. One of our worker domains accepts the FD, but that's the last one we'll get.

Libuv should resume watching the listening socket when we accept, but I guess it gets confused because the accept is done from a different domain.

bikallem commented 1 year ago

Ideally each domain would have its own listener socket, bound to the same address and port using SO_REUSEADDR and SO_REUSEPORT, this way they can all call accept on their own descriptors and the kernel should round-robin the connections. Decades ago support for this was fiddly,

Slightly off topic: I experimented with this approach, thinking it would enhance performance (i.e. we would get free load balancing from the Linux kernel across the CPU cores). The benchmarks, however, didn't reveal any benefit.

I think SO_REUSEPORT is only usable/beneficial if you have multiple OS processes. It doesn't seem to matter much in a single-process setting.

mseri commented 1 year ago

Should we just remove it then?

bikallem commented 1 year ago

Should we just remove it then?

Yes. Server.run is due for some improvements, as discussed in https://github.com/mirage/ocaml-cohttp/issues/961, and this includes removing the SO_REUSEPORT usage.

haesbaert commented 1 year ago

That works, but there are many things involved: multiple sockets with REUSEADDR/REUSEPORT work in a push manner. The kernel round-robins the next connection, which means your next connection might wait if the domain it lands on is busy. If it's select or level-triggered epoll, it also causes a thundering herd: the file description (not descriptor) is considered readable and everyone wakes up. Scaling all of this requires thinking about how all these little cases play a part.

talex5 commented 1 year ago

Eio 0.9 added a new "eio_posix" backend, which should work correctly. I tested it (using EIO_BACKEND=posix) and it worked for me. The luv backend is now only used on Windows, and will be removed completely soon.