ocaml-multicore / eio

Effects-based direct-style IO for multicore OCaml

Eio vs Lwt_unix networking performance #527

Closed · cometkim closed this 1 year ago

cometkim commented 1 year ago

I'm practicing Eio by writing a Neo4j driver.

Here is the version negotiation code:
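
(The original snippet is not shown here; below is a minimal sketch of what such a handshake can look like in Eio, reconstructed from the identifiers that appear later in this thread, such as Write.with_flow, Write.bytes, and Neo4j.Protocol.hello. The real driver code may differ.)

open Eio.Std
module Write = Eio.Buf_write
module Read = Eio.Buf_read

let handshake flow =
  traceln "Client: trying handshake...";
  (* [Neo4j.Protocol.hello] is assumed to hold the Bolt magic preamble
     plus the four proposed protocol versions *)
  Write.with_flow flow (fun to_server ->
    Write.bytes to_server Neo4j.Protocol.hello);
  (* the server replies with the agreed version in four big-endian bytes:
     minor in byte 2, major in byte 3 *)
  let reply = Read.of_flow flow ~max_size:4096 in
  let version = Read.take 4 reply in
  traceln "Version %d.%d" (Char.code version.[3]) (Char.code version.[2])

let () =
  Eio_main.run @@ fun env ->
  Switch.run @@ fun sw ->
  traceln "Client: connecting to server";
  let net = Eio.Stdenv.net env in
  let flow = Eio.Net.connect ~sw net (`Tcp (Eio.Net.Ipaddr.V4.loopback, 7687)) in
  handshake flow;
  traceln "Client: closing connection"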

Eio's API is pretty cool. I was curious about its actual performance.

When tested with hyperfine, the eio version consistently comes out 3–4 times slower than the lwt_unix version.

  1. docker run -it -p 7687:7687 -p 7444:7444 -p 3000:3000 memgraph/memgraph-platform
  2. Open another shell
  3. hyperfine 'dune exec neo4j-lwt-unix/bin/connect.exe' 'dune exec neo4j-eio/bin/connect.exe'

Tested on:

Result:

Benchmark 1: dune exec neo4j-lwt-unix/bin/connect.exe
  Time (mean ± σ):      20.8 ms ±   6.9 ms    [User: 11.2 ms, System: 6.7 ms]
  Range (min … max):    18.2 ms …  66.4 ms    83 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (34.2 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 2: dune exec neo4j-eio/bin/connect.exe
  Time (mean ± σ):      84.6 ms ±  30.4 ms    [User: 32.8 ms, System: 17.5 ms]
  Range (min … max):    28.4 ms … 150.3 ms    38 runs

Summary
  'dune exec neo4j-lwt-unix/bin/connect.exe' ran
    4.07 ± 1.99 times faster than 'dune exec neo4j-eio/bin/connect.exe'

I've tried to keep the two implementations as similar as possible, but I don't know how it works behind the scenes.

Am I missing something here? Or does this mean it can still be optimized?

talex5 commented 1 year ago

Benchmarking dune exec probably isn't a good idea. I suggest just testing the executable directly.

Might be worth adding a loop in the OCaml code to try many connections - I suspect you're just measuring start-up time. Is this on Linux? Try comparing with EIO_BACKEND=posix to avoid setting up io_uring. If it's still slower, strace -tt should show what's taking the time.
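
(For reference, a minimal sketch of that suggestion, not the driver's actual code: connect_once is a hypothetical stand-in for the connect-and-handshake routine, and Fiber.all runs the attempts concurrently so the per-process start-up cost is paid only once.)

open Eio.Std

let connect_once net =
  Switch.run @@ fun sw ->
  let flow = Eio.Net.connect ~sw net (`Tcp (Eio.Net.Ipaddr.V4.loopback, 7687)) in
  ignore flow  (* handshake elided; the socket is closed when the switch ends *)

let () =
  Eio_main.run @@ fun env ->
  let net = Eio.Stdenv.net env in
  (* run 100 connection attempts concurrently, one fiber each *)
  Fiber.all (List.init 100 (fun _ () -> connect_once net))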

cometkim commented 1 year ago

Ok, let me try... but I got a connection error after switching the Eio backend:

❯ EIO_BACKEND=linux ./_build/default/neo4j-eio/bin/connect.exe
+Client: connecting to server
+Client: trying handshake...
+Version 4.3
+Client: closing connection

❯ EIO_BACKEND=posix ./_build/default/neo4j-eio/bin/connect.exe
+Client: connecting to server
+Client: trying handshake...
Fatal error: exception Eio.Io Net Connection_reset Unix_error (Broken pipe, "writev", "")

Are there compatibility issues I should know about for the posix backend?

(I just upgraded the eio libraries to the latest version.)

cometkim commented 1 year ago

I tried to connect 100 times concurrently and got a better result.

Benchmark 1: _build/default/neo4j-lwt-unix/bin/connect_p.exe
  Time (mean ± σ):     110.3 ms ±  43.9 ms    [User: 4.7 ms, System: 16.2 ms]
  Range (min … max):    59.0 ms … 209.4 ms    28 runs

Benchmark 2: _build/default/neo4j-eio/bin/connect_p.exe
  Time (mean ± σ):     128.7 ms ±  39.6 ms    [User: 10.4 ms, System: 15.4 ms]
  Range (min … max):    61.1 ms … 256.9 ms    30 runs

Summary
  '_build/default/neo4j-lwt-unix/bin/connect_p.exe' ran
    1.17 ± 0.59 times faster than '_build/default/neo4j-eio/bin/connect_p.exe'

cometkim commented 1 year ago

There may be other confounding effects, such as console output, so I think this is fine.

Just curious: is there any special cost to initializing the Eio env?

talex5 commented 1 year ago

> Are there compatibility issues I should know about for the posix backend?

Hmm, looks like a bug in Buf_write.flush (it waits for the data to be removed from the buffer, but not for the write itself to complete). As a work-around, you can do this:

let handshake flow =
  traceln "Client: trying handshake...";
  (* with_flow only returns once its writes have actually completed *)
  Write.with_flow flow (fun to_server -> Write.bytes to_server Neo4j.Protocol.hello);
  (* so it is now safe to close the send side *)
  Eio.Flow.shutdown flow `Send;
  ...

with_flow does wait for the writes to complete.

cometkim commented 1 year ago

> Hmm, looks like a bug in Buf_write.flush (it waits for the data to be removed from the buffer, but not for the write itself to complete). As a work-around, you can do this:

Good! Now it works on the posix backend as well.

EIO_BACKEND=posix hyperfine '_build/default/neo4j-lwt-unix/bin/connect_p.exe' '_build/default/neo4j-eio/bin/connect_p.exe'

Benchmark 1: _build/default/neo4j-lwt-unix/bin/connect_p.exe
  Time (mean ± σ):     106.7 ms ±  37.8 ms    [User: 5.9 ms, System: 14.5 ms]
  Range (min … max):    49.8 ms … 195.2 ms    30 runs

Benchmark 2: _build/default/neo4j-eio/bin/connect_p.exe
  Time (mean ± σ):     108.4 ms ±  41.3 ms    [User: 9.9 ms, System: 12.7 ms]
  Range (min … max):    54.2 ms … 203.7 ms    34 runs

Summary
  '_build/default/neo4j-lwt-unix/bin/connect_p.exe' ran
    1.03 ± 0.57 times faster than '_build/default/neo4j-eio/bin/connect_p.exe'

The numbers are noisy since this goes over the network, but Eio (on both the posix and linux backends) is now almost on par with Lwt.

cometkim commented 1 year ago

Startup time isn't a big deal for me, so I'm closing this.