ocaml-multicore / eio

Effects-based direct-style IO for multicore OCaml

Stat benchmark: report cleanup time and optimise #692

Closed talex5 closed 5 months ago

talex5 commented 5 months ago

I want to make some improvements here, but let's start by benchmarking the current state of things.

Initially, I get:

+Using linux backend
+Running Path.stat...
+Going to create 168420 files and directories
+Created in 2.37 s
+Statted in 1.32 s
+Removed in 5.55 s

+Using posix backend
+Running Path.stat...
+Going to create 168420 files and directories
+Created in 8.61 s
+Statted in 4.10 s
+Removed in 20.41 s

Now, instead of creating thousands of fibers and having them fight over a semaphore, limit the number of fibers created (this also makes the traces easier to view). Also, fill the files with zero bytes instead of asking the OS for secure random data, since that's slow and isn't useful for the test.
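
The two changes described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual code: the function names `create_with_semaphore`, `create_bounded` and `zero_data`, and the file permissions, are assumptions made up for the example. It relies on the optional `~max_fibers` argument of Eio's `Fiber.List.iter` to cap concurrency.

```ocaml
(* Sketch only: all function names here are hypothetical. *)
open Eio.Std

let ( / ) = Eio.Path.( / )

(* Zero-filled payload: cheap to build, unlike secure random bytes. *)
let zero_data size = String.make size '\000'

(* Before: one fiber per file, each competing for a shared semaphore. *)
let create_with_semaphore ~max_concurrent ~dir ~data names =
  let sem = Eio.Semaphore.make max_concurrent in
  Fiber.List.iter
    (fun name ->
      Eio.Semaphore.acquire sem;
      Fun.protect
        ~finally:(fun () -> Eio.Semaphore.release sem)
        (fun () -> Eio.Path.save ~create:(`Exclusive 0o600) (dir / name) data))
    names

(* After: at most [max_fibers] fibers exist at any time, so there is
   nothing to contend on and traces contain far fewer fibers. *)
let create_bounded ~max_fibers ~dir ~data names =
  Fiber.List.iter ~max_fibers
    (fun name -> Eio.Path.save ~create:(`Exclusive 0o600) (dir / name) data)
    names
```

Both functions perform the same work; the second simply never creates more fibers than are allowed to run, which is the point of the change.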

On my machine:

+Using linux backend                   
+Running Path.stat...
+Going to create 168420 files and directories
+Created in 1.62 s
+Statted in 1.04 s
+Removed in 8.00 s

+Using posix backend                   
+Running Path.stat...
+Going to create 168420 files and directories
+Created in 2.92 s
+Statted in 3.82 s
+Removed in 22.27 s

On CI:

[Screenshot: OCaml Benchmarks, 2024-02-14 11:53:31]

Interestingly, removal was a bit slower in all three cases, even though that part of the code wasn't changed!

talex5 commented 5 months ago

After making removal parallel, things are much better (on Linux):

[Screenshot: OCaml Benchmarks, 2024-02-14 12:56:37]
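
One way to parallelise removal is sketched below. It is an assumption about the general shape of such code, not the change made in this PR: `remove_tree` is a hypothetical helper that removes the entries of each directory concurrently with `Fiber.List.iter ~max_fibers`, and then removes the directory itself.

```ocaml
(* Sketch only: [remove_tree] is a hypothetical helper. *)
open Eio.Std

let ( / ) = Eio.Path.( / )

(* Remove [path] and everything below it. Entries of a directory are
   removed concurrently, with at most [max_fibers] fibers per directory. *)
let rec remove_tree ~max_fibers path =
  match Eio.Path.kind ~follow:false path with
  | `Directory ->
    Eio.Path.read_dir path
    |> Fiber.List.iter ~max_fibers (fun name ->
           remove_tree ~max_fibers (path / name));
    (* Only reached once every child has been removed. *)
    Eio.Path.rmdir path
  | `Not_found -> ()
  | _ -> Eio.Path.unlink path
```

Note that the limit applies per directory, so the total number of live fibers can exceed `max_fibers` for deep trees; a real benchmark that knows its own directory layout could choose the limit accordingly.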

talex5 commented 5 months ago

On one run with eio_posix, it deadlocked. Two systhreads were waiting in st_masterlock_acquire to get the master lock, even though it wasn't busy!

(gdb) bt
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, 
    futex_word=0x56397346e014 <thread_table+116>) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x56397346e014 <thread_table+116>, expected=expected@entry=0, 
    clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
#2  0x00007f2c88741e0b in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x56397346e014 <thread_table+116>, 
    expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0)
    at ./nptl/futex-internal.c:139
#3  0x00007f2c88744468 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x56397346dfb0 <thread_table+16>, 
    cond=0x56397346dfe8 <thread_table+72>) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=cond@entry=0x56397346dfe8 <thread_table+72>, mutex=mutex@entry=0x56397346dfb0 <thread_table+16>)
    at ./nptl/pthread_cond_wait.c:618
#5  0x00005639732f72e0 in st_masterlock_acquire (m=0x56397346dfa8 <thread_table+8>)
    at /home/user/.opam/5.1.1/.opam-switch/build/ocaml-base-compiler.5.1.1/otherlibs/systhreads/st_pthreads.h:159
#6  0x00005639732f7351 in thread_lock_acquire (dom_id=<optimized out>) at st_stubs.c:126
#7  caml_thread_leave_blocking_section () at st_stubs.c:254
#8  0x0000563973321d46 in caml_leave_blocking_section () at runtime/signals.c:171
#9  0x00005639732fa6fb in caml_unix_close (fd=<optimized out>) at close_unix.c:25

(gdb) fr 5

(gdb) p m.waiters
$3 = 2
(gdb) p m.busy
$4 = 0
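
To see why `waiters = 2` together with `busy = 0` looks wrong, it may help to model the lock. The following is a simplified sketch in OCaml, not the runtime's C implementation in `st_pthreads.h`: only the field names `busy` and `waiters` come from the gdb session above, and everything else (`is_free`, `create`, `looks_stuck`) is an assumption for illustration. It uses `Mutex` and `Condition`, which are part of the standard library in OCaml 5.

```ocaml
(* Simplified model of a master lock. Only [busy] and [waiters] mirror
   names seen in gdb; the rest is illustrative. *)
type masterlock = {
  lock : Mutex.t;          (* protects the fields below *)
  is_free : Condition.t;   (* signalled when the lock is released *)
  mutable busy : bool;     (* true while some thread holds the lock *)
  mutable waiters : int;   (* threads currently blocked in [acquire] *)
}

let create () =
  { lock = Mutex.create (); is_free = Condition.create ();
    busy = false; waiters = 0 }

let acquire m =
  Mutex.lock m.lock;
  (* A thread only sleeps while the lock is busy. *)
  while m.busy do
    m.waiters <- m.waiters + 1;
    Condition.wait m.is_free m.lock;
    m.waiters <- m.waiters - 1
  done;
  m.busy <- true;
  Mutex.unlock m.lock

let release m =
  Mutex.lock m.lock;
  m.busy <- false;
  Mutex.unlock m.lock;
  (* Wake one sleeper so it can re-check [busy]. *)
  Condition.signal m.is_free

(* In this model, a stable snapshot with sleepers but no holder is not a
   state the lock should rest in. *)
let looks_stuck m = m.waiters > 0 && not m.busy
```

In this model, the state printed by gdb corresponds to `looks_stuck` returning `true`: both threads are asleep on the condition variable even though the condition they are waiting for already holds.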

talex5 commented 5 months ago

Will investigate the deadlock separately; it's clearly not the fault of this PR.