talex5 closed this 5 months ago
After making removal parallel, things are much better (on Linux):
On one run with eio_posix, it deadlocked. Two systhreads were waiting in st_masterlock_acquire
to get the master lock, even though it wasn't busy!
```
(gdb) bt
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0,
futex_word=0x56397346e014 <thread_table+116>) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x56397346e014 <thread_table+116>, expected=expected@entry=0,
clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
#2 0x00007f2c88741e0b in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x56397346e014 <thread_table+116>,
expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0)
at ./nptl/futex-internal.c:139
#3 0x00007f2c88744468 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x56397346dfb0 <thread_table+16>,
cond=0x56397346dfe8 <thread_table+72>) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=cond@entry=0x56397346dfe8 <thread_table+72>, mutex=mutex@entry=0x56397346dfb0 <thread_table+16>)
at ./nptl/pthread_cond_wait.c:618
#5 0x00005639732f72e0 in st_masterlock_acquire (m=0x56397346dfa8 <thread_table+8>)
at /home/user/.opam/5.1.1/.opam-switch/build/ocaml-base-compiler.5.1.1/otherlibs/systhreads/st_pthreads.h:159
#6 0x00005639732f7351 in thread_lock_acquire (dom_id=<optimized out>) at st_stubs.c:126
#7 caml_thread_leave_blocking_section () at st_stubs.c:254
#8 0x0000563973321d46 in caml_leave_blocking_section () at runtime/signals.c:171
#9 0x00005639732fa6fb in caml_unix_close (fd=<optimized out>) at close_unix.c:25
(gdb) fr 5
(gdb) p m.waiters
$3 = 2
(gdb) p m.busy
$4 = 0
```
Will investigate the deadlock separately; it's clearly not the fault of this PR.
I want to make some improvements here. But let's start by benchmarking the current state of things.
Initially, I get:
Now, instead of creating thousands of fibers and having them all fight over a semaphore, limit the number of fibers created (this also makes the traces easier to view). Also, fill the files with zero bytes instead of asking the OS for secure random data, since that's slow and isn't useful for this test.
On my machine:
On CI:
Interestingly, removal was a bit slower in all three cases, even though that bit wasn't changed!