craigfe closed this pull request 3 years ago
What's the performance impact of the mutex? Wouldn't it be better to have a scratch buffer per thread?
@samoht: I took some measurements of that :-) Yes, that mutex has a ~5% performance impact on very write-intensive workloads like the replay benchmarks when the log size is large (but I couldn't see any impact for other workloads).
I've changed the implementation to allocate new buffers on entry to `Log_file`. It might indeed be better to do things per-thread, but we'd need an LRU to avoid leaking memory there. It's not sufficient to just keep separate buffers for the writer and merge threads, because readers in different threads can contend for the buffer as well.
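The two options discussed here can be sketched side by side. This is a minimal illustration with hypothetical names (`decode_locked`, `decode_fresh`, `entry_size`), not the real index API: a mutex-guarded shared buffer is safe but serialises every reader on the lock, while a fresh per-call buffer trades a small allocation for lock-free access. (On OCaml < 5, `Mutex` requires linking the `threads` library.)

```ocaml
(* Sketch only — hypothetical names, not the real index API. *)

let entry_size = 4

(* Option 1: a shared scratch buffer guarded by a mutex.
   Safe, but every reader serialises on [scratch_lock]. *)
let scratch_buf = Bytes.create entry_size
let scratch_lock = Mutex.create ()

let decode_locked (blit : Bytes.t -> unit) : string =
  Mutex.lock scratch_lock;
  Fun.protect
    ~finally:(fun () -> Mutex.unlock scratch_lock)
    (fun () ->
      blit scratch_buf;
      Bytes.to_string scratch_buf)

(* Option 2: a fresh buffer per call.
   No contention; the cost is one small allocation per read. *)
let decode_fresh (blit : Bytes.t -> unit) : string =
  let buf = Bytes.create entry_size in
  blit buf;
  Bytes.to_string buf

let () =
  let blit s buf = Bytes.blit_string s 0 buf 0 entry_size in
  assert (decode_locked (blit "abcd") = "abcd");
  assert (decode_fresh (blit "wxyz") = "wxyz");
  print_endline "ok"
```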
https://github.com/mirage/index/pull/355 introduced a small optimisation to re-use a "local" scratch buffer when decoding values from the log file.
This is actually unsafe: during a merge, the asynchronous merge thread and the main writer thread can attempt concurrent reads from the log file, contending over the shared scratch buffer. The race can be observed by inserting `Thread.yield` calls inside the `Value.decode` implementation and then stress-testing the interface (e.g. by running the replay benchmarks).

This commit ensures that each call to a `Log_file` function gets its own scratch buffer, making concurrent access safe without introducing contention.

_Note: I initially tried to just wrap `t.scratch_buf` in a mutex (https://github.com/mirage/index/pull/358/commits/0a7b05c85339943a02a9fc804976fdfa878999a5), but this was a ~5% performance regression in the replay benchmarks due to contention over the lock. The alternative of allocating on entry to `Log_file` has a relatively small cost._
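The race described above can be reproduced deterministically in a minimal sketch. All names here are hypothetical (not the real index API), and the `Thread.yield` interleaving is simulated by a callback so the example runs single-threaded: with a shared scratch buffer, a concurrent decode overwrites the buffer before its contents are consumed; a per-call buffer cannot be clobbered.

```ocaml
(* Minimal repro sketch — hypothetical names. [yield] stands in for
   Thread.yield: it hands control to the "merge thread" mid-decode. *)

let entries = [| "aaaa"; "bbbb" |]
let shared_buf = Bytes.create 4

(* Decode via the shared scratch buffer, yielding between the read
   into the buffer and its use. *)
let decode_shared ~yield i =
  Bytes.blit_string entries.(i) 0 shared_buf 0 4;
  yield ();  (* the merge thread runs here and reuses [shared_buf] *)
  Bytes.to_string shared_buf

(* Decode via a fresh per-call buffer. *)
let decode_fresh ~yield i =
  let buf = Bytes.create 4 in
  Bytes.blit_string entries.(i) 0 buf 0 4;
  yield ();
  Bytes.to_string buf

let () =
  (* Simulated merge thread: decodes entry 1 through the shared buffer. *)
  let merge_runs () = ignore (decode_shared ~yield:ignore 1) in
  (* The main thread asked for entry 0 but observes entry 1: corrupted. *)
  assert (decode_shared ~yield:merge_runs 0 = "bbbb");
  (* With a per-call buffer, the same interleaving is harmless. *)
  assert (decode_fresh ~yield:merge_runs 0 = "aaaa");
  print_endline "ok"
```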