Make fsync frequency configurable for log file

Webdis calls fsync after every single log message, which had a significant negative impact on performance. This change introduces 3 config options for fsync: no explicit fsync (the new default), a periodic fsync called every N milliseconds, or the old behavior.

The new config key is also documented and validates its inputs.

@nicolasff I modeled the different options after Cassandra as you suggested, this should be flexible enough for everyone. I changed the default to never explicitly fsync, which is a change from the current behavior but seems to me like a better default value. Let me know what you think. I also made it use Libevent rather than a separate thread for simplicity.

Here are the numbers I got on my machine with ab calling PING 1 million times with 200 clients (Gist with details):

With "auto": 92,456 requests/sec, 95th percentile 3ms, 99th percentile 3ms, longest 28ms
With 50: 91,490 requests/sec, 95th percentile 3ms, 99th percentile 3ms, longest 29ms
With "all": 20,480 requests/sec, 95th percentile 13ms, 99th percentile 15ms, longest 52ms

Even with 10 the numbers were similar to 50 or 100, it's really when the file is fsync'd after each write that the impact is significant.

Thanks for the PR! The numbers are pretty clear, I'm surprised this hadn't been reported before.

I like the idea of using libevent for the timer since (as far as I remember) things like timer_settime aren't as easily portable. That said there is one thing I find potentially concerning with this approach. I remember when Salvatore explored the idea of calling fsync from a different thread to persist Redis' AOF file, and found that calls to write were taking a long time while the fsync operation was in progress, even from a different thread. I wonder if in this case this could have an impact not just on logging but on the whole event loop, affecting other descriptors like the client sockets and the connection pools. Isn't that the case here, with the callback function from libevent taking potentially a long time in fsync with the effect that all other operations in the event loop are now blocked on it? Of course the current behavior is worse than this, but it seems worth investigating whether this doesn't have some undesirable side effect. If it turns out to affect the event loop, I could see at least two options:

Keep using libevent for the timer but use the callback only to wake up a thread that will perform the fsync.
Try using a different event base for the timer event. Presumably this would also require a separate thread to run event_base_dispatch, and then the slow fsync timer callback would only block this other event base without affecting the main one (unless the fsync also blocks all the other threads that attempt to write to the log).

As discovered by Salvatore this can be tricky and this PR provides a good opportunity to get the design right. If I could just suggest one thing to make it easier to test: you can add a busy loop that just counts to a few million after calling fsync to induce extra latency. If you have this plus say an fprintf at the start of each libevent callback, it should make it easier to see whether those get scheduled or not while the slowed-down fsync is still running.

Thanks once again, it's a great start and the numbers really show that it will make a huge difference.

Sorry for the silence, I looked more closely at the consequences of having a a slow(-ish) system call from a libevent callback since I was worried that it might block all other I/O operations for the whole process. I didn't remember this about the design, but there are actually multiple event loops active concurrently in Webdis:

The "server" event loop waits for connections on port 7379, and in this PR it also manages the fsync timer.
Each "worker" has an event loop too ("threads" configures the number of workers). Each one of these event loops manages the client connections it was given by the server, as well as the connections to Redis in its own dedicated pool.

The consequence here of having fsync block for a little while is that new external connections can't be accepted while this callback is running. This an obvious drawback, but when fsync is flushing data this also blocks all the threads that attempt to log to the same file descriptor, even if they're not running in the same event loop. Each call to fsync has a broad impact.

If we consider these different designs/options:

Design	Worker threads	Server thread
Current	Slow calls to `slog`+`fsync`	No impact
This PR with `N`	Slow calls to `slog` when `fsync` runs	No new connections when `fsync` runs
This PR with `"auto"`	No impact unless I/O device can't keep up	No impact
Async logging	No impact when `fsync` runs^*	No impact when `fsync` runs^*

^* as long as the buffer isn't full

then this PR clearly brings a significant improvement, especially with the default behavior. I think we can take it as submitted, and if you want to talk about async logging I'd be interested to hear your thoughts on how this could be implemented and whether it would make a difference (or rather if it makes enough of a difference that it's worth the significant change).

nicolasff / webdis

Make fsync frequency configurable for log file #187