nicolasff / webdis

A Redis HTTP interface with JSON output
https://webd.is
BSD 2-Clause "Simplified" License
2.82k stars 307 forks source link

Make fsync frequency configurable for log file #187

Closed jessie-murray closed 3 years ago

jessie-murray commented 3 years ago

Webdis calls fsync after every single log message, which had a significant negative impact on performance. This change introduces 3 config options for fsync: no explicit fsync (the new default), a periodic fsync called every N milliseconds, or the old behavior.

The new config key is also documented and validates its inputs.

@nicolasff I modeled the different options after Cassandra as you suggested, this should be flexible enough for everyone. I changed the default to never explicitly fsync, which is a change from the current behavior but seems to me like a better default value. Let me know what you think. I also made it use Libevent rather than a separate thread for simplicity.

Here are the numbers I got on my machine with ab calling PING 1 million times with 200 clients (Gist with details):

Even with 10 the numbers were similar to 50 or 100, it's really when the file is fsync'd after each write that the impact is significant.

nicolasff commented 3 years ago

Thanks for the PR! The numbers are pretty clear, I'm surprised this hadn't been reported before.

I like the idea of using libevent for the timer since (as far as I remember) things like timer_settime aren't as easily portable. That said there is one thing I find potentially concerning with this approach. I remember when Salvatore explored the idea of calling fsync from a different thread to persist Redis' AOF file, and found that calls to write were taking a long time while the fsync operation was in progress, even from a different thread. I wonder if in this case this could have an impact not just on logging but on the whole event loop, affecting other descriptors like the client sockets and the connection pools. Isn't that the case here, with the callback function from libevent taking potentially a long time in fsync with the effect that all other operations in the event loop are now blocked on it? Of course the current behavior is worse than this, but it seems worth investigating whether this doesn't have some undesirable side effect. If it turns out to affect the event loop, I could see at least two options:

  1. Keep using libevent for the timer but use the callback only to wake up a thread that will perform the fsync.
  2. Try using a different event base for the timer event. Presumably this would also require a separate thread to run event_base_dispatch, and then the slow fsync timer callback would only block this other event base without affecting the main one (unless the fsync also blocks all the other threads that attempt to write to the log).

As discovered by Salvatore this can be tricky and this PR provides a good opportunity to get the design right. If I could just suggest one thing to make it easier to test: you can add a busy loop that just counts to a few million after calling fsync to induce extra latency. If you have this plus say an fprintf at the start of each libevent callback, it should make it easier to see whether those get scheduled or not while the slowed-down fsync is still running.

Thanks once again, it's a great start and the numbers really show that it will make a huge difference.

nicolasff commented 3 years ago

Sorry for the silence, I looked more closely at the consequences of having a a slow(-ish) system call from a libevent callback since I was worried that it might block all other I/O operations for the whole process. I didn't remember this about the design, but there are actually multiple event loops active concurrently in Webdis:

The consequence here of having fsync block for a little while is that new external connections can't be accepted while this callback is running. This an obvious drawback, but when fsync is flushing data this also blocks all the threads that attempt to log to the same file descriptor, even if they're not running in the same event loop. Each call to fsync has a broad impact.

If we consider these different designs/options:

Design Worker threads Server thread
Current Slow calls to slog+fsync No impact
This PR with N Slow calls to slog when fsync runs No new connections when fsync runs
This PR with "auto" No impact unless I/O device can't keep up No impact
Async logging No impact when fsync runs* No impact when fsync runs*

* as long as the buffer isn't full

then this PR clearly brings a significant improvement, especially with the default behavior. I think we can take it as submitted, and if you want to talk about async logging I'd be interested to hear your thoughts on how this could be implemented and whether it would make a difference (or rather if it makes enough of a difference that it's worth the significant change).