Disappointing responsiveness of Redis when using append only file mode

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
0. Have Redis with append only file mode enabled.
1. Run 2 scripts that read from and write to Redis heavily.
2. See the scripts 'hang' from time to time, unable to insert into or get
anything from Redis.
3. Add some extra timing/logging around the synchronous IO calls in redis.c.

The test setup was as follows:

- 'setter' script, continuously inserting/updating a set of 500,000 random
keys, random value size [0..4096] bytes. Around 6,000 sets/sec, avg value
size = 2k, total throughput for 'set' operations ~ 12 MB/sec.
- 'getter' script, continuously fetching any of the 500,000 keys set by the
setter script. Also around 6,000 gets/sec, avg value size ~ 2k, and throughput
~ 15 MB/sec.
- Redis server with the AOF feature enabled and set to fsync every second
(Redis 1.2.2). Total memory usage ~ 1 GB RSS. Host system: dual-core Ubuntu
Linux, with 1 GB of memory still free.
- a 3rd script issuing 'BGREWRITEAOF' once every 2 minutes.

The observation is that the timings for the get and set commands vary
wildly. You would expect them to be consistently fast, given the forking
nature of Redis and its very fast asynchronous IO with clients. This is not
the case: operations that mostly complete in the millisecond range can also
take hundreds of milliseconds, and even more than 10 seconds in the most
degenerate cases. Note that these observations apply to the AOF mode only.
I did not test the normal snapshot mode.

I dove into the source of redis.c and found that, for the AOF, quite a lot of
disk IO is done synchronously within the event loop of the main process.
This causes Redis to block and be unresponsive to clients while doing disk IO.

Some places and timings I found:

- The regular AOF flush every second blocks the Redis process for about
100ms on average. This means that for 100ms out of every second the Redis
process is not responding to clients. Clients doing operations that would
normally take a couple of ms now have to wait for 100ms.
- As soon as the BGREWRITEAOF background rewrite is started, the parent
process (the regular Redis server) also continues to write to the
append only file. The disk then alternates between the writes of the
background process and those of the main server. This causes flush times of
up to 15~20 seconds in the server, blocking its process and thus making it
unresponsive to clients for that amount of time.
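
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of timing a synchronous flush. It is not the instrumentation that was added to redis.c: `timed_fsync_ms` is a hypothetical helper, and it assumes a POSIX system with `clock_gettime`.

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime/fsync under strict -std modes */
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not taken from redis.c: append `len` bytes to `fd`,
 * then fsync it, and return how long the fsync call alone took, in
 * milliseconds. Returns a negative value on any error; a short write is
 * treated as an error to keep the sketch small. */
double timed_fsync_ms(int fd, const void *buf, size_t len) {
    struct timespec start, end;

    if (write(fd, buf, len) != (ssize_t)len) return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    if (fsync(fd) == -1) return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (end.tv_sec - start.tv_sec) * 1000.0 +
           (end.tv_nsec - start.tv_nsec) / 1e6;
}
```

Because the caller is blocked for the whole duration of the call, a returned value of around 100 corresponds to the process being unavailable for roughly 10% of each second when the flush runs once per second.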

The question is: is this by design, or are there steps planned to address
this kind of synchronous disk IO for the AOF feature? For example, a
separate thread in the main process could do the flushing, and/or the main
process could stop writing to the AOF file while a background process is
writing the new log file.

Original issue reported on code.google.com by henkp...@gmail.com on 26 Feb 2010 at 4:44

GoogleCodeExporter commented 9 years ago

Hello, thanks for this interesting and detailed bug report.

Here I think that the easy fix is to avoid the fsync() while there is a rewrite
ongoing, but the *right* fix is probably a thread. There is no asynchronous
version of fsync(), so what I would like to do to fix this issue is the
following:

Create a thread at startup to deal with this. Every second:

open(file)
fsync(it)
close(file)

That way we don't have to coordinate with the main thread. Our common
knowledge is just the file name.
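
The open/fsync/close idea above could look roughly like the following sketch. All names here (`fsync_by_name`, `fsync_loop`, `fsync_loop_stop`) are hypothetical and not from redis.c, and the stop flag is an addition so the loop can be shut down. Whether this keeps the main thread responsive depends on the kernel and file system.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fsync/sleep under strict -std modes */
#include <fcntl.h>
#include <stdatomic.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical shutdown flag: set to 1 to ask the loop to exit. */
atomic_int fsync_loop_stop = 0;

/* One iteration of the proposal: open the file by name, fsync it, close it.
 * Returns 0 on success, -1 on any failure. */
int fsync_by_name(const char *path) {
    int fd = open(path, O_WRONLY);
    if (fd == -1) return -1;
    int rc = fsync(fd);
    if (close(fd) == -1) rc = -1;
    return rc;
}

/* Thread entry point with the signature pthread_create() expects.
 * `arg` is the AOF file name, the only state shared with the main thread. */
void *fsync_loop(void *arg) {
    const char *path = arg;
    while (!atomic_load(&fsync_loop_stop)) {
        fsync_by_name(path);  /* errors are ignored in this sketch */
        sleep(1);
    }
    return NULL;
}
```

The loop would be started once at startup with something like `pthread_create(&tid, NULL, fsync_loop, (void *)filename)`, where `tid` and `filename` are placeholders.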

I'll fix it ASAP in Redis master, but it's unlikely this will enter 1.2.x soon,
so for 1.2.x the fix could be to avoid fsync()ing if bgrewriteaof is in
progress.

Cheers,
Salvatore

Original comment by anti...@gmail.com on 27 Feb 2010 at 11:35

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

Updates about this:

1) With the current Linux kernel it is not possible to flush on a different
thread, as write(2) will block anyway in the main thread. This sucks, but I
don't think this is going to get fixed any time soon.
2) We now have an option in Redis master so that fsync(2) is not called when
there is a background saving/log-rewrite operation in progress. It's a trick...
but it works.
3) All this is highly dependent on the file system used and the mount options.
4) "fsync none" is a trivial way to completely fix this problem, but the
drawback is that up to 30 seconds of logs can be lost.
5) fsync always is now optimized. It is still very, very slow, but much faster
than before.
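
To make the interaction between the fsync policies and the new option concrete, here is a small sketch of the decision the main loop has to make. The enum and function names are hypothetical and do not come from redis.c (the option that shipped in redis.conf is named `no-appendfsync-on-rewrite`). The sketch assumes the "every second" policy simply compares timestamps.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical names mirroring the three policies discussed above. */
typedef enum { FSYNC_NONE, FSYNC_EVERYSEC, FSYNC_ALWAYS } fsync_policy;

/* Decide whether the main loop should call fsync(2) right now.
 * skip_during_rewrite: the option from point 2 is enabled.
 * child_running: a background save or log rewrite is in progress. */
bool should_fsync(fsync_policy policy, bool skip_during_rewrite,
                  bool child_running, time_t now, time_t last_fsync) {
    if (policy == FSYNC_NONE) return false;          /* leave it to the OS */
    if (skip_during_rewrite && child_running) return false;
    if (policy == FSYNC_ALWAYS) return true;
    return now - last_fsync >= 1;                    /* FSYNC_EVERYSEC */
}
```

With the option enabled, data written while a child is running is only as durable as with "fsync none" until the child finishes, which is the price of the trick.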

I'm leaving this open as it's an open problem, but I don't think there is a
very good way to fix this at the moment. Still, with the latest changes we have
mitigated the problem enough. Currently, when very low latency is required, a
two-box setup with a saving slave may be the best option.

Original comment by anti...@gmail.com on 24 Aug 2010 at 2:16

vjzning / redis

Disappointing responsiveness of Redis when using append only file mode #167