microsoftarchive / redis

Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes
http://redis.io

Replication delay too much #460

Open gruan01 opened 8 years ago

gruan01 commented 8 years ago

Hello,

We are using Redis 2.8.2400 on Windows. Host environment: 256 GB memory, 24 CPUs, RAID 10.

On this host we run several Redis services on different ports (nothing else runs there, only Redis). We use master-slave replication to split writes and reads.

Redis service "A" holds about 100,000 (10W) hash keys and uses 6.5 GB of memory. Redis service "B" holds about 5,000,000 (500W) keys (not hashes) and uses 1.95 GB of memory.

Several days ago we found that the slave of service "A" lags behind its master by many seconds (about 1 minute). We tried diskless replication, but the delay is still many seconds.

Service "B" also shows a delay, but it catches up very quickly. I don't know why!

The slave configuration is as follows (commented-out settings removed):

port xxx
tcp-backlog 511
bind xxx.xxx.xxx.xxx
timeout 0
tcp-keepalive 0
loglevel notice
logfile ""
databases 16
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename RedisDB.rdb
dir ./
slaveof xxx.xxx.xxx.xxx xxx
masterauth xxxxxxxx
slave-serve-stale-data yes
slave-read-only yes
repl-diskless-sync yes
repl-timeout 600
repl-disable-tcp-nodelay no
slave-priority 100
requirepass xxxxxx
maxheap 80530636800
heapdir D:\Redis_Hotel\Hotel_R2.8
maxmemory 53687091200
appendonly no
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
lua-time-limit 5000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 10240mb 5120mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes

The master configuration (commented-out settings removed):

port XXX
tcp-backlog 511
bind XXX.XXX.XXX.XXX
timeout 50
tcp-keepalive 0
logfile ""
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename RedisDB.rdb
dir ./
slave-serve-stale-data yes
slave-read-only yes
repl-diskless-sync yes
repl-diskless-sync-delay 5
repl-timeout 600
repl-disable-tcp-nodelay no
slave-priority 100
requirepass XXXXXXXX
maxheap 80530636800
heapdir D:/Redis_Hotel/Hotel_W2.8
maxmemory 53687091200
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
list-max-ziplist-entries 512
list-max-ziplist-value 64
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 10240mb 5120mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes

Can somebody tell me what's wrong with the configuration? Why is the replication delay so long?

Thanks very much.

enricogior commented 8 years ago

Hi @gruan01, how did you measure the replication delay?
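
For reference, one reproducible way to measure the delay is to write a marker key on the master and poll the slave until the same value shows up. The Python sketch below is only an illustration under assumptions: `measure_propagation_delay`, `write_master` and `read_slave` are hypothetical names, and the two callables stand in for whatever client is in use.

```python
import time


def measure_propagation_delay(write_master, read_slave, key, value,
                              timeout=120.0, poll_interval=0.05,
                              clock=time.monotonic, sleep=time.sleep):
    """Write key=value through write_master, then poll read_slave.

    write_master(key, value) and read_slave(key) are hypothetical
    callables supplied by the caller. Returns the elapsed seconds until
    the slave returns the written value, or None if `timeout` seconds
    pass first.
    """
    start = clock()
    write_master(key, value)
    while True:
        if read_slave(key) == value:
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(poll_interval)
```

With the redis-py client, for example, `write_master` could be the `set` method of a connection to the master and `read_slave` the `get` method of a connection to the slave, both created with `decode_responses=True` so that the values compare equal as strings. The result includes the polling granularity, so it is accurate to roughly `poll_interval` seconds.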

gruan01 commented 8 years ago

Hi.

When I set a new key on the master, it takes about 1 minute before I can see it on the slave. When I update its value on the master, the value on the slave does not change until about 1 minute later. master_repl_offset is much larger than slave_repl_offset! The master also has about 100 more keys than the slave.

I don't know what is wrong with the configuration. We are trying to restructure how the data is stored (without using hashes).
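
The offset comparison described here can be made concrete. Below is a minimal Python sketch, assuming the two input texts are the replies of `INFO replication` taken from the master and from the slave at about the same moment. The function names and the sample values are invented for illustration and are not taken from this issue.

```python
def parse_info(text):
    """Parse an INFO reply into a dict of field name -> string value."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def replication_gap_bytes(master_info_text, slave_info_text):
    """Return master_repl_offset minus slave_repl_offset, in bytes."""
    master = parse_info(master_info_text)
    slave = parse_info(slave_info_text)
    return int(master["master_repl_offset"]) - int(slave["slave_repl_offset"])


# Hypothetical sample replies (values are made up).
MASTER_SAMPLE = (
    "# Replication\r\n"
    "role:master\r\n"
    "connected_slaves:1\r\n"
    "master_repl_offset:1500\r\n"
)
SLAVE_SAMPLE = (
    "# Replication\r\n"
    "role:slave\r\n"
    "master_link_status:up\r\n"
    "slave_repl_offset:1000\r\n"
)

gap = replication_gap_bytes(MASTER_SAMPLE, SLAVE_SAMPLE)  # 500
```

The gap is measured in bytes of the replication stream, not in seconds, so it only shows how far the slave is behind, not for how long. Sampling it repeatedly shows whether the slave is catching up or falling further behind.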

enricogior commented 8 years ago

@gruan01 can you please check the repl-ping-slave-period flag value on the slave? To get the flag value, connect to the slave with redis-cli and run "config get repl-ping-slave-period". The default value is 10 (seconds). If the server does not send a PING or data for more than 10 seconds, the slave contacts the server to check why it hasn't received the PING or the data.
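
A related sanity check, which is not raised in this thread but comes from the comments in the stock redis.conf, is that repl-timeout should be larger than repl-ping-slave-period. The Python sketch below applies that check to redis.conf-style text. The helper names are hypothetical, the parser is deliberately naive (whitespace splitting, no quote handling), and the fallback values are the documented defaults of 10 and 60 seconds.

```python
DEFAULT_PING_PERIOD = 10   # seconds, default of repl-ping-slave-period
DEFAULT_REPL_TIMEOUT = 60  # seconds, default of repl-timeout


def parse_redis_conf(text):
    """Parse redis.conf-style text into {directive: [argument lists]}."""
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, *args = line.split()
        directives.setdefault(name.lower(), []).append(args)
    return directives


def _last_int(conf, name, default):
    """Return the last occurrence of an integer directive, or default."""
    if name in conf:
        return int(conf[name][-1][0])
    return default


def timeout_exceeds_ping_period(conf_text):
    """True if repl-timeout is greater than repl-ping-slave-period."""
    conf = parse_redis_conf(conf_text)
    timeout = _last_int(conf, "repl-timeout", DEFAULT_REPL_TIMEOUT)
    period = _last_int(conf, "repl-ping-slave-period", DEFAULT_PING_PERIOD)
    return timeout > period


# Excerpt of the settings posted in this issue.
POSTED_EXCERPT = "repl-diskless-sync yes\nrepl-timeout 600\n"
ok = timeout_exceeds_ping_period(POSTED_EXCERPT)  # True: 600 > 10
```

For the configuration posted above (repl-timeout 600 with the default ping period of 10), the check passes, so this particular relationship does not explain the delay.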

gruan01 commented 8 years ago

Hi, thanks. Yes, we use the default value of repl-ping-slave-period. I have checked it, and its value is 10.

enricogior commented 8 years ago

@gruan01 unfortunately Redis doesn't have any debug logging for the replication code that could help with this issue, so it's really difficult to understand why replication on your system takes 1 minute. If the data stored in your Redis instance is not private and you can share it, I can try to reproduce the issue on my test machine. But if it's private data that you cannot share, the only alternative would be to install a custom version of Redis with extra debug logging for the replication code.

chester89 commented 8 years ago

@gruan01 are both computers on your local network or somewhere in the cloud? Maybe network latency causes the delay, but even so 1 minute is still too much.

gruan01 commented 8 years ago

@enricogior Thank you. I looked at the RDB file: it is now more than 3.5 GB, and it holds our product price data, so I can't share it. We are now planning to change the storage structure and stop using hashes. Thanks.

@chester89 The master and slave are on the same machine, so I don't think network latency is the cause. Thanks.

enricogior commented 8 years ago

@gruan01 ok.