twitter / twemproxy

A fast, light-weight proxy for memcached and redis
Apache License 2.0
12.15k stars 2.06k forks source link

twemproxy get timed out when add more redis nodes #454

Open anhhuyla opened 8 years ago

anhhuyla commented 8 years ago

Hello Pros,

I need to use twemproxy to shard keys on my redis nodes, so first i take some benchmark :

nutcracker01:
        listen: 0.0.0.0:7000
        hash: fnv1a_64
        distribution: ketama
        auto_eject_hosts: true
        server_retry_timeout: 60000
        server_failure_limit: 3
        timeout: 500
        redis: true
        backlog: 1024
        servers:
        - 10.90.7.132:7001:1 server1

with data size at 20 kbyte, ccu = 1000, the result looks very well.

# redis-benchmark -p 7000 -c 1000 -n 10000 -d 20480 -k 1 -r 10000 -t set,get -q -l
SET: 4697.04 requests per second
GET: 4757.37 requests per second

SET: 4750.59 requests per second
GET: 4775.55 requests per second

SET: 4866.18 requests per second
GET: 4175.37 requests per second

Now I add more redis nodes to shard the keys, so the twemproxy config as this :

nutcracker01:
        listen: 0.0.0.0:7000
        hash: fnv1a_64
        distribution: ketama
        auto_eject_hosts: true
        server_retry_timeout: 60000
        server_failure_limit: 3
        timeout: 500
        redis: true
        backlog: 1024
        servers:     
        - 10.90.7.132:7001:1 server1
        - 10.90.7.132:7002:1 server2
        - 10.90.7.132:7003:1 server3
        - 10.90.7.132:7004:1 server4

I re-benchmark again with the same command :

# redis-benchmark -p 7000 -c 1000 -n 10000 -d 20480 -k 1 -r 10000 -t set,get -q -l

Now the twemproxy log get lot of timed out errors :

# tail -f /var/log/nutcracker/nutcracker.log | grep timed
[2016-02-16 22:47:03.096] nc_core.c:237 close s 1009 '10.90.7.132:7001' on event 00FF eof 0 done 0 rb 87666159 sb 93215034: Connection timed out
[2016-02-16 22:47:04.655] nc_core.c:237 close s 1009 '10.90.7.132:7001' on event FFFF eof 0 done 0 rb 700 sb 3204424: Connection timed out
[2016-02-16 22:47:05.218] nc_core.c:237 close s 1009 '10.90.7.132:7001' on event 00FF eof 0 done 0 rb 290 sb 1255416: Connection timed out
[2016-02-16 22:47:30.733] nc_core.c:237 close s 1009 '10.90.7.132:7001' on event FFFF eof 0 done 0 rb 205052911 sb 204343314: Connection timed out
[2016-02-16 22:47:31.291] nc_core.c:237 close s 1009 '10.90.7.132:7001' on event FFFF eof 0 done 0 rb 2497146 sb 24084: Connection timed out
[2016-02-16 22:47:31.837] nc_core.c:237 close s 1009 '10.90.7.132:7001' on event 00FF eof 0 done 0 rb 2869944 sb 19260: Connection timed out
[2016-02-16 22:47:33.282] nc_core.c:237 close s 1009 '10.90.7.132:7001' on event FFFF eof 0 done 0 rb 390 sb 1736152: Connection timed out
[2016-02-16 22:47:33.857] nc_core.c:237 close s 1009 '10.90.7.132:7001' on event FFFF eof 0 done 0 rb 355 sb 1633344: Connection timed out
[2016-02-16 22:47:35.763] nc_core.c:237 close s 1009 '10.90.7.132:7001' on event FFFF eof 0 done 0 rb 2079415 sb 12237198: Connection timed out
[2016-02-16 22:47:36.356] nc_core.c:237 close s 1009 '10.90.7.132:7001' on event FFFF eof 0 done 0 rb 1355336 sb 21996: Connection timed out
[2016-02-16 22:47:36.939] nc_core.c:237 close s 1009 '10.90.7.132:7001' on event FFFF eof 0 done 0 rb 1258320 sb 23616: Connection timed out
[2016-02-16 22:47:38.804] nc_core.c:237 close s 1009 '10.90.7.132:7001' on event FFFF eof 0 done 0 rb 6495890 sb 2571476: Connection timed out
[2016-02-16 22:47:39.372] nc_core.c:237 close s 1009 '10.90.7.132:7001' on event FFFF eof 0 done 0 rb 390 sb 1798416: Connection timed out
[2016-02-16 22:47:41.275] nc_core.c:237 close s 1009 '10.90.7.132:7001' on event FFFF eof 0 done 0 rb 2472126 sb 12587364: Connection timed out
[2016-02-16 22:47:41.820] nc_core.c:237 close s 1008 '10.90.7.132:7004' on event FFFF eof 0 done 0 rb 411086668 sb 445947444: Connection timed out
[2016-02-16 22:47:41.822] nc_core.c:237 close s 1009 '10.90.7.132:7001' on event 00FF eof 0 done 0 rb 3171128 sb 21780: Connection timed out

I check the redis slow log but found nothing :

# echo "config get slowlog-log-slower-than" | redis-cli -h 10.90.7.132 -p 7001
1) "slowlog-log-slower-than"
2) "200000"
[root@twemproxy7-176 ~]# echo "slowlog get 100" | redis-cli -h 10.90.7.132 -p 7001
(empty list or set)

# tail /var/log/redis/redis7001.log :
 Accepted 10.90.7.176:52896
- Reading from client: Connection reset by peer
- Accepted 10.90.7.176:52897
- Reading from client: Connection reset by peer
- Accepted 10.90.7.176:52898
- Reading from client: Connection reset by peer

It looks like that the twemproxy get worse when add more redis nodes => it should be good at this because it was born to do this :)

Is it normal or I have missed something ?

Plz show advices, I really need to use twemproxy to scale out my redis cluster and I plan to add dozens of redis nodes behind a twemproxy.

Version info :

# nutcracker -V
This is nutcracker-0.4.1

# redis-server -v
Redis server v=3.0.7 sha=00000000:0 malloc=jemalloc-3.6.0 bits=64 build=653c024c355853e5

More advices are also appreciate.

Thank you & Brs.

fdsmax commented 5 years ago

timeout 500 milliseconds set in your twemproxy config seems to be very low considering the server communication latency and response time in case of high traffic. Have you tried increasing the timeout config? As your question seems posted long ago, could you pl share the solution in case if you had successfully resolved it?

We have setup twemproxy (nutcracker) with redis cache for our Social Network site Thulya recently and came across the same error. We increased the timeout in the twemproxy config to 30 minutes and set the idle timeout in redis.config to 30 seconds so all idle connections will be freedup by redis. For now the errors vanished but not sure if this is the right fix for this issue.

liudongbo123 commented 3 years ago

i want to know how to solve this problem