Redis uses an efficient event loop based on
select/epoll/kevent/... to deal with non-blocking
network I/O.
It can support many concurrent client connections.
However, it does not execute queries in parallel, so at
any given point in time it executes at most one query
(except for some specific commands).
In practice, this is not really an issue because most
queries are very fast, with a complexity of O(1) or O(log n).
A single Redis instance on a recent Intel CPU is able to
process about 150000 q/s.
Redis never has to wait for the disconnection of a client
before it can accept and process connections for other
clients.
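To make the model above concrete, here is a minimal sketch of a single-threaded loop that serves several connected clients without waiting for any of them to disconnect. It is not Redis code: the function name `run_single_threaded_loop`, the PING/PONG payloads, and the use of `socket.socketpair()` in place of accepted TCP connections are all assumptions made for illustration.

```python
import selectors
import socket


def run_single_threaded_loop(num_clients=3):
    """Serve several connected clients from one thread, one request at a time."""
    sel = selectors.DefaultSelector()
    clients, server_sides = [], []
    for client_id in range(num_clients):
        # socketpair() stands in for an accepted TCP connection (assumption).
        client_side, server_side = socket.socketpair()
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ, data=client_id)
        clients.append(client_side)
        server_sides.append(server_side)

    # Every client stays connected and sends a request up front.
    for client_id, client in enumerate(clients):
        client.sendall(b"PING %d" % client_id)

    replies = {}
    while len(replies) < num_clients:
        events = sel.select(timeout=1.0)
        if not events:
            break  # nothing became readable; stop instead of spinning
        # Each ready socket is handled to completion before the next one,
        # so requests are never processed in parallel.
        for key, _mask in events:
            request = key.fileobj.recv(1024)
            key.fileobj.sendall(b"PONG for " + request)
            replies[key.data] = clients[key.data].recv(1024)

    sel.close()
    for sock in clients + server_sides:
        sock.close()
    return replies


if __name__ == "__main__":
    print(run_single_threaded_loop())
```

All three connections are open at once, yet only one request is being handled at any moment, which is the behaviour described above.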
Are your 500 connection attempts simultaneous?
Regards,
Didier.
Original comment by didier...@gmail.com
on 24 Aug 2011 at 3:20
Didier,
Thank you for your answer! It explains clearly and simply how Redis runs, and
it matches my understanding. So now I don't understand what
caused the error. I don't think the network environment is an issue for me, since my
machines are in the same data warehouse and connected to each other over a
fast network.
Yes, I have several hundred clients attempting to connect to Redis simultaneously. The
maximum number cannot exceed 500; however, it's not a number I can control. I am using
Jedis as my client. Is the timeout error a Jedis issue?
Regards,
Yin
Original comment by yen...@gmail.com
on 24 Aug 2011 at 3:46
Also, in the case of new connections, operating systems typically only
allow the caller a relatively small, fixed backlog of pending incoming
connections. On Linux, you can find that
number with "cat /proc/sys/net/core/somaxconn" on the command line. It is 128 on
my Linux machine. Given that you have 500 clients opening and closing
connections, likely at a very high speed, I could see one of a few different
scenarios leading to your issue:
1. you fill up the 128 entry queue because Redis is busy performing an
operation (socket related, data related, etc.)
2. the system has difficulty handling the volume of incoming TCP requests and
is overloaded
3. the system believes it is under attack from a SYN flood and purposefully
delays packets from your requesting machine
4. you are moving enough data through Redis that your network is saturated,
making it impossible to access
5. if you are using a VPS and it doesn't have enough processing power, IO, etc.
allocated to it, the higher level system could be slowing it down enough to
cause stutters that trigger any or all of the previous 4 conditions
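As a rough way to check scenario 1, the sketch below reads the same somaxconn value from Python and compares it with the size of a connection burst. The helper names `read_somaxconn` and `backlog_can_overflow` are hypothetical. On Linux the effective accept queue is bounded by the smaller of this value and the backlog the server passes to listen(), so treat the comparison as an upper-bound estimate only.

```python
from pathlib import Path

# Linux-specific location of the system-wide cap on the listen backlog.
SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")


def read_somaxconn(path=SOMAXCONN_PATH):
    """Return the kernel's cap on the listen backlog, or None if unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def backlog_can_overflow(simultaneous_connects, somaxconn):
    """True if a burst of connects could exceed the pending-connection queue
    before the server gets a chance to accept them."""
    return simultaneous_connects > somaxconn


if __name__ == "__main__":
    limit = read_somaxconn()
    if limit is None:
        print("somaxconn not readable on this system")
    else:
        print("somaxconn:", limit)
        print("500 simultaneous connects may overflow:",
              backlog_can_overflow(500, limit))
```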
I would recommend trying to use connection pooling in your mapreduce, and
trying to run with fewer clients on a smaller dataset to try to find where your
bottleneck is. Also, if you are only reading from Redis, and your data is small
enough, you could run some identical copies of Redis on a few different
machines, and have your clients randomly connect to one once you've determined
how many clients one Redis can reasonably handle.
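The random-copy idea could look roughly like this on the client side. The host names in `READ_REPLICAS` and the helper `pick_replica` are made up for illustration, and the approach only applies to a read-only workload where every copy holds the same data.

```python
import random

# Hypothetical addresses of identical read-only Redis copies.
READ_REPLICAS = [
    ("redis-a.example", 6379),
    ("redis-b.example", 6379),
    ("redis-c.example", 6379),
]


def pick_replica(replicas, rng=random):
    """Choose one copy uniformly at random so load spreads across machines."""
    return rng.choice(replicas)


if __name__ == "__main__":
    host, port = pick_replica(READ_REPLICAS)
    print("this task would connect to %s:%d" % (host, port))
```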
Original comment by josiah.c...@gmail.com
on 24 Aug 2011 at 3:56
Oh, on a related note, if you have one Redis that you are making all of your
connections to, even assuming a fairly reasonable, moderate number of requests of
100k (which is probably 2-4 times as much, really, because of the connection
churn), you're still looking at 1 full day to run this computation.
Original comment by josiah.c...@gmail.com
on 24 Aug 2011 at 3:59
Hi,
Thank you for your suggestions. I will try to use a connection pool. Actually,
this error does not occur if I run the program on a small data set. In my
MapReduce job there is one Jedis instance per map. When I try to run on the
whole large dataset, the error usually appears after 1000 map tasks have finished
(there are still many more maps to run for this job), and the error then
repeats very often. It does not seem to be an issue related to the entry queue. What
do you think?
Thank you!
Yin
Original comment by yen...@gmail.com
on 24 Aug 2011 at 5:12
Hi Josiah, can you tell me where the related note is?
Thanks!
Original comment by yen...@gmail.com
on 24 Aug 2011 at 5:14
Yin: Sorry, "on a related note" is an idiom meaning "on a topic related to what is being
discussed". So, the "related note" is actually what was written in my comment
#4.
In situations where you have a high number of connections being
created/destroyed, it could also be that Jedis isn't disconnecting, isn't
disconnecting fast enough, or that Redis has hit its own connection limit (what is
your configuration set to?), etc.
While the large map operation is happening, how much and what kind of processor
is being used on the machine hosting Redis?
Original comment by josiah.c...@gmail.com
on 24 Aug 2011 at 6:52
Hi Josiah,
Got it!
I explicitly set the connection limit for Redis to unlimited, and
redis-benchmark shows that Redis can handle more than 600 clients connecting at
the same time under this setting.
The machine hosting the Redis instance has 8 CPUs at 2992 MHz
each. Hopefully this is helpful.
Thank you very much!
Regards,
Yin
Original comment by yen...@gmail.com
on 25 Aug 2011 at 6:53
What does top report for 'us', 'sy', 'id', 'wa', 'hi', and 'si' while the
timeouts are happening?
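If watching top interactively is awkward, the same fields can be derived from two samples of the aggregate "cpu" line in /proc/stat on Linux. This is a sketch under that assumption; the helper names `parse_cpu_line` and `cpu_percentages` are hypothetical, and top itself may round or group the values slightly differently.

```python
# top's labels, in the order the counters appear in /proc/stat:
# user, nice, system, idle, iowait, irq, softirq, steal
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # older kernels omit some fields
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Percentage of time spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_cpu_sample(path="/proc/stat"):
    """Read the current counters; the first line is the all-CPU aggregate."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())
```

Take one sample with `read_cpu_sample()`, wait a few seconds while the timeouts are occurring, take another, and pass both to `cpu_percentages`. Because the figures are averaged over all CPUs, a single busy core on an 8-CPU machine shows up as roughly 12.5% rather than 100%.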
I may be mistaken, but I believe that redis-benchmark keeps connections open
after they are created and reuses them, and connects to 'localhost', which
removes some of the network overhead. Your map operations are coming from
remote machines, correct? How fast is your network? Can you reduce your number
of concurrent mappers?
Original comment by josiah.c...@gmail.com
on 26 Aug 2011 at 6:27
That the error starts occurring after about 1000 tasks is an indicator to me
that you are hitting the file descriptor limit.
Please correct me if I'm wrong in the following assumptions:
- You use one connection per map task.
- You don't close the connection when you are done with the map task.
Most Linux distributions have a default per-process file descriptor limit of
1024. The timeout errors you see can be caused by Jedis not being able to open
a new socket, and throwing this as a timeout. You can execute "ulimit -n" to
find out your fd limit, or add a numeric argument to set it (e.g. "ulimit -n
4096"). Can you check if the error starts happening after ~4000 map tasks after
executing the latter command? If so, we have found the cause of this problem.
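The same limit can be inspected from inside a process. The sketch below uses Python's standard `resource` module (Unix only) and counts open descriptors through /proc/self/fd, which is Linux-specific. The helper names are hypothetical, and this shows the check in general rather than how Jedis or Hadoop report it.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) per-process limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(proc_fd_dir="/proc/self/fd"):
    """Approximate number of descriptors open in this process, or None if
    the directory is unavailable. The listing itself briefly uses one."""
    try:
        return len(os.listdir(proc_fd_dir))
    except OSError:
        return None


def fd_headroom():
    """Descriptors still available before the soft limit is reached."""
    soft, _hard = fd_limits()
    used = open_fd_count()
    if used is None or soft == resource.RLIM_INFINITY:
        return None
    return soft - used


if __name__ == "__main__":
    print("soft/hard fd limit:", fd_limits())
    print("open descriptors:", open_fd_count())
    print("headroom:", fd_headroom())
```

If the headroom shrinks by one with every map task and never recovers, connections are being leaked rather than closed.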
Instead of opening a connection for every map task, you will be better off
using a connection pool where the map tasks allocate and put back connection
objects. Besides not being subject to file descriptor limits, it will also be
faster because you don't pay for connection setup/teardown for every map task.
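The allocate-and-put-back pattern can be sketched as follows. This is a generic illustration, not the Jedis API (Jedis provides its own JedisPool class for this purpose): `ConnectionPool`, `FakeConnection`, and `run_map_tasks` are hypothetical names, and the fake connection only counts how many times a connection is opened.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: tasks borrow a connection and always put it back."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # pay the setup cost once per slot

    @contextmanager
    def connection(self, timeout=5.0):
        conn = self._idle.get(timeout=timeout)  # waits if all are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # returned even if the task raised


class FakeConnection:
    """Stand-in for a real client connection (assumption for this sketch)."""
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1

    def get(self, key):
        return "value-for-" + key


def run_map_tasks(pool, num_tasks):
    """Each task borrows a pooled connection instead of opening its own."""
    results = []
    for task_id in range(num_tasks):
        with pool.connection() as conn:
            results.append(conn.get("key%d" % task_id))
    return results


if __name__ == "__main__":
    pool = ConnectionPool(FakeConnection, size=4)
    run_map_tasks(pool, 1000)
    print("connections opened:", FakeConnection.opened)
```

However many tasks run, the number of sockets stays at the pool size, so the descriptor count no longer grows with the number of map tasks.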
Cheers,
Pieter
Original comment by pcnoordh...@gmail.com
on 30 Aug 2011 at 3:24
Original comment by anti...@gmail.com
on 14 Sep 2011 at 3:36
Original issue reported on code.google.com by
yen...@gmail.com
on 24 Aug 2011 at 1:58