Closed GoogleCodeExporter closed 9 years ago
There is a continuous timeout threshold: once it is exceeded, the client should
treat the node as down and then try to reestablish the connection. In this case,
the reconnect will succeed and operations will start timing out all over
again. The normal case for data going nowhere is a server that crashes or
unexpectedly loses power.
If you use redistribute and change your AcceptingServer to accept the
connection only once, do you see the correct behavior? The default is classic
modulus hashing, not Ketama hashing with redistribute. Look at
ConnectionFactoryBuilder as a way to set different parameters, including the
timeout threshold.
I'm pretty confident in this functionality, as I re-tested it recently.
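
For readers following along, the settings mentioned in this comment might be combined roughly as in the sketch below. The class name, the operation timeout and the threshold value are illustrative assumptions, not values taken from this issue; the builder methods are the ones spymemcached's ConnectionFactoryBuilder exposes, but check them against the version you run.

```java
import java.io.IOException;

import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactory;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.ConnectionFactoryBuilder.Locator;
import net.spy.memcached.FailureMode;
import net.spy.memcached.MemcachedClient;

// Hypothetical helper class; the name and all numeric values are assumptions.
public class RedistributingClientFactory {

    // servers is a space-separated "host:port" list, e.g. "host1:11211 host2:11211".
    public static MemcachedClient create(String servers) throws IOException {
        ConnectionFactory factory = new ConnectionFactoryBuilder()
            // Consistent-hashing locator instead of the default modulus locator.
            .setLocatorType(Locator.CONSISTENT)
            // Send operations for a failed node to the remaining nodes.
            .setFailureMode(FailureMode.Redistribute)
            // Per-operation timeout in milliseconds (illustrative value).
            .setOpTimeout(1000)
            // Continuous timeouts tolerated before the node is treated as
            // down (illustrative value).
            .setTimeoutExceptionThreshold(10)
            .build();
        return new MemcachedClient(factory, AddrUtil.getAddresses(servers));
    }
}
```

Whether this configuration changes the behaviour reported in this issue is exactly what the thread is trying to establish, so treat it as a starting point for testing rather than a fix.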
Original comment by ingen...@gmail.com
on 22 May 2012 at 3:57
Let me try to explain our issue in another way:
Last week, our production system stopped working.
We run four memcached instances. The production system was down for several
hours, until one of them (idm-sessmw04) was restarted.
We constantly got the following exception:
net.spy.memcached.OperationTimeoutException: Timeout waiting for value
at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:924)
at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:939)
...
Caused by: net.spy.memcached.internal.CheckedOperationTimeoutException: Timed
out waiting for operation - failing node:
idm-sessmw04.mydomain.de/172.123.123.123:11211
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:65)
at net.spy.memcached.internal.GetFuture.get(GetFuture.java:37)
at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:917)
...
It was possible to establish a telnet connection to port 11211 of idm-sessmw04
while the problem was occurring.
Therefore I assume that the memcached on idm-sessmw04 accepted connections
but never returned any response.
The problem was that spymemcached's failover did not work. The fourth
memcached was unresponsive for hours, and spymemcached never marked it
as unavailable.
I was not able to find the reason for this strange behaviour of memcached,
so I wrote the class AcceptingServer, which simulates the behaviour of
memcached as I observed it.
The class MemcachedStandaloneTest then demonstrates that the failover does not
work if memcached accepts connections but never returns a response, so that
every operation times out.
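
The failure mode described above can be reproduced without memcached at all. The following is a minimal sketch of a server that accepts TCP connections and then stays silent; it is not the AcceptingServer class attached to this issue, and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for a hung memcached: accepts connections, never replies.
class SilentServer implements AutoCloseable {
    private final ServerSocket serverSocket;
    private final List<Socket> accepted = new CopyOnWriteArrayList<>();

    SilentServer(int port) throws IOException {
        serverSocket = new ServerSocket(port); // port 0 picks any free port
        Thread acceptor = new Thread(this::acceptLoop, "silent-acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    private void acceptLoop() {
        try {
            while (true) {
                // Keep each socket open, but never read from or write to it.
                accepted.add(serverSocket.accept());
            }
        } catch (IOException e) {
            // The server socket was closed; stop accepting.
        }
    }

    @Override
    public void close() throws IOException {
        serverSocket.close();
        for (Socket s : accepted) {
            s.close();
        }
    }

    // Returns true if the connect succeeds but no reply arrives within timeoutMs.
    static boolean connectsButTimesOut(String host, int port, int timeoutMs)
            throws IOException {
        try (Socket client = new Socket()) {
            client.connect(new InetSocketAddress(host, port), timeoutMs);
            client.setSoTimeout(timeoutMs);
            client.getOutputStream()
                  .write("get somekey\r\n".getBytes(StandardCharsets.US_ASCII));
            client.getOutputStream().flush();
            try {
                client.getInputStream().read();
                return false; // received data or end-of-stream
            } catch (SocketTimeoutException e) {
                return true;  // connected fine, but the server stayed silent
            }
        }
    }
}
```

Starting `new SilentServer(0)` and pointing a client at its `port()` gives the same picture as the telnet check: the connection is established, yet every request runs into its timeout.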
Currently, spymemcached only adds hosts that do not accept connections to the
list of unavailable servers.
I would suggest that spymemcached also add hosts that accept connections but
never return a response to that list.
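
To make the suggestion concrete, here is a rough sketch of the bookkeeping it implies. It is a hypothetical helper, not part of spymemcached and not how the library tracks timeouts internally: it counts consecutive timeouts per node and clears the count only when the node actually answers, so a successful reconnect alone does not make a silent node look healthy again.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper illustrating the proposal; not spymemcached code.
class TimeoutTracker {
    private final int threshold;
    private final Map<String, AtomicInteger> consecutiveTimeouts =
            new ConcurrentHashMap<>();

    TimeoutTracker(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
    }

    // Call whenever an operation against the node times out.
    void recordTimeout(String node) {
        consecutiveTimeouts
                .computeIfAbsent(node, n -> new AtomicInteger())
                .incrementAndGet();
    }

    // Call only when the node returns a real response; a reconnect that
    // merely succeeds at the TCP level should not call this.
    void recordResponse(String node) {
        consecutiveTimeouts.remove(node);
    }

    // A node counts as unavailable once its timeout streak reaches the threshold.
    boolean isUnavailable(String node) {
        AtomicInteger count = consecutiveTimeouts.get(node);
        return count != null && count.get() >= threshold;
    }
}
```

With a threshold of 3, for example, a node such as idm-sessmw04 would be reported as unavailable after its third consecutive timeout and would stay that way until it answers a request.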
Original comment by uwestah...@gmail.com
on 23 May 2012 at 5:54
Original issue reported on code.google.com by
uwestah...@gmail.com
on 22 May 2012 at 2:49