mgalushka / spymemcached

Automatically exported from code.google.com/p/spymemcached

On SunOS, with FailureMode Cancel, spymemcached is not reconnecting to a restarted local memcached server. #225

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
On SunOS, with FailureMode Cancel, spymemcached is not reconnecting to a 
restarted local memcached server.

We have a spymemcached client running on a SunOS system, connecting to a 
locally running memcached server with FailureMode set to Cancel. After we 
restart the server, the client cannot reconnect to it. The issue does not 
occur if any of the following is true:
a) the server is running on a different machine, or
b) the client is running on RedHat Linux instead of SunOS, or
c) FailureMode is RETRY instead of CANCEL.

Below is the stack-trace we see when we call client.get(key):

_____________________________________________________________________
java.lang.RuntimeException: Exception waiting for value
        at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:1183)
        at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:1200)
........
........
........
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancelled
        at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:84)
        at net.spy.memcached.internal.GetFuture.get(GetFuture.java:38)
        at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:1178)
........
........
........
Caused by: java.lang.RuntimeException: Cancelled
........
........
........
_____________________________________________________________________
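
For anyone who needs to recognise this failure in calling code, the cause chain in the trace above can be walked with plain JDK classes. The following is only a sketch: the class and method names (CancelledCauseCheck, isCancelled, sampleFailure) are hypothetical and not part of spymemcached, and matching on the message text "Cancelled" is an assumption taken from this trace, not a documented contract of the library.

```java
import java.util.concurrent.ExecutionException;

// Hypothetical helper, not part of spymemcached.
final class CancelledCauseCheck {

    // Walks the cause chain and reports whether any link is a
    // RuntimeException whose message is exactly "Cancelled".
    // The depth limit guards against a cyclic cause chain.
    static boolean isCancelled(Throwable t) {
        int depth = 0;
        for (Throwable c = t; c != null && depth < 32; c = c.getCause()) {
            if (c instanceof RuntimeException
                    && "Cancelled".equals(c.getMessage())) {
                return true;
            }
            depth++;
        }
        return false;
    }

    // Rebuilds the same three-level nesting that the trace shows,
    // using only JDK exception types.
    static RuntimeException sampleFailure() {
        RuntimeException root = new RuntimeException("Cancelled");
        ExecutionException wrapped = new ExecutionException(root);
        return new RuntimeException("Exception waiting for value", wrapped);
    }

    public static void main(String[] args) {
        System.out.println(isCancelled(sampleFailure()));
    }
}
```

In real code the argument would be the exception caught around client.get(key). Whether to retry or fall back to the backing store once the check matches is left to the caller.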

Environment Details:
_____________________________________________________________________
SpyMemCached Version - 2.7.3
MemCached Version - 1.4.7

>>>uname -a
SunOS abcxyz 5.10 Generic_127112-05 i86pc i386 i86pc

>>>java -version
java version "1.6.0_04"
Java(TM) SE Runtime Environment (build 1.6.0_04-b12)
Java HotSpot(TM) Server VM (build 10.0-b19, mixed mode)
_____________________________________________________________________

Original issue reported on code.google.com by emailton...@gmail.com on 16 Dec 2011 at 1:48

GoogleCodeExporter commented 9 years ago
I did some investigation with the spymemcached source, and it revealed that the 
following change can fix this issue:

location : net.spy.memcached.MemcachedConnection/attemptReconnects()

_____________________________________________________________________

Original (2.7.3):

if(ch.connect(qa.getSocketAddress())) {
    getLogger().info("Immediately reconnected to %s", qa);
    assert ch.isConnected();
} else {
    ops=SelectionKey.OP_CONNECT;
}

Changes:

if(ch.connect(qa.getSocketAddress())) {
    connected(qa);
    addedQueue.offer(qa);
    getLogger().info("Immediately reconnected to %s", qa);
    assert ch.isConnected();
} else {
    ops=SelectionKey.OP_CONNECT;
}

_____________________________________________________________________
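
To illustrate why the immediate-connect branch matters, here is a small standalone java.nio sketch. It is not spymemcached code: ImmediateConnectSketch, connectNonBlocking, onConnected and demo are hypothetical names, and the counter merely stands in for whatever bookkeeping a client does once a node is connected. SocketChannel.connect() on a non-blocking channel is documented to return true when the connection is established immediately, as can happen with a local connection. In that case there is no pending connect to finish through OP_CONNECT, so any post-connect handling has to run in that branch as well. This is consistent with the problem appearing only against a local server, although that reading is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration, not part of spymemcached.
final class ImmediateConnectSketch {

    // Stand-in for post-connect bookkeeping (resetting reconnect state,
    // re-queueing the node, and so on).
    static int postConnectCalls = 0;

    static void onConnected(SocketChannel ch) {
        postConnectCalls++;
    }

    // Returns true if connect() completed immediately, false if the
    // connection was finished later through OP_CONNECT.
    static boolean connectNonBlocking(SocketChannel ch, InetSocketAddress addr,
                                      Selector selector) throws IOException {
        ch.configureBlocking(false);
        if (ch.connect(addr)) {
            // Already connected: nothing is left for OP_CONNECT to finish,
            // so the post-connect step must happen here.
            onConnected(ch);
            return true;
        }
        SelectionKey key = ch.register(selector, SelectionKey.OP_CONNECT);
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (!ch.isConnected()) {
            if (System.nanoTime() - deadline > 0) {
                throw new IOException("connect timed out");
            }
            selector.select(200);
            if (key.isValid() && key.isConnectable() && ch.finishConnect()) {
                key.interestOps(0); // stop watching for connect events
                onConnected(ch);
            }
            selector.selectedKeys().clear();
        }
        return false;
    }

    // Connects to a listener on the loopback interface and checks that
    // the post-connect step ran exactly once, whichever branch was taken.
    static boolean demo() {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open();
             SocketChannel ch = SocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress addr = (InetSocketAddress) server.getLocalAddress();
            int before = postConnectCalls;
            connectNonBlocking(ch, addr, selector);
            return ch.isConnected() && postConnectCalls == before + 1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("post-connect handled: " + demo());
    }
}
```

Which branch is taken depends on the operating system and JVM, so the sketch checks only that the post-connect step runs in both.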

We are introducing memcached/spymemcached in our product with the next release 
and it would really help if someone from the spymemcached team can quickly 
review the above changes and give comments.

Thanks,
Nitin

Original comment by emailton...@gmail.com on 16 Dec 2011 at 2:01

GoogleCodeExporter commented 9 years ago
Will look into this soon.

Original comment by ingen...@gmail.com on 5 Feb 2012 at 7:48

GoogleCodeExporter commented 9 years ago
This is a high-priority issue. A reconnection failure is not acceptable in a 
production environment. In which release are you planning to fix it?

Original comment by ilkinulas on 5 Mar 2012 at 2:03

GoogleCodeExporter commented 9 years ago
Going to try to get this handled in 2.8.2. If you have a chance to post a code 
change to our code review server (review.couchbase.org), that would be much 
appreciated.

Original comment by ingen...@gmail.com on 22 Mar 2012 at 4:40

GoogleCodeExporter commented 9 years ago
Nitin, can you please post the code?

Original comment by ajayban...@gmail.com on 23 Mar 2012 at 6:21

GoogleCodeExporter commented 9 years ago
Is there any estimate of when this could be fixed?

Original comment by jouk...@gmail.com on 7 Aug 2012 at 12:59

GoogleCodeExporter commented 9 years ago
I've added the recommended fix above just now: 
http://review.couchbase.org/#change,19847

Should be in release 2.8.3.  I don't have a specific test for it, but it 
doesn't look harmful.

Original comment by ingen...@gmail.com on 19 Aug 2012 at 9:04
