roc230 / spymemcached

Automatically exported from code.google.com/p/spymemcached

Bug in KetamaNodeLocator #143

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Some cache requests sent by the spymemcached client (using KetamaNodeLocator) 
are lost in a multiple-memcached-server environment when I shut down one 
of the servers.

My spymemcached version is 2.5, and I use memcached-win32-1.4.4-14 as the server.

My test steps are as follows (a rough sketch of the setup in code follows the list):

1. start up 2 memcached servers (localhost:11211 and localhost:11212).
2. create 3 spymemcached clients with the following parameters:
    FailureMode=FailureMode.Redistribute
    LocatorType=Locator.CONSISTENT
    HashAlg=HashAlgorithm.KETAMA_HASH

   clientCluster connects to localhost:11211 and localhost:11212
   client1 connects to localhost:11211
   client2 connects to localhost:11212

3. send 20 cache requests through clientCluster.
4. use client1 and client2 to get and count the cached entries on localhost:11211 and localhost:11212.
5. check that the total cache count equals 20.
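
Here is a minimal sketch of that setup, assuming spymemcached 2.5's ConnectionFactoryBuilder and AddrUtil API (the key names and expiry below are hypothetical placeholders, not from my actual test):

    import net.spy.memcached.AddrUtil;
    import net.spy.memcached.ConnectionFactoryBuilder;
    import net.spy.memcached.ConnectionFactoryBuilder.Locator;
    import net.spy.memcached.FailureMode;
    import net.spy.memcached.HashAlgorithm;
    import net.spy.memcached.MemcachedClient;

    public class KetamaLossRepro {
        public static void main(String[] args) throws Exception {
            // Cluster client with the three parameters from step 2.
            MemcachedClient clientCluster = new MemcachedClient(
                new ConnectionFactoryBuilder()
                    .setFailureMode(FailureMode.Redistribute)
                    .setLocatorType(Locator.CONSISTENT)
                    .setHashAlg(HashAlgorithm.KETAMA_HASH)
                    .build(),
                AddrUtil.getAddresses("localhost:11211 localhost:11212"));

            // Per-server clients used only for counting in step 4.
            MemcachedClient client1 = new MemcachedClient(
                AddrUtil.getAddresses("localhost:11211"));
            MemcachedClient client2 = new MemcachedClient(
                AddrUtil.getAddresses("localhost:11212"));

            // Step 3: send 20 cache requests through the cluster client.
            for (int i = 0; i < 20; i++) {
                clientCluster.set("key" + i, 3600, "value" + i).get();
            }

            // Steps 4 and 5: count how many of the 20 keys can be read
            // back from either server; the total should be 20.
            int total = 0;
            for (int i = 0; i < 20; i++) {
                if (client1.get("key" + i) != null
                        || client2.get("key" + i) != null) {
                    total++;
                }
            }
            System.out.println("total cached: " + total + " (expected 20)");

            clientCluster.shutdown();
            client1.shutdown();
            client2.shutdown();
        }
    }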

This test succeeds when both memcached servers are alive. But when I shut one 
of them down, the test fails because some of the requests go missing; for 
example, only 15 requests reach the remaining live server and the other 5 
are lost.

The problem may be in the getNodeForKey method of KetamaNodeLocator.java: this 
method returns a ketama node without checking whether that node is alive, so 
when a node is shut down or crashes, the memcached client can still route 
requests to the dead server.

I tried to fix this bug with the following code, and the above test then passes successfully.

MemcachedNode getNodeForKey(long hash) {
    MemcachedNode rv;
    if(!ketamaNodes.containsKey(hash)) {
        // Java 1.6 adds a ceilingKey method, but I'm still stuck in 1.5
        // in a lot of places, so I'm doing this myself.
        SortedMap<Long, MemcachedNode> tailMap=ketamaNodes.tailMap(hash);
        if(tailMap.isEmpty()) {
            // Wrapped past the end of the ring: start from the first point.
            rv=ketamaNodes.get(ketamaNodes.firstKey());
            if (!rv.isActive()) {
                // Selected node is dead: scan the whole ring for a live one.
                for(MemcachedNode node : ketamaNodes.values()) {
                    if (node.isActive()) {
                        rv = node;
                        break;
                    }
                }
            }
        } else {
            rv=ketamaNodes.get(tailMap.firstKey());
            if (!rv.isActive()) {
                // Selected node is dead: scan forward from the hash for a
                // live one. (Note: this does not wrap back to the start of
                // the ring if everything in the tail map is dead.)
                for(MemcachedNode node : tailMap.values()) {
                    if (node.isActive()) {
                        rv = node;
                        break;
                    }
                }
            }
        }
    } else {
        // Exact hit on a ring point; without this branch the method
        // would return null.
        rv=ketamaNodes.get(hash);
    }
    return rv;
}

Original issue reported on code.google.com by anyh...@gmail.com on 23 Jun 2010 at 6:38

GoogleCodeExporter commented 9 years ago
I think that KetamaNodeLocator was designed to return the list including the 
dead nodes - they're checked further down the line.

The problem is that the iterator that it returns is of size allNodes.size().

If you have 2 nodes in your ring and you return 2 random ones, there's a 1 in 
4 chance that both picks land on the dead node (1/2 × 1/2). That would explain 
why 5 of your 20 requests failed (20 × 1/4 = 5).

In the original implementation, the number of nodes iterated through was the 
number of buckets.

So if we changed:
    public Iterator<MemcachedNode> getSequence(String k) {
        return new KetamaIterator(k, allNodes.size());
    }

to:
    public Iterator<MemcachedNode> getSequence(String k) {
        return new KetamaIterator(k, ketamaNodes.size());
    }

then it would match the original implementation: ketamaNodes holds every point 
on the continuum (many points per server), so the iterator gets far more 
chances to find a live node than the bare server count allows.

Original comment by david.sh...@gmail.com on 22 Oct 2010 at 1:17

GoogleCodeExporter commented 9 years ago
I believe this has been fixed some time ago.  There was a bug in the ketama 
node locator where it used the number of servers rather than the number of 
ring points, and that was incorrect.

I'll mark this as accepted and we'll look into it further, but I bet it's 
already fixed.

Original comment by ingen...@gmail.com on 19 Aug 2012 at 8:30

GoogleCodeExporter commented 9 years ago
So, I'm curious: was this fixed or not? We made our own change some time ago, 
and we want to upgrade without reintroducing the issue.

Original comment by penguin...@gmail.com on 6 Sep 2012 at 7:00

GoogleCodeExporter commented 9 years ago
I've not rerun the specific test, which is why I'd not closed it.  Can you 
identify the change you've made?  I can eyeball it and let you know.

This is the commit that I believe fixes it:
https://github.com/dustin/java-memcached-client/commit/00daa10f012a7fd1656f41a5678629fa5978b1f5

Original comment by ingen...@gmail.com on 6 Sep 2012 at 7:19