roc230 / spymemcached

Automatically exported from code.google.com/p/spymemcached

Get returns null while the key exists on the server #185

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What version of the product are you using? On what operating system?
I'm running 5 memcached servers on RedHat5 machines, and a spymemcached
client running on Windows.
I'm running a unit test with the "get" method, and from time to time the return
value is null.

I found in the log files that when I get the null value, spymemcached is
connecting to the wrong server (to the fifth, while the value is on the second one).

I have attached the log files.

Here is my code:

@Test
public void getNullTestRealSpy() throws IOException {
    net.spy.memcached.MemcachedClient memcachedClient =
            new net.spy.memcached.MemcachedClient(
                    new DefaultConnectionFactory(),
                    CachingFactory.getMemcachedServers());

    int keyInt = 17272300;
    String key = keyInt + "";
    Object value = memcachedClient.get(key);
    Assert.assertNotNull(value);
}

* CachingFactory.getMemcachedServers  - returns list of InetAddress
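For reference, the MemcachedClient constructor used above takes a List<InetSocketAddress> (host plus port) rather than bare InetAddress objects, so a helper like CachingFactory.getMemcachedServers would presumably build those. A minimal, hypothetical sketch of such a helper, assuming it wraps spymemcached's AddrUtil; the host names are placeholders:

import java.net.InetSocketAddress;
import java.util.List;

import net.spy.memcached.AddrUtil;

public final class CachingFactoryExample {
    // Placeholder host names; the real factory would list the five RedHat servers.
    public static List<InetSocketAddress> getMemcachedServers() {
        return AddrUtil.getAddresses(
                "cache1:11211 cache2:11211 cache3:11211 cache4:11211 cache5:11211");
    }
}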

Original issue reported on code.google.com by yehud...@gmail.com on 14 Jul 2011 at 11:51

Attachments:

GoogleCodeExporter commented 9 years ago
I hit the same situation with version 2.5. I store a list of objects in
memcached like this: MemcacheClient.set(key, 0,
Arrays.asList(Object, Object, Object, ...)). I then sometimes fail to get it
back with spymemcached, but when I use telnet, I can get it.

Original comment by matrix3...@gmail.com on 11 Nov 2011 at 6:41

GoogleCodeExporter commented 9 years ago
I also hit this issue using version 2.7. My test retrieves 64-byte values from 2
memcached instances, and the throughput is about 140K reads/sec, but there are
about 4500 errors of this kind out of 220,000,000 reads. Using telnet, I can
still get the items that were reported as null.

Original comment by mingfan...@gmail.com on 22 Nov 2011 at 3:56

GoogleCodeExporter commented 9 years ago
The two comments seem unrelated to the original bug report.  

@mingfan have you possibly created so much garbage that the GC kicks in and
then you are getting timeouts?  What does your test do?

@matrix3456 when you don't get the values, what do you get back?

Original comment by ingen...@gmail.com on 22 Nov 2011 at 4:28
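As a side note on the "what do you get back?" question: a genuine miss and a timed-out operation look different from the client side. A rough sketch, with placeholder host and key:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import net.spy.memcached.AddrUtil;
import net.spy.memcached.MemcachedClient;
import net.spy.memcached.OperationTimeoutException;

public class MissVsTimeoutSketch {
    public static void main(String[] args) throws Exception {
        MemcachedClient client =
                new MemcachedClient(AddrUtil.getAddresses("localhost:11211"));

        try {
            // A real miss comes back as null; a timeout is thrown instead.
            Object v = client.get("someKey");
            System.out.println(v == null ? "miss" : v);
        } catch (OperationTimeoutException e) {
            System.out.println("timed out, not a miss: " + e.getMessage());
        }

        // The asynchronous form makes the timeout explicit.
        try {
            Object v = client.asyncGet("someKey").get(2, TimeUnit.SECONDS);
            System.out.println(v == null ? "miss" : v);
        } catch (TimeoutException e) {
            System.out.println("timed out after 2s");
        }

        client.shutdown();
    }
}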

GoogleCodeExporter commented 9 years ago
"The two comments seem unrelated to the original bug report."
I thought the issue reported is just what I have seen.
The issue does not appear when there is only one memcached server and when the
throughput (concurrency) is low. I can get an item by telnet on one server but
only get null for the same key using spymemcached in my test. I also find that
once an item fails to be read by spymemcached, it can never be read by
spymemcached again for the rest of the same test.

"@mingfan have you possibly created so much garbage that the GC kicks in and
then you are getting timeouts?  What does your test do?"
The error is not a timeout error.

My test is simple: just spawn many threads, e.g. 1400 threads on 10 client
nodes, to read objects (all 64 B) from two memcached servers concurrently.

Original comment by mingfan...@gmail.com on 22 Nov 2011 at 5:13
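To make the shape of that test concrete, here is a stripped-down sketch of a concurrent read loop against two servers through one shared client; the hosts, key names, thread count, and iteration counts are placeholders, not the original harness:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

import net.spy.memcached.AddrUtil;
import net.spy.memcached.MemcachedClient;

public class ConcurrentReadSketch {
    public static void main(String[] args) throws Exception {
        MemcachedClient client = new MemcachedClient(
                AddrUtil.getAddresses("server1:11211 server2:11211"));

        AtomicLong nulls = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(140);

        for (int t = 0; t < 140; t++) {
            pool.submit(() -> {
                for (int i = 0; i < 100_000; i++) {
                    // Count unexpected misses from the shared client.
                    if (client.get("key-" + (i % 1000)) == null) {
                        nulls.incrementAndGet();
                    }
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        System.out.println("null reads: " + nulls.get());
        client.shutdown();
    }
}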

GoogleCodeExporter commented 9 years ago
All the posts are relevant to the OP's issue, and in fact I am facing the same
problem too.

With just 100 keys and 2 server nodes running, I am getting about 48% of the
keys missing on average.

It's not a timeout error; the get actually connects to the wrong server, which
instantly returns an "END", and that turns up as a null in the code.

Concurrency might not even be the problem, because I am doing a 1 sec sleep
between every get... I have a feeling the hashing is messing up somewhere?

Original comment by Srivath...@gmail.com on 16 Apr 2013 at 2:58
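If it helps narrow down the "wrong server" theory, the client exposes its node locator, so you can print which node each key hashes to and compare that with where telnet finds the value. A rough sketch, with placeholder hosts:

import net.spy.memcached.AddrUtil;
import net.spy.memcached.MemcachedClient;
import net.spy.memcached.MemcachedNode;
import net.spy.memcached.NodeLocator;

public class KeyLocationSketch {
    public static void main(String[] args) throws Exception {
        MemcachedClient client = new MemcachedClient(
                AddrUtil.getAddresses("node1:11211 node2:11211"));

        NodeLocator locator = client.getNodeLocator();
        for (int i = 1; i <= 100; i++) {
            String key = Integer.toString(i);
            // Which node does the client's hashing pick for this key?
            MemcachedNode primary = locator.getPrimary(key);
            System.out.println(key + " -> " + primary.getSocketAddress()
                    + " (value: " + client.get(key) + ")");
        }
        client.shutdown();
    }
}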

GoogleCodeExporter commented 9 years ago
Correction: a Thread.sleep around the set calls made the difference, so it does
seem to be a concurrency issue.

Original comment by Srivath...@gmail.com on 16 Apr 2013 at 3:12

GoogleCodeExporter commented 9 years ago
Can you generate some logs at the DEBUG level?

My suspicion is that the connection is occasionally getting interrupted, and as
a result it's either trying to redistribute and failing to find another server,
or it's one of two older issues I remember. One was an issue with the node
locator, where the / in the hostname/ipaddr was causing a problem in the
hashing. That was fixed back in 2.7 or so. The other was a problem with
redistribution when using ketama. That was also fixed in the 2.7 series,
before 2.8.

I think to fix this issue, I'll need an example constructor and version.  
Ideally, it'd be reproduced with the latest releases.

It seems that what @mingfan.lu and @Srivaths90 are describing are different
things, though, as one is describing a miss ratio of roughly 0.002% (4500 out
of 220,000,000 reads) and the other a 48% miss ratio.

Can either @mingfan.lu or @Srivaths90 post a succinct test? Does the test
originally posted show the same issue in your environment, and can you generate
debug-level logs?

Original comment by ingen...@gmail.com on 16 Apr 2013 at 3:24
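On the debug logs: spymemcached chooses its logging backend through the net.spy.log.LoggerImpl system property; the property and class names below are from the 2.x source tree as I recall them, so treat them as assumptions and adjust for the version in use. A rough sketch that routes logging through java.util.logging at full verbosity:

import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class DebugLoggingSketch {
    public static void main(String[] args) throws Exception {
        // Must be set before the first spymemcached class is loaded.
        System.setProperty("net.spy.log.LoggerImpl",
                "net.spy.memcached.compat.log.SunLogger");

        // Turn java.util.logging up to FINEST for the spymemcached package.
        Logger spyLogger = Logger.getLogger("net.spy.memcached");
        spyLogger.setLevel(Level.FINEST);
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINEST);
        spyLogger.addHandler(handler);

        // ... construct the MemcachedClient here and run the failing test ...
    }
}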

GoogleCodeExporter commented 9 years ago
Here it is. I was just hacking this up, but this is what I was using:

import java.io.IOException;
import java.net.SocketAddress;
import java.util.List;
import java.util.Map;

import net.spy.memcached.*;

public class Test {

    public boolean compareStrings() throws IOException, InterruptedException {
        boolean answer = false;
        double hitRate = 0;

        // Ketama hashing with the default queue length and read buffer size.
        ConnectionFactory connectionFactory = new DefaultConnectionFactory(
                DefaultConnectionFactory.DEFAULT_OP_QUEUE_LEN,
                DefaultConnectionFactory.DEFAULT_READ_BUFFER_SIZE,
                HashAlgorithm.KETAMA_HASH) {
            public NodeLocator createLocator(List<MemcachedNode> list) {
                KetamaNodeLocator locator = new KetamaNodeLocator(list,
                        HashAlgorithm.KETAMA_HASH);
                return locator;
            }
        };

        MemcachedClient memcachedClient = new MemcachedClient(connectionFactory,
                AddrUtil.getAddresses(
                        "ee309lnx1.ecn.purdue.edu:11211 ee309lnx1.ecn.purdue.edu:11212"));

        // set() is asynchronous; nothing here waits on the returned future,
        // only the sleep between iterations paces the writes.
        for (Integer key = 1; key <= 100; key++) {
            Thread.sleep(50);
            memcachedClient.set(key.toString(), 3600, Math.pow(key, 2));
        }

        System.out.println("-----------------------------------------------");
        System.out.println("+++++++++++++++100 values set++++++++++++++++++");
        System.out.println("-----------------------------------------------");

        Map<SocketAddress, Map<String, String>> stats = memcachedClient.getStats();
        System.out.println("Current items in each memcache:");
        for (SocketAddress socketAddress : stats.keySet()) {
            System.out.println(socketAddress.toString() + ": "
                    + stats.get(socketAddress).get("curr_items"));
            answer = true;
        }

        // Read everything back and count how many values round-trip correctly.
        for (Integer key = 1; key <= 100; key++) {
            Double value = (Double) memcachedClient.get(key.toString());
            System.out.println(value);
            if (value != null && value.equals(Math.pow(key, 2))) {
                hitRate++;
            }
        }

        System.out.println("-----------------------------------------------\n");

        System.out.println("\nHit Rate: " + hitRate + "%.");

        memcachedClient.flush(100);

        //memcachedClient.shutdown();

        //memcachedClient = new MemcachedClient(connectionFactory, AddrUtil.getAddresses("ee309lnx1.ecn.purdue.edu:11211 ee309lnx1.ecn.purdue.edu:11212 ee309lnx1.ecn.purdue.edu:11213"));

        return answer;
    }
}

Without that Thread.sleep in the set loop, I was generating a 48% miss rate. I
now have to push this 50 ms value down (Thread.sleep takes milliseconds) to see
how it behaves, and also increase the key size and the server pool size.

Is this a problem with the hash computation time itself?

Original comment by Srivath...@gmail.com on 16 Apr 2013 at 3:37
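One thing worth ruling out before blaming the hash computation: set() in spymemcached is asynchronous, so without the sleep the reads can race ahead of writes that are still in flight, and any write that fails is never noticed because the returned future is ignored. A sketch of the same write loop that waits on each returned future instead of sleeping, reusing the hosts from the posted test:

import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

import net.spy.memcached.AddrUtil;
import net.spy.memcached.MemcachedClient;

public class SetWithFutures {
    public static void main(String[] args) throws Exception {
        MemcachedClient client = new MemcachedClient(AddrUtil.getAddresses(
                "ee309lnx1.ecn.purdue.edu:11211 ee309lnx1.ecn.purdue.edu:11212"));

        for (Integer key = 1; key <= 100; key++) {
            // set() queues the write and returns immediately; wait for the ack.
            Future<Boolean> stored =
                    client.set(key.toString(), 3600, Math.pow(key, 2));
            if (!stored.get(5, TimeUnit.SECONDS)) {
                System.err.println("set failed for key " + key);
            }
        }
        client.shutdown();
    }
}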