vaseems / xmemcached

Automatically exported from code.google.com/p/xmemcached
Apache License 2.0

Timeout errors and monitoring feature requests #233

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. I have a setup of 30 servers performing an average of 150k ops per minute 
against 5 memcached servers.
2. I see about 150 timeouts a day, at random. They happen with all sorts of 
operations: gets, sets, adds, get counter, incr.
3. The exception looks like this:
java.util.concurrent.TimeoutException: Timed out(5000) waiting for operation
        at net.rubyeye.xmemcached.XMemcachedClient.latchWait(XMemcachedClient.java:2565)
        at net.rubyeye.xmemcached.XMemcachedClient.fetch0(XMemcachedClient.java:592)
        at net.rubyeye.xmemcached.XMemcachedClient.get0(XMemcachedClient.java:1005)
        at net.rubyeye.xmemcached.XMemcachedClient.get(XMemcachedClient.java:963)
        at net.rubyeye.xmemcached.XMemcachedClient.get(XMemcachedClient.java:974)
        at net.rubyeye.xmemcached.XMemcachedClient.get(XMemcachedClient.java:996)
        at net.rubyeye.xmemcached.Counter.get(Counter.java:75)
        at mycode.MemcachedAdapterXmemcachedImpl.getCounterValue(MemcachedAdapterXmemcachedImpl.java:108)

What is the expected output? What do you see instead?

- There should not be timeout exceptions (wishful thinking).
- The error message should include the server/port that xmemcached was talking 
to when the failure occurred.
- It would be very handy to collect stats on how long calls to Memcached take, 
by server and by call type, and to expose them through JMX.

What version of the product are you using? On what operating system?

xmemcached 1.3.8
Memcached running on Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-32-generic x86_64)
App servers CentOS release 5.8 and Ubuntu 12.04.1 LTS (GNU/Linux 
3.2.0-32-generic x86_64)

App servers running on Ubuntu and on CentOS fail equally.

Please provide any additional information below.

I'm not using async calls

Original issue reported on code.google.com by claudio....@gmail.com on 20 Nov 2012 at 10:12
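The per-call-type latency stats requested above could also be collected on the application side by wrapping each client call. The following is only a minimal sketch under stated assumptions: `CallTimer`, `Stats`, and the `callType` labels are hypothetical names, not part of xmemcached, and the breakdown is by call type only, because the application cannot see which server handled a key without help from the client library. Exposing the numbers through JMX is left out.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper: records call count, failures, and total latency per call type.
final class CallTimer {

    // Thread-safe counters for one call type (e.g. "get", "set", "incr").
    static final class Stats {
        final LongAdder count = new LongAdder();
        final LongAdder failures = new LongAdder();
        final LongAdder totalNanos = new LongAdder();
    }

    private final Map<String, Stats> statsByType = new ConcurrentHashMap<>();

    // Runs op, measures its wall-clock duration, and rethrows any exception unchanged.
    <T> T time(String callType, Callable<T> op) throws Exception {
        Stats stats = statsByType.computeIfAbsent(callType, k -> new Stats());
        long start = System.nanoTime();
        try {
            return op.call();
        } catch (Exception e) {
            stats.failures.increment(); // includes TimeoutException
            throw e;
        } finally {
            stats.count.increment();
            stats.totalNanos.add(System.nanoTime() - start);
        }
    }

    long count(String callType) {
        Stats stats = statsByType.get(callType);
        return stats == null ? 0 : stats.count.sum();
    }

    long failures(String callType) {
        Stats stats = statsByType.get(callType);
        return stats == null ? 0 : stats.failures.sum();
    }

    // Mean latency in milliseconds over all recorded calls of this type, or 0 if none.
    double meanMillis(String callType) {
        Stats stats = statsByType.get(callType);
        if (stats == null || stats.count.sum() == 0) {
            return 0.0;
        }
        return stats.totalNanos.sum() / 1_000_000.0 / stats.count.sum();
    }
}
```

A call site would look something like `timer.time("get", () -> client.get(key))`, where `timer` and `client` are whatever instances the application already holds. The counters can then be logged periodically.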

GoogleCodeExporter commented 9 years ago
1. Timeouts cannot be avoided completely. They can be caused by things you 
can't control, such as the network or a full GC. But you can make your code 
retry when it catches a timeout exception.
2. Why do you need this information?
3. You can do this with logging; I don't want to provide this feature.

Thanks.

Original comment by killme2...@gmail.com on 21 Nov 2012 at 10:08
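The retry suggested in point 1 above can be sketched roughly as follows. This is an illustration only: `TimeoutRetry` and `callWithRetry` are hypothetical names, not part of xmemcached, and the sketch retries immediately without any backoff.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: re-runs an operation when it fails with a TimeoutException.
final class TimeoutRetry {

    private TimeoutRetry() {
    }

    // Calls op up to maxAttempts times. Only TimeoutException triggers a retry;
    // any other exception propagates immediately. If every attempt times out,
    // the last TimeoutException is rethrown.
    static <T> T callWithRetry(Callable<T> op, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        TimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (TimeoutException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

A call site might read `TimeoutRetry.callWithRetry(() -> client.get(key), 2)`. Note that a timed-out operation may still have reached the server, so blindly retrying non-idempotent operations such as incr can apply them twice.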

GoogleCodeExporter commented 9 years ago
1. Retrying is not a bad idea. With so many operations and just a few failing, 
the performance hit of retrying should be imperceptible.
2. Having a timeout but not knowing which particular server produced it is a 
problem if I want to figure out the pattern and eventually a solution. It could 
be an issue with GC, but it could also be an issue with a Memcached server, and 
unless I know which one is giving me a hard time I can't diagnose the issue.

I think this small feature could be implemented very easily in the 
XMemcachedClient#latchWait method (line 2561) with something like this:

    private void latchWait(final Command cmd, final long timeout)
            throws InterruptedException, TimeoutException {
        // await() returns false if the timeout elapsed before the latch was released
        if (!cmd.getLatch().await(timeout, TimeUnit.MILLISECONDS)) {
            cmd.cancel();
            // Proposed change: append the session that the command's key maps to
            throw new TimeoutException("Timed out(" + timeout
                    + ") waiting for operation while connected to " +
                    connector.findSessionByKey(cmd.getKey()));
        }
    }

Original comment by claudio....@gmail.com on 29 Nov 2012 at 12:28

GoogleCodeExporter commented 9 years ago

Original comment by killme2...@gmail.com on 11 Jan 2013 at 11:51

GoogleCodeExporter commented 9 years ago
xmemcached 1.4.0 is released.

Original comment by killme2...@gmail.com on 20 Feb 2013 at 9:03