lishunli / spymemcached

Automatically exported from code.google.com/p/spymemcached

Suggestion for OperationTimeoutExceptions #32

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Using version 2.1

Sometimes I run into OperationTimeoutExceptions on gets.
It would be great if it was possible to find out which node was causing the
timeouts (e.g. via e.getMessage()?) directly from the exception (without
running in debug mode).

Original issue reported on code.google.com by Michael....@gmail.com on 26 Jun 2008 at 5:31

GoogleCodeExporter commented 9 years ago
What are you wanting to know, exactly?  Can you give me examples of how you might
imagine this working?

Original comment by dsalli...@gmail.com on 27 Jun 2008 at 5:59

GoogleCodeExporter commented 9 years ago
I would find it very useful to see the address of the memcached node that caused the
timeout.

e.g. 
try
{
  client.get(key);
}
catch (OperationTimeoutException e)
{
   // getNodeInfo() could be the SocketAddress to identify the node
   log.error("Node that caused timeout: " + e.getNodeInfo());
   // or maybe just add the node address to the message?
   log.error("Node that caused timeout: " + e.getMessage());
}

Does this make it clearer?
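
To make the suggestion above concrete, here is a minimal sketch of an exception that carries the node address. NodeTimeoutException, its constructor, and the host name are hypothetical and exist only for illustration; this is not spymemcached's API, and getNodeInfo() is just the name proposed in the snippet above.

```java
import java.net.InetSocketAddress;
import java.net.SocketAddress;

// Hypothetical exception type, NOT part of spymemcached: it only illustrates
// how a timeout exception could carry the address of the node involved.
class NodeTimeoutException extends RuntimeException {
    private final SocketAddress nodeInfo;

    NodeTimeoutException(String message, SocketAddress nodeInfo) {
        // Append the node address so it also shows up in plain log output.
        super(message + " on " + nodeInfo);
        this.nodeInfo = nodeInfo;
    }

    // Name taken from the proposal in this thread.
    SocketAddress getNodeInfo() {
        return nodeInfo;
    }
}

class NodeTimeoutDemo {
    public static void main(String[] args) {
        // Made-up node address; createUnresolved avoids any DNS lookup.
        SocketAddress node = InetSocketAddress.createUnresolved("cache-17.example", 11211);
        try {
            throw new NodeTimeoutException("Mutate operation timed out", node);
        } catch (NodeTimeoutException e) {
            // Either accessor identifies the node without a debugger.
            System.err.println("Node that caused timeout: " + e.getNodeInfo());
            System.err.println(e.getMessage());
        }
    }
}
```

With something like this, both options from the snippet work: the address is available as a typed value and also appears in the message text.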

Original comment by Michael....@gmail.com on 27 Jun 2008 at 11:08

GoogleCodeExporter commented 9 years ago
+1 for this - the IP address of the node that the timeout occurred on would be super
useful for debugging network-related issues in setups with hundreds of memcache
machines.

Original comment by massdosage on 30 Jun 2008 at 11:02

GoogleCodeExporter commented 9 years ago
Currently we see this in our logs (using 2.1):

[2008-11-02 13:56:41] ERROR
net.spy.memcached.OperationTimeoutException: Mutate operation timed out, unable to modify counter [da5a963177c779f86fec2618036175f3]
        at net.spy.memcached.MemcachedClient.mutate(MemcachedClient.java:1050)
        at net.spy.memcached.MemcachedClient.mutateWithDefault(MemcachedClient.java:1089)
        at net.spy.memcached.MemcachedClient.incr(MemcachedClient.java:1126)
        at fm.last.memcached.spy.MemCachedClientAdaptor.incr(MemCachedClientAdaptor.java:186)

We have roughly 100 memcache nodes. From this stack trace it is impossible to tell
which of the nodes is causing the problem. It would be great if the exception message
said something along the lines of:

Mutate operation timed out on 111.222.333.444

That would help in cases where there is an issue with just one of the nodes, as only
its IP address would show up in the logs. If lots of IPs showed up then it would
point in the direction of a more general failure. Either way, having that additional
information there would help with debugging the cause of timeouts.

Original comment by massdosage on 20 Nov 2008 at 5:53

GoogleCodeExporter commented 9 years ago
It's a bit more complicated than this, but you can get the actual node that should've
been handling the operation from the root cause exception of the
OperationTimeoutException (as attached to the operation itself).

A stack trace print will show the information as well.

It's not guaranteed to report an actual problem with the node, but it should satisfy
this request.
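
As a rough sketch of reaching the root cause mentioned above, the helper below walks getCause() using only java.lang.Throwable. The CauseChain class, the simulated exceptions, and their messages are made up for illustration; what the real root cause exposes about the node depends on the spymemcached version in use.

```java
// Stand-alone sketch: uses only java.lang.Throwable, no spymemcached classes.
class CauseChain {
    /** Follows getCause() until the innermost exception is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Simulated exceptions with made-up messages; in real use the outer
        // exception would be the caught OperationTimeoutException.
        Exception inner = new Exception("Timed out waiting for operation (node: 10.0.0.17:11211)");
        RuntimeException outer = new RuntimeException("Timeout waiting for value", inner);

        // Log the innermost cause, which is where the node detail would live.
        System.err.println("Root cause: " + rootCause(outer).getMessage());
    }
}
```

In a catch block this would be called as rootCause(e), and the result logged alongside the original exception.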

Original comment by dsalli...@gmail.com on 18 Sep 2009 at 6:47