Looks like the Thread logic that was in MemcachedClient is now part of
MemcachedConnection, and the same flaw exists there.
Original comment by jonat...@gmail.com
on 29 Jul 2011 at 4:48
workaround:
client = new MemcachedClient(cf, AddrUtil.getAddresses(hostList)) {
    private volatile boolean running = false;

    @Override public void run() {
        while (running) {
            try {
                super.run();
            } catch (Throwable t) {
                getLogger().warn(t);
            }
        }
    }

    @Override public boolean shutdown(long timeout, TimeUnit unit) {
        try {
            return super.shutdown(timeout, unit);
        } finally {
            running = false;
        }
    }
};
Original comment by jonat...@gmail.com
on 29 Jul 2011 at 7:10
That should be running = true; on line 2 of the workaround :)
Original comment by jonat...@gmail.com
on 29 Jul 2011 at 10:20
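[Editor's note] For convenience, here is the workaround with that one-line correction folded in. This is only a consolidated sketch of the two comments above, not tested code: cf and hostList are whatever ConnectionFactory and address string the application already uses, the usual imports (net.spy.memcached.MemcachedClient, net.spy.memcached.AddrUtil, java.util.concurrent.TimeUnit) are assumed, and it relies on MemcachedClient exposing run() and shutdown(long, TimeUnit) the way the original workaround does.

MemcachedClient client = new MemcachedClient(cf, AddrUtil.getAddresses(hostList)) {
    // Start out true so the wrapper loop actually runs (the fix noted above).
    private volatile boolean running = true;

    @Override public void run() {
        while (running) {
            try {
                super.run();
            } catch (Throwable t) {
                // Log and keep going, so a stray exception cannot kill the IO thread.
                getLogger().warn(t);
            }
        }
    }

    @Override public boolean shutdown(long timeout, TimeUnit unit) {
        try {
            return super.shutdown(timeout, unit);
        } finally {
            // Let the wrapper loop exit once a clean shutdown has been requested.
            running = false;
        }
    }
};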
I think this may have been addressed in some recent contributed patches.
In particular, these two:
https://github.com/dustin/java-memcached-client/commit/02d2f3e05f7100b09c99f58c85cf3d41ee80bbb3
https://github.com/dustin/java-memcached-client/commit/a9035c9e8243aec93d9516c88f33f6e62d4020c7
... but there's not enough info to say for sure. Note that these fixes are in the
just-released 2.7.3 and will be part of the upcoming 2.8.
The point about trying to keep going is a good one. Going to mark this
accepted for that reason. May not get to it for a while, but if you'd like to
flesh it out and contribute a patch, we'd be glad to take it.
Original comment by ingen...@gmail.com
on 16 Oct 2011 at 2:11
I hit the same problem when an OutOfMemoryError showed up in our logs. The events
happened in sequence:
1) Server load became very heavy and free memory shrank sharply.
2) The memcached IO thread threw an OutOfMemoryError and died.
3) More operations were added to the input queue but were never consumed.
The risk is catastrophic, isn't it? Note that the worker threads are unaware that
the memcached IO thread has died. The only symptom is that the application becomes
very slow, because the get/put queues fill up sooner or later. So could we put this
issue into a higher priority category and fix it in 2.8?
Original comment by smilingai2004@gmail.com
on 8 Dec 2011 at 7:02
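[Editor's note] Until the client can recover on its own, the failure mode described above can at least be detected from the application side: probe the client periodically with a short-timeout get and rebuild it if the probes keep failing. The sketch below is only an illustration of that idea; the class name, probe key, thresholds, and rebuildClient() are all made up, and the reconnection logic itself is application specific.

import java.util.concurrent.TimeUnit;
import net.spy.memcached.MemcachedClient;

// Hypothetical watchdog: if cheap gets keep timing out, assume the IO thread
// is wedged or dead, shut the client down, and build a fresh one.
class MemcachedWatchdog implements Runnable {
    private volatile MemcachedClient client;
    private int consecutiveFailures = 0;

    MemcachedWatchdog(MemcachedClient client) {
        this.client = client;
    }

    @Override public void run() {
        try {
            // Any key works; we only care whether the call completes in time.
            client.asyncGet("watchdog-probe").get(1, TimeUnit.SECONDS);
            consecutiveFailures = 0;
        } catch (Exception e) {
            if (++consecutiveFailures >= 3) {
                client.shutdown(5, TimeUnit.SECONDS);
                client = rebuildClient();   // application-specific reconnect logic
                consecutiveFailures = 0;
            }
        }
    }

    private MemcachedClient rebuildClient() {
        // Placeholder: recreate the client the same way the application did at startup.
        throw new UnsupportedOperationException("wire up application-specific reconstruction");
    }
}

// Scheduled, for example, with:
// Executors.newSingleThreadScheduledExecutor()
//     .scheduleAtFixedRate(new MemcachedWatchdog(client), 10, 10, TimeUnit.SECONDS);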
Just to provide an update on this (sorry for the long delay): my current thought
is that the best thing to do is to drop all connections, reestablish them, and
restart the IO thread. The concern is that we don't want to end up corrupting
data, and getting back to a safe state is important.
w.r.t. the comment from smilingai2004, the default behavior with the input queue
is to immediately time out operations if it's full. This does mean a lot of memory
can be consumed, which is still bad, of course.
Original comment by ingen...@gmail.com
on 20 Jul 2012 at 7:00
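[Editor's note] Related to the input-queue behavior mentioned above: the 2.7/2.8 ConnectionFactoryBuilder exposes knobs for how long an enqueue may block and how long an operation may wait before timing out, which keeps work from piling up behind a dead or slow IO thread. A minimal sketch with made-up values; the exact setter names should be double-checked against the version in use.

import java.io.IOException;
import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.MemcachedClient;

// Fail fast instead of queueing indefinitely when the IO thread is unhealthy.
static MemcachedClient buildClient(String hostList) throws IOException {
    ConnectionFactoryBuilder builder = new ConnectionFactoryBuilder()
        .setOpTimeout(2500)            // ms an operation may wait for its result
        .setOpQueueMaxBlockTime(100);  // ms an enqueue may block when the input queue is full
    return new MemcachedClient(builder.build(), AddrUtil.getAddresses(hostList));
}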
A proposed fix for this is posted here:
http://review.couchbase.org/#change,18693
We'd appreciate any comments on the change set.
Original comment by ingen...@gmail.com
on 30 Jul 2012 at 9:28
I added this fix (http://review.couchbase.org/#change,18693) to the code, and now
see this exception in our log (the magic number in the response is 83, i.e. 0x53,
instead of 0x81):
net.spy.memcached.MemcachedConnection Problem handling memcached IO
java.lang.AssertionError: Invalid magic: 83
    at net.spy.memcached.protocol.binary.OperationImpl.readFromBuffer(OperationImpl.java:134)
    at net.spy.memcached.MemcachedConnection.handleReads(MemcachedConnection.java:458)
    at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:376)
    at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:236)
    at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:830)
Original comment by sunny201...@gmail.com
on 27 Aug 2012 at 6:33
Invalid magic would be rather bad. That indicates some kind of partial read,
something going wrong partway through a read, or something bad on the server
side. What was the server in question?
We'd like to get that fix reviewed and merged, since it may make us much more reliable.
Original comment by ingen...@gmail.com
on 27 Aug 2012 at 7:07
We are using memcached 1.4.14.
The client application is running on Ubuntu.
Original comment by sunny201...@gmail.com
on 27 Aug 2012 at 11:26
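[Editor's note] For anyone puzzling over the number in that stack trace: every binary-protocol response packet is supposed to start with the magic byte 0x81 (requests start with 0x80), and 83 decimal is 0x53, ASCII 'S'. Seeing that in the magic position fits the partial-read/misaligned-read explanation above. A simplified illustration of the kind of check the assertion corresponds to, not the library's actual code:

import java.nio.ByteBuffer;

// Sketch: validate the first byte of what should be a binary-protocol response header.
static void checkResponseMagic(ByteBuffer fromServer) {
    int magic = fromServer.get() & 0xff;   // read the would-be magic byte
    if (magic != 0x81) {
        // 83 == 0x53 == ASCII 'S': the reader is not at a packet boundary.
        throw new AssertionError("Invalid magic: " + magic);
    }
}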
Original issue reported on code.google.com by
jonat...@gmail.com
on 29 Jul 2011 at 4:26