shixin42 / spymemcached

Automatically exported from code.google.com/p/spymemcached

Performance issue in operation redistribution with big async get bulk requests #186

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What version of the product are you using? On what operating system?
2.6.0

Tell me more...

There is a serious performance bottleneck inside spymemcached when it 
redistributes operations (e.g. after a memcached connection is lost and 
re-established).

When spymemcached loses a connection, it tries to reconnect and redistributes 
the operations remaining in the current queue to another memcached server if 
FailureMode=Redistribute (that's the default; the other options are Retry and 
Cancel). The problem is that the redistributeOperations() method contains 3 
nested for-loops:

private void redistributeOperations(Collection<Operation> ops) {
  for(Operation op : ops) {
    if(op instanceof KeyedOperation) {
      KeyedOperation ko = (KeyedOperation)op;
      int added = 0;
      for(String k : ko.getKeys()) {
        for(Operation newop : opFact.clone(ko)) {
          addOperation(k, newop);
          added++;
        }
      }
      assert added > 0
        : "Didn't add any new operations when redistributing";
    } else {
      // Cancel things that don't have definite targets.
      op.cancel();
    }
  }
}

For asyncGetBulk requests with 10 operations of 1000 keys each, we get 
10 * 1000 * 1000 = 10,000,000 addOperation() calls, which overflows the 
inputQueue, whose default capacity is only 16384. So the number of calls grows 
as operations * keys^2, and I guess this was the root cause of the various 
"Timed out waiting to add" and "Queue full" bug reports.

I'm not sure spymemcached needs the second loop, as it leads to a lot of 
duplicates.
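
The arithmetic above can be checked with a small stand-alone model. This is only a sketch and not spymemcached code: RedistributionCost, BulkOp, and cloneAsSingleKeyOps() are hypothetical names, and the model assumes (as the 1000 * 1000 figure implies) that cloning a bulk operation yields one single-key operation per key. It counts how many enqueue calls the nested loops would make, next to the count if each clone were enqueued only once.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the redistribution loops; not spymemcached code. */
final class RedistributionCost {

    /** Hypothetical stand-in for a multi-key operation such as a bulk get. */
    static final class BulkOp {
        final List<String> keys;

        BulkOp(List<String> keys) {
            this.keys = keys;
        }

        /** Assumption: cloning yields one single-key operation per key. */
        List<String> cloneAsSingleKeyOps() {
            return new ArrayList<>(keys);
        }
    }

    /** Mirrors the loop structure quoted above and counts the enqueue calls. */
    static long countNested(List<BulkOp> ops) {
        long added = 0;
        for (BulkOp op : ops) {
            for (String key : op.keys) {
                for (String clone : op.cloneAsSingleKeyOps()) {
                    added++; // stands in for addOperation(key, clone)
                }
            }
        }
        return added;
    }

    /** Count if every single-key clone were enqueued exactly once. */
    static long countOncePerKey(List<BulkOp> ops) {
        long added = 0;
        for (BulkOp op : ops) {
            added += op.cloneAsSingleKeyOps().size();
        }
        return added;
    }

    /** Builds opCount bulk operations with keysPerOp distinct keys each. */
    static List<BulkOp> makeOps(int opCount, int keysPerOp) {
        List<BulkOp> ops = new ArrayList<>();
        for (int i = 0; i < opCount; i++) {
            List<String> keys = new ArrayList<>();
            for (int j = 0; j < keysPerOp; j++) {
                keys.add("key-" + i + "-" + j);
            }
            ops.add(new BulkOp(keys));
        }
        return ops;
    }

    public static void main(String[] args) {
        List<BulkOp> ops = makeOps(10, 1000);
        System.out.println("nested loops: " + countNested(ops));     // 10000000
        System.out.println("once per key: " + countOncePerKey(ops)); // 10000
    }
}
```

Under these assumptions the nested version makes 10,000,000 calls against 10,000 for the once-per-key version, so only the latter would fit in a queue of 16384 entries. Whether a once-per-key pass preserves the intended routing in the real client is not something this model can show.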

I've attached a sample test to simulate this error.

Original issue reported on code.google.com by dm.naume...@gmail.com on 15 Jul 2011 at 1:01


GoogleCodeExporter commented 8 years ago

Original comment by mikewie...@gmail.com on 8 Aug 2011 at 11:50

GoogleCodeExporter commented 8 years ago

Original comment by mikewie...@gmail.com on 24 Aug 2011 at 12:12

GoogleCodeExporter commented 8 years ago

Original comment by mikewie...@gmail.com on 7 Oct 2011 at 8:32

GoogleCodeExporter commented 8 years ago
We backed out this fix since it was causing some other issues. Therefore, I am 
re-opening this issue.

Original comment by mikewie...@gmail.com on 12 Oct 2011 at 7:32

GoogleCodeExporter commented 8 years ago

Original comment by ingen...@gmail.com on 14 Oct 2011 at 8:14

GoogleCodeExporter commented 8 years ago
Moving to 2.7.4 as 2.7.3 was a quick fix for a small issue

Original comment by ingen...@gmail.com on 15 Oct 2011 at 3:08

GoogleCodeExporter commented 8 years ago

Original comment by ingen...@gmail.com on 19 Aug 2012 at 8:11

GoogleCodeExporter commented 8 years ago
Is this issue resolved in release 2.8.4?

Original comment by burningi...@gmail.com on 6 Oct 2012 at 2:47

GoogleCodeExporter commented 8 years ago
burning: No, the nested loops still exist in the 2.8.4 source code. 

Original comment by nat...@neocortical.net on 4 May 2013 at 6:51

GoogleCodeExporter commented 8 years ago
sorry for the bad milestone marker

Original comment by ingen...@gmail.com on 4 May 2013 at 6:53

GoogleCodeExporter commented 8 years ago

Original comment by ingen...@gmail.com on 4 May 2013 at 6:53

GoogleCodeExporter commented 8 years ago
Michael: I think this one has been addressed?

Original comment by ingen...@gmail.com on 2 Jul 2014 at 10:27

GoogleCodeExporter commented 8 years ago
yes, this has changed in recent 2.11 versions.

Original comment by michael....@gmail.com on 3 Jul 2014 at 4:25