nfhu / xmemcached

Automatically exported from code.google.com/p/xmemcached
Apache License 2.0

calls to delete keys not always consistent across more than one node #228

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run 2+ memcached nodes, configure xmemcached to use 
KetamaMemcachedSessionLocator
2. Add a key:value through MemcachedClient
3. Delete that key
4. Get the key

What is the expected output? What do you see instead?
Expected: a cache miss
Observed: sometimes a cache miss, sometimes the old value for the given key
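A minimal sketch of these steps against two nodes (the server addresses, key, value, and class name DeleteRepro are placeholders; exception handling is omitted):

    import net.rubyeye.xmemcached.MemcachedClient;
    import net.rubyeye.xmemcached.XMemcachedClientBuilder;
    import net.rubyeye.xmemcached.impl.KetamaMemcachedSessionLocator;
    import net.rubyeye.xmemcached.utils.AddrUtil;

    public class DeleteRepro {
        public static void main(String[] args) throws Exception {
            // Step 1: two memcached nodes behind the Ketama session locator
            XMemcachedClientBuilder builder = new XMemcachedClientBuilder(
                    AddrUtil.getAddresses("10.0.0.5:11211 10.0.0.6:11211"));
            builder.setSessionLocator(new KetamaMemcachedSessionLocator());
            MemcachedClient client = builder.build();

            client.set("some-key", 3600, "some-value"); // step 2: add a key:value
            client.delete("some-key");                  // step 3: delete that key
            Object value = client.get("some-key");      // step 4: get the key

            // Expected: null (a cache miss). Observed intermittently: the old value.
            System.out.println("value after delete = " + value);
            client.shutdown();
        }
    }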

What version of the product are you using? On what operating system?
xmemcached 1.3.8
Ubuntu 12.04 (I think - will verify this and reply)

Please provide any additional information below.
We are finding that, sometimes, we can still get the value at a deleted key, i.e. it 
hasn't been deleted as we expect. Later, after the TTL expires the key 
naturally, we find that it has been removed as expected.

We also can't reliably reproduce the behavior, which makes this frustrating 
both to try to report (sorry!) and to handle in our codebase. We did find that 
lowering our number of nodes down to 1 causes the problem to disappear.

We only began to notice the problem after upgrading from 1.3.5 to 1.3.8. We 
have checked just about everything that could be wrong in our codebase, and 
haven't found anything. Our main suspicion is that Ketama isn't hashing 
consistently, but again, we don't have a good way to reliably test this.

Thanks!

Original issue reported on code.google.com by andrew@stackmob.com on 10 Nov 2012 at 1:58

GoogleCodeExporter commented 9 years ago
I can't believe this could happen...

Does the delete method return true? And between the delete and the get, has the 
connection been disconnected accidentally?

Thanks.

Original comment by killme2...@gmail.com on 10 Nov 2012 at 1:32

GoogleCodeExporter commented 9 years ago
Hi -

I work with Andrew. The problem isn't actually that the delete fails. The 
problem seems to be that, after moving from 1.3.5 to 1.3.8, we are seeing 
inconsistencies with Ketama: it hashes the same key to different nodes, so the 
data ends up on both of our memcached nodes, and when we do a delete (which 
succeeds) the value is still left on the other node. Subsequent gets then 
return either fresh or stale data, because we don't seem to be getting 
consistent hashes. The key thing here is that we only experienced this issue 
after moving from 1.3.5 to 1.3.8, and 1.3.5 currently works fine. Did anything 
change between those versions that could affect the way Ketama hashes, such 
that there is some scenario where it's not consistent?

- Taylor

Original comment by tlees...@gmail.com on 10 Nov 2012 at 6:30

GoogleCodeExporter commented 9 years ago
For what it's worth, we can't quite believe that it's happening, either. :)

Original comment by andrew@stackmob.com on 10 Nov 2012 at 10:14

GoogleCodeExporter commented 9 years ago
So, after downgrading back to 1.3.5 and keeping Ketama, we've seen the issue 
come up a couple more times, which adds to the theory that there might be 
issues in the Ketama implementation. We're going to upgrade to 1.3.8, switch 
to the ArrayBasedSessionLocator, and see how that fares. Will return with updates.

Andrew

Original comment by devm...@gmail.com on 15 Nov 2012 at 12:25

GoogleCodeExporter commented 9 years ago
Hi,
You said that you noticed the problem after upgrading from 1.3.5 to 1.3.8, but 
when you downgraded to 1.3.5 it seems the problem was still there.

I think it may not be an issue in xmemcached, because I've written a test case 
for this issue and I can't reproduce it as you described.

Original comment by killme2...@gmail.com on 15 Nov 2012 at 2:58

GoogleCodeExporter commented 9 years ago
Any progress?

Original comment by xzhu...@avos.com on 19 Nov 2012 at 7:04

GoogleCodeExporter commented 9 years ago
We had some issues on our end that kept us from rolling out the 1.3.8 + 
ArrayBased change that I mentioned. Those should be going out very soon. Sorry 
for the delay.

Original comment by devm...@gmail.com on 20 Nov 2012 at 10:24

GoogleCodeExporter commented 9 years ago
It seems to be caused by DNS resolution.
Before the remote address of a memcached node is resolved, we use the raw IP 
address to build the session map in KetamaMemcachedSessionLocator. But after 
it is resolved, we build the session map with the hostname, so the session 
map is not consistent.

You can assign hostnames to your memcached servers and add them to /etc/hosts 
on the client machines to avoid the DNS resolution.
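A minimal /etc/hosts sketch (the hostnames and addresses are placeholders, not from this thread):

    # on every client machine: pin each memcached node's hostname to its IP
    10.0.0.5    cache1
    10.0.0.6    cache2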

1.4.2 has been released; it tries to keep the session map consistent by using 
the socket string obtained the first time we get it.

Original comment by killme2...@gmail.com on 19 Jul 2013 at 6:48
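To illustrate the explanation above, the sketch below hashes two string forms of the same node, the raw IP form and the resolved hostname form, onto a Ketama-style ring. The node strings, the 4-byte MD5 ring position, and the class name RingShiftSketch are assumptions for illustration, not xmemcached's actual implementation.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public class RingShiftSketch {
        // Derive an unsigned 32-bit ring position from the first 4 bytes of MD5,
        // in the spirit of Ketama hashing (illustrative only).
        static long ringPosition(String node) throws Exception {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(node.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[3] & 0xFF) << 24) | ((d[2] & 0xFF) << 16)
                    | ((d[1] & 0xFF) << 8) | (d[0] & 0xFF);
        }

        public static void main(String[] args) throws Exception {
            String beforeResolve = "/10.0.0.5:11211";       // raw IP form of the node
            String afterResolve  = "cache1/10.0.0.5:11211"; // hostname form after DNS resolution
            System.out.println(ringPosition(beforeResolve));
            System.out.println(ringPosition(afterResolve));
            // The two positions differ, so a session map built from one string form
            // routes keys differently than one built from the other: the same key can
            // be written to one node and deleted or read from the other.
        }
    }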