killme2008 / xmemcached

High performance, easy to use multithreaded memcached client in java.
http://fnil.net/xmemcached
Apache License 2.0

Race condition in SerializingTranscoder returns shared object instead of new instance #100

Closed kaloisi closed 3 years ago

kaloisi commented 5 years ago

We've finally hunted down a problem that has been haunting us for years. It turns out that SerializingTranscoder can return the same instance of an object if two threads fetch the same memcached key at the same time.

I've included a JUnit test that can reproduce this problem. All the test does is add a simple object to memcached and then fetch it 10k times across 50 threads. The test fails if any two of the results are the same instance. It fails 100% of the time.
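
The gist of such a test looks roughly like the sketch below. This is a minimal illustration, not the attached JUnit test itself, and it assumes a memcached server running on localhost:11211:

```java
import java.io.Serializable;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import net.rubyeye.xmemcached.MemcachedClient;
import net.rubyeye.xmemcached.XMemcachedClientBuilder;
import net.rubyeye.xmemcached.utils.AddrUtil;

public class SharedInstanceRepro {

    static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        String value = "hello";
    }

    public static void main(String[] args) throws Exception {
        MemcachedClient client =
                new XMemcachedClientBuilder(AddrUtil.getAddresses("localhost:11211")).build();
        client.set("repro-key", 0, new Payload());

        // Track every distinct returned object using identity (==) semantics.
        Set<Object> instances =
                Collections.synchronizedSet(Collections.newSetFromMap(new IdentityHashMap<>()));

        ExecutorService pool = Executors.newFixedThreadPool(50);
        for (int i = 0; i < 10_000; i++) {
            pool.submit(() -> {
                Object o = client.get("repro-key");
                instances.add(o);
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);

        // If every get() deserialized a fresh copy, we would see 10,000 distinct
        // instances here; a smaller count means some gets shared the same object.
        System.out.println("distinct instances: " + instances.size() + " / 10000");
        client.shutdown();
    }
}
```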

I think the problem is caused by this line, because it reuses the previously deserialized value: https://github.com/killme2008/xmemcached/blob/master/src/main/java/net/rubyeye/xmemcached/transcoders/SerializingTranscoder.java#L80

I've also included a workaround in the testWithFix() method. I found that synchronizing on the CachedData and setting d.decodedObject = null forces the SerializingTranscoder to deserialize a new copy of the object (a rough sketch of the idea follows).
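
For illustration, the shape of that workaround could look something like this. This is a sketch based only on the description above (not the code in the attached test), and it assumes CachedData.decodedObject is directly assignable, as implied by the comment:

```java
import net.rubyeye.xmemcached.transcoders.CachedData;
import net.rubyeye.xmemcached.transcoders.SerializingTranscoder;

public class FreshCopyDecode {

    // Clears the cached decoded object under the CachedData's lock so that
    // SerializingTranscoder deserializes a brand-new instance from the raw bytes.
    // Assumes CachedData.decodedObject is writable, as described in the comment above.
    public static Object decodeFreshCopy(SerializingTranscoder transcoder, CachedData d) {
        synchronized (d) {
            d.decodedObject = null;      // drop the shared decoded instance
            return transcoder.decode(d); // deserialize a fresh copy from the raw bytes
        }
    }
}
```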

XmemcachedClientTest.java.zip

killme2008 commented 5 years ago

XMemcached merges concurrent get requests that retrieve the same key; this is expected behavior.

What problem is this causing for you?

kaloisi commented 5 years ago

I agree that minimizing duplicate remote calls and sharing the raw byte arrays are great features, but I'm not sure that returning a single shared instance is expected. It took us years to track this down because the shared instance only shows up under high load.

The reason it was so hard to track down is that we use memcached to cache Hibernate objects. When a request hits our website, we first check whether the object is in our local cache, then we check memcached, and finally we fetch it from Hibernate. When the object is loaded from memcached, we re-attach it to the Hibernate session. So the problem wasn't showing up in memcached itself; instead, Hibernate was throwing an exception saying the object was already attached to another session. This seems like a standard setup, and I'd expect other people to be hitting similar problems.

We are going to deploy the workaround for now, but we'd prefer to contribute a fix back to xmemcached. Could we add a flag to SerializingTranscoder that forces a new instance per request, so other people could disable the sharing if needed? Another option would be to remove "final" from the methods so we could extend it with a custom transcoder.

killme2008 commented 5 years ago

Sorry about that. It looks like these optimizations are missing from the user guide; I will add some descriptions for them.

But you don't have to modify SerializingTranscoder for this problem; you can disable this optimization by calling MemcachedClient#setOptimizeGet(false). The javadoc is here:

http://fnil.net/docs/xmemcached/net/rubyeye/xmemcached/MemcachedClient.html#setOptimizeGet(boolean)
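
For reference, a minimal usage sketch (assuming a memcached server at localhost:11211):

```java
import java.io.IOException;

import net.rubyeye.xmemcached.MemcachedClient;
import net.rubyeye.xmemcached.XMemcachedClientBuilder;
import net.rubyeye.xmemcached.utils.AddrUtil;

public class DisableGetOptimization {
    public static void main(String[] args) throws IOException {
        MemcachedClient client =
                new XMemcachedClientBuilder(AddrUtil.getAddresses("localhost:11211")).build();
        // Turn off merging of concurrent gets for the same key, so each get()
        // is sent and decoded independently.
        client.setOptimizeGet(false);
        client.shutdown();
    }
}
```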

perfnorm commented 5 years ago

Hi, I work at the same company as @kaloisi, and was one of the devs who banged their heads against this a year back. :)

I do consider this a bug and not an optimization. I don't think any developer would expect that if two different threads in their application ask for the same data from memcached, there is a small chance they end up sharing the same object reference. I don't know of any other cache that operates this way, because it opens the door to really hard-to-find side effects (like the ones we encountered).

I think we only found this because we have such high-volume usage (30+ million uniques a month), but I really doubt we are the only ones to have been bitten by it.

At the very least, I would suggest separating out this particular optimization so that it is disabled by default, and a developer would have to understand the implications and opt in explicitly to enable it.

Thanks for your consideration!

killme2008 commented 5 years ago

In fact, spymemcached also has this optimization. In a cache system we usually treat cached items as read-only, but here you re-attach them to Hibernate. I haven't received such an issue report before, so I don't think it is a bug, but rather a trap in some situations.

perfnorm commented 5 years ago

Interesting to learn. I suppose one man's trap is another man's bug. :)