intel / CacheLib

Pluggable in-process caching engine to build and scale high performance services
https://www.cachelib.org
Apache License 2.0
17 stars 4 forks source link

Fix race in acquire #68

Closed igchor closed 1 year ago

igchor commented 1 year ago

The assumption for moving items was that once item is unmarked no one can add new waiters for that item. However, since incrementing item ref count was not done under the MoveMap lock, there was a race: item could have been unmarked right after incRef returned incFailedMoving.


This change is Reviewable

igchor commented 1 year ago

It would be good to run a few long-running benchmarks to make sure this fixes the issue

byrnedj commented 1 year ago

I can confirm no impact on performance for leader and follower workloads.

vinser52 commented 1 year ago

@guptask could you please check this patch with your DSA branch and confirm that it fixes the hang that you reported recently?

byrnedj commented 1 year ago

@vinser52 I confirmed with @guptask that the patch works on his DSA branch - however we are now hitting a runtime error now with DSA that is unrelated to the race condition. We confirmed that his branch works by just using memcpy in place of DSA calls with this patch.

vinser52 commented 1 year ago

So, in that case, I believe we can merge this PR?