Closed igchor closed 1 year ago
It would be good to run a few long-running benchmarks to make sure this fixes the issue
I can confirm no impact on performance for leader and follower workloads.
@guptask could you please check this patch with your DSA branch and confirm that it fixes the hang that you reported recently?
@vinser52 I confirmed with @guptask that the patch works on his DSA branch - however we are now hitting a runtime error now with DSA that is unrelated to the race condition. We confirmed that his branch works by just using memcpy in place of DSA calls with this patch.
So, in that case, I believe we can merge this PR?
The assumption for moving items was that once item is unmarked no one can add new waiters for that item. However, since incrementing item ref count was not done under the MoveMap lock, there was a race: item could have been unmarked right after incRef returned incFailedMoving.
This change is