pmem / libpmemobj-cpp

C++ bindings & containers for libpmemobj
https://pmem.io
Other
108 stars 76 forks source link

test: concurrent_map_0_helgrind fails #797

Open igchor opened 4 years ago

igchor commented 4 years ago

ISSUE: concurrent_map_0_helgrind test fail

Environment Information

How often bug is revealed:

often

Actual behavior:

Test fails

Expected behavior:

Test passes

Details

The problem is most probably due to a inconsistent lock ordering. (which is ok in our case but helgrind still complains).

Additional information about Priority and Help Requested:

Are you willing to submit a pull request with a proposed change? (Yes, No)

Requested priority: Low

igchor commented 4 years ago

@vinser52 could you add more info on this issue? I don't remember exactly why the ordering is reported to be wrong. And we also had some ideas to fix that.

vinser52 commented 4 years ago

Helgrind reports violation of the lock order during the insert operation. Let's imagine we have the following skip list:

layer 3:    |----------------->|
layer 2:    |-------------->|->|
layer 1:    |----->|---->|->|->|
layer 0:    dummy->1->2->5->6->null

And we want to insert the new element with key=3 and some random height (e.g. height=3). First, we have to find insert position and remember previous elements on each layer up to the height of new element. In that example, the previous elements from level 0 to level 2 are 2, 1, dummy. The next step, we need to lock previous elements from level 0 to level equals to the height of the new element. So we start acquiring locks from level 0 to level 2: we acquire locks for nodes with keys 2, 1, dummy. Then we allocate a new node with key 3 and acquire the lock to it to prevent concurrent inserts to update the new node before it is fully linked. So, as you can see the lock order is the following: 2, 1, dummy, 3. After insert we will have the following skip list:

layer 3:    |-------------------->|
layer 2:    |----------->|---->|->|
layer 1:    |----->|---->|->|->|->|
layer 0:    dummy->1->2->3->5->6->null

Now let's insert another element with key=4 and height=4. The previous elements from level 0 to level 3 are 3, 3, 3, dummy. So, we will acquire lock in the following order: 3, dummy. As you can see the order is different comparing to the previous insert (2, 1, dummy, 3): now 3 acquired before dummy. We acquire locks from the lower level to the higher level, which means keys are in descending order and only the new node violates this rule because it is acquired after all predecessors are acquired. But this violation happens only one time for each new key. In other words, we brake the order when acquiring lock for the new node, but it is ok because the new node is not visible to other threads because it not linked to the skip list yet.

igchor commented 4 years ago

I think one possible solution would be to somehow disable tracking the locks ordering by helgrind during the insert (and reenable it after - the order will be OK from there on).

Alternatively we could think about reversing the lock ordering (dummy, 1, 2 instead of 2,1,dummy). But this would nee to be investigated. Even if this is correct it could impact performance (more collisions at dummy?).

vinser52 commented 4 years ago

Actually I tried to disable tracking the lock of the new node, but I have not succeeded. Changing the lock order seems correct and should fix the issue, but as you said we need to check performance impact.