Retried commit after ODistributedRecordLockedException fails and leaks distributed transaction slot

timw commented 2 months ago

OrientDB Version: 3.1.21 (also present in 3.2)

Java Version: 1.8

OS: Linux/Mac

We have experienced production outages where all commit activity in a distributed database stalls, which by analysing heap dumps we have isolated to all of the promisedSequential slots in OTransactionSequenceManager being occupied by transactions that have long since completed.

We've reproduced the effect locally by running two client threads that rapidly (i.e. with no pauses) query and update a shared record, and create a separate record at the same time.

By instrumenting with logging we isolated the slot leak behaviour to when the transaction fails with a OClusterDoesNotExistException: Cluster id '-1' is outside the of range of configured.

When this error occurs, the transaction in the sequence manager is never cleared (either by notifySuccess or notifyFailure).

The root cause of the OClusterDoesNotExistException appears to be:

one of the competing threads times out in ONewDistributedTransactionManager.retriedCommit with a ODistributedRecordLockedException
the recovery for the exception invokes OTransactionRealAbstract.resetAllocatedIds, which resets the RID for the object creation (to -1,-2) and invokes updateIdentityAfterCommit.
on the next retry, the call to ONewDistributedTransactionManager.getInvolvedClusters fails, because one of the Record has a -1 cluster id.

It looks like a call to ODatabaseDocumentAbstract.assignAndCheckCluster is missing during the reset (which is what happens in the original assignment in OTransactionOptimisticServer.assignClusters, and introducing that call does appear to fix the problem.

What I'm not clear on is whether the ODatabaseDocumentAbstract.assignAndCheckCluster call should be added to OTransactionRealAbstract.updateIdentityAfterCommit or just to OTransactionRealAbstract.resetAllocatedIds, and how exactly updatedRids should be corrected when that happens.

timw commented 2 months ago

This appears to do what I want:

--- forkSrcPrefix/core/src/main/java/com/orientechnologies/orient/core/tx/OTransactionRealAbstract.java
+++ forkDstPrefix/core/src/main/java/com/orientechnologies/orient/core/tx/OTransactionRealAbstract.java
@@ -624,10 +624,16 @@ public abstract class OTransactionRealAbstract extends OTransactionAbstract
   public void resetAllocatedIds() {
     for (Map.Entry<ORID, ORecordOperation> op : allEntries.entrySet()) {
       if (op.getValue().type == ORecordOperation.CREATED) {
+        ORID current = op.getValue().getRID().copy();
         ORecordId oldNew =
             new ORecordId(op.getKey().getClusterId(), op.getKey().getClusterPosition());
         updateIdentityAfterCommit(op.getValue().getRID(), oldNew);
-        updatedRids.remove(op.getValue().getRID());
+        // Clear previous update RID chain, which will include mapping created by updateIdentityAfterCommit
+        while (current != null) {
+          current = updatedRids.remove(current);
+        }
+        database.assignAndCheckCluster(op.getValue().getRecord(), null);
+        updatedRids.put(op.getValue().getRID().copy(), oldNew);
       }
     }
   }

I think the same patch would need to be applied to OMicroTransaction?

tglman commented 2 months ago

Hi @timw,

For the stalled transaction sequential it should be there a reset when the transaction fail, that free the sequential position, if you want to track it are the call to context.releasePromises(); that will finally call OTransactionSequenceManager.notifyFailure, will double check that is is missing somewhere .

For the problem with the ids reset I do remember I've fixed a similar problem in 3.2.x series a while ago, but the only commit I can find are https://github.com/orientechnologies/orientdb/commit/4ef8c449627cc92e5afb5a8f546bb90f39348cab and b1375cb81a8a3b0fdc9ff0df700995c63eb90347 that seems to be already part of your changes.

Removing all the mapping in updatedRids it may be a bit dangerous, because in the records there may still be ids pointing to something like 10:-10 that when serialized will not find the mapping anymore to the real id and fail/store a stale pointer.

The process that happen on commit will be transforming an id from 10:-5 or -1:-10 to something like 10:20 ( often the process is multiple steps: -1:-5 -> 10:-5 -> 10:20 and all the steps are in the updatedRids mapping ).

When reverting this id assignment because of concurrent assignation of clusterPostion (the second part of the id so in 10:20 is 20) the correct way would be reset the id to something like 10:-x to trigger the next id association, and make sure that all the ids mapping (updatedRids) now have as end point the resetted id 10:-x, making sure that the old assigned id is now mapping to the new temporary.

So if before resetting in the updatedRids was there:

-1:-10 -> 10:-10 //First temporary transition
10:-10 -> 10:50 //Final Id assignation that failed to allocate

after reset the map should look something like

-1:-10 -> 10:-10 //First temporary transition
10:50 -> 10:-10 //Previous Id assignation now point to the temporary

so need to remove the 10:-10 -> 10:50 and add 10:50 -> 10:-10.

the id reset should not reset the clusterId ( int the 10:20 is the 10 part ) if is doing that probably there is something wrong.

Yes OMicroTransaction should have the exact same logic (in 3.2.x it merged back in the normal transaction)

(P.S. I know that many thing I wrote here you already know, making it clear to everyone )

timw commented 2 months ago

Thanks @tglman - your explanation was very helpful in understanding the intent here (although the updated rids map works in the opposite direction to your examples, tracking back references, so it's a bit confusing).

Having another look at 3.2/develop, it looks like you already fixed this in a667b24, which I should have noticed earlier, so this looks like it's only a 3.1.x bug. That patch appears to work for me.

With the patch applied, the net effect is to set the id to current_cluster:-x, and the updatedRids map is unchanged (since that id is already present in the map, and the mapping from it to the initial id is restored after the call to updateIdentityAfterCommit.

orientechnologies / orientdb