Our current algorithm has the problem of potential reordering between two steps: system counter increment and pushing to the distributor queue. In the case of interleaving, two updates can be reordered and applied to the user storage in an incorrect order.
This PR implements a newer, improved version of the algorithm. We apply the following changes:
[x] DynamoDB streams queue is now deprecated for the distributor queue.
[x] Committing to system storage happens after successful pushing to the distributor queue.
[x] Incrementing the system counter & pushing to the distributor queue must happen in a single step.
[x] Transactional commit of create/delete.
[x] Committing system storage now adds a pending update.
[x] Distributor receives the lock timestamp.
[x] Distributor reads the node from the system storage and verifies it has been committed.
[x] Distributor verifies the pending update list.
[x] Distributor attempts second write - if the node is not committed and the lock is still held, then write.
[x] If the second write failed, distributor checks if it was committed in the end.
[x] Distributor verifies it's the still original owner to unlock.
[x] Distributor commits by removing the pending update.
Our current algorithm has the problem of potential reordering between two steps: system counter increment and pushing to the distributor queue. In the case of interleaving, two updates can be reordered and applied to the user storage in an incorrect order.
This PR implements a newer, improved version of the algorithm. We apply the following changes: