gryadka / js

Gryadka is a minimalistic master-master replicated consistent key-value storage based on the CASPaxos protocol
https://arxiv.org/abs/1802.07000
MIT License
328 stars 32 forks source link

Support delete #7

Open maximecaron opened 7 years ago

maximecaron commented 7 years ago

With current algorithm it's impossible to Delete a key without leaving a Tombstone forever in each Redis replica.
From the client point of view, it's only safe to assume a key is deleted if all replica respond that the key is missing.

The only solution I can think of at the moment is to implement some kind of copying garbage collector that periodically move all non-deleted key to a new epoch.

rystsov commented 6 years ago

It'd be very expensive to move all non-deleted keys. It looks like there is a cheaper way:

  1. Imagine that along with accept.lua there is acceptTombstone.lua which works the same way but also puts a key into a local deleting queue.
  2. Once a client wants to remove a value it puts a tombstone but instead of accept.lua it uses acceptTombstone.lua since it works the same way consistency isn't affected
  3. A GC process regularly:
    • peeks a key from the deleting queue
    • executes identity transformation with the write quorum covering whole cluster
    • pops the key from deleting queue and pushes into deleted queue on every acceptor
  4. Another GC process regularly:
    • peeks a key from the deleted queue
    • connects to each acceptor and wipes a corresponding tombstone
    • pops the key from the deleted queue

Once we know that a tombstone was written to all acceptors it's safe to remove it outside the paxos read/write loop since the state (all value are tombstones) becomes isomorphic to the initial state (all values are null).

During a membership change an operator stops both GC processes, does a migration, updates GC's config (addresses of acceptors and proposers) and restarts the processes.

maximecaron commented 6 years ago

in step #3 when you say write quorum do you mean all nodes or just majority ? If it's all nodes then it mean Tombstone can only be cleaned when all node are online.

This still seem like a good practical solution, thanks a lot for the feedback.