Open · archagon opened 6 years ago
@archagon The subject of CRDTs is very interesting and ties into making dynamic data available on top of the IPFS stack. Several of the points you touch on are very relevant to this specific IPFS working group, where you can filter to see only the issues related to CRDTs.
Wow, that's a great article! You do a great job of explaining CmRDTs at the beginning — I'm going to recommend this to everyone and add it to the README of this repo. Also, thank you for explaining Causal Trees, they look very useful. The visualisations you produced were very enlightening!
ORDTs (and particularly the structured log) look like the unifying data structure for operation-based CRDTs, incorporating, as you say, authorship and causality.
Some questions I had for you regarding this passage:
[...] every change to an ORDT is integrated through the same kind of low-level “rebase” inside the container. This means that the distinction between op-based (CmRDT) and state-based (CvRDT) usage becomes effectively moot.
If I understand correctly, you need to keep the entire tree, and hence the delete operations, correct? Would this be a major difference between ORDTs and CvRDTs? (Besides CvRDTs now keeping history and authorship).
This is related to the Garbage Collection / compaction and removing causal redundancy section further down, where you can compact some sections by removing "delete" operations. What happens when there is a remote node that has an operation with that "delete" operation as an ancestor? (Never mind answering that, I think you already answer it further down in the article.)
(I suspect that, in order to support offline editing in a general way, Garbage Collection must go through some type of consensus protocol. I think you touch on that when you suggest that baselines are put to a vote amongst a majority of connected sites.)
Anyway, super interesting and useful!
Thank you for the feedback!
If I understand correctly, you need to keep the entire tree, and hence the delete operations, correct? Would this be a major difference between ORDTs and CvRDTs? (Besides CvRDTs now keeping history and authorship).
Not exactly, since the delete operations and their targets can be removed in the garbage collection step. Any causally-dependent operations just have to be fixed in place when that happens.
ORDTs share a lot with existing CRDTs. For example, a CT ORDT and an RGA CRDT work pretty much the same. The mechanics map nearly 1 to 1. The main difference is that RGA doesn't explicitly treat operations as fundamental units of data, nor the data structure as an event log. Everything is muddled together. RGA defines operations very similarly to a CT, as units of atomic data with timestamp+UUID metadata; but as soon as operations are received, their causality metadata is stripped and they're inserted into their final spot in the array. Similarly, delete operations immediately turn their targets into tombstones, which are later "purged" once all sites have received the deletion. In essence, the different segments of the ORDT pipeline are interspersed throughout the RGA algorithm, making it hard to analyze, and to apply any insights to the design of new RDTs.
RGA is a very effective algorithm, but it doesn't make the logical leap of treating its data structure as an event log, even though that's what it is in practice. Traditionally, one would say that an RGA insert is applied and turned into data; but another way to look at it is that the insert operation is placed in the position of its intended output, then stripped of its metadata. ORDTs make this explicit.
In ORDTs, the final positioning of operations, the stripping of metadata, the creation of tombstones, and the purging of stale data — any change to the operations, in fact — is handled explicitly in the garbage collection step (which is type-specific and optional). All operations ahead of that baseline remain completely immutable. This leaves us with a very pure and highly general operational model for convergent data types — just an ordered event log of micro-operations — and makes it possible to design new RDTs with relative ease, leaving the messy design of the garbage collection scheme as a separate task (if needed). It also gives us tons of flexibility with respect to the organization of our operations in memory, and to the distribution of operations across the network.
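To make that concrete, here's a toy Swift sketch of the model. The names (OpID, OpValue, StringORDT) and the simplifications are mine, made up for this comment rather than taken from my actual implementation: each operation is an immutable atom carrying its own ID and causal parent, the data type is just the keyed log of those atoms, merging is a set union, and evaluation is a separate deterministic pass over the log.

```swift
struct OpID: Hashable, Comparable {
    let timestamp: UInt64   // Lamport clock
    let site: UInt16        // site/replica identifier; breaks ties between concurrent ops

    static func < (lhs: OpID, rhs: OpID) -> Bool {
        (lhs.timestamp, lhs.site) < (rhs.timestamp, rhs.site)
    }
}

enum OpValue {
    case insert(Character)  // insert this character after the causal parent
    case delete             // tombstone the causal parent
}

struct Op {
    let id: OpID
    let cause: OpID?        // causal parent: the operation this one attaches to
    let value: OpValue
}

struct StringORDT {
    var ops: [OpID: Op] = [:]

    // Op-based and state-based integration are the same "rebase": take the union of the logs.
    mutating func merge(_ other: StringORDT) {
        ops.merge(other.ops) { mine, _ in mine }   // ops are immutable, so either copy is fine
    }

    // Deterministic evaluation: skip tombstoned atoms, emit the rest in a deterministic order.
    // (A real Causal Tree weaves children under their causal parents; sorting by ID keeps
    // this sketch short.)
    func evaluate() -> String {
        let deleted = Set(ops.values.compactMap { op -> OpID? in
            if case .delete = op.value { return op.cause }
            return nil
        })
        let chars = ops.values.sorted { $0.id < $1.id }.compactMap { op -> Character? in
            if case .insert(let c) = op.value, !deleted.contains(op.id) { return c }
            return nil
        }
        return String(chars)
    }
}
```

Everything type-specific (the weave order, tombstones, compaction) then becomes a question of how you evaluate or prune this log, not of how you merge it.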
And yeah, garbage collection may indeed require some ugly consensus — but at least this can be nicely abstracted away from the underlying mechanics of the data structure!
@archagon thanks so much for the explanation, it was very clear to me!
Just one question left, and it's regarding garbage collection (which is an issue I'm very much interested in understanding better):
[…] Similarly, delete operations immediately turn their targets into tombstones, which are later "purged" once all sites have received the deletion.
In my work we're dealing with a loosely-connected p2p network with probably high churn, which in my opinion makes it hard(er) to reach consensus over topology. That, in my view, makes it hard to do garbage collection without compromising the ability of fresh replicas to join. Do you have any thoughts on this?
Sorry for my delay in replying!
If the ORDTs are used in a state-based (CvRDT-like) way, you could allow sites to concurrently set the baseline, but then on merge, simply retain or restore any deleted operations within the baseline whose absence would cause orphans to occur, as an exception to the regular compaction rules. (So, for example, if you’ve already removed part of your local string ORDT on account of a local baseline, but a remote ORDT happens to have some substrings which are causally linked to those deleted operations, then you would simply copy those missing operations back into your local ORDT on merge, and probably set the new baseline to the union of the two baselines.) This would be equivalent to some sort of coordinated state restoration process in a conventional distributed system, but without any actual communication required.

The scheme would probably require the least-invasive garbage collection approach, i.e. compacted operations interleaved in the structured log, or stored in order in a secondary array, with only removals and no mutations of operations permitted. It would also require the relationships between operations and their dependencies to be strictly codified in order to make the process deterministic. (If you find yourself with an orphan, you need to be able to pick out the exact operations that need to be restored in order to reinstate it as part of the ORDT.)
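Very roughly, and only as a sketch on top of the toy types from my earlier comment (CompactedORDT, VectorClock, and the particular compaction rule here are all invented for illustration, and skip the stricter dependency bookkeeping mentioned above), the shape of that scheme might be:

```swift
typealias VectorClock = [UInt16: UInt64]   // site -> timestamp at or below which ops may be compacted

struct CompactedORDT {
    var ordt = StringORDT()
    var baseline: VectorClock = [:]

    func isBelowBaseline(_ id: OpID) -> Bool {
        id.timestamp <= (baseline[id.site] ?? 0)
    }

    mutating func merge(_ other: CompactedORDT) {
        ordt.merge(other.ordt)                          // plain union restores anything we compacted away that the remote still has
        baseline.merge(other.baseline) { max($0, $1) }  // new baseline = pointwise max ("union") of the two baselines
        compact()
    }

    mutating func compact() {
        // Atoms that some insert still attaches to; removing them would create orphans.
        var hasChildren = Set<OpID>()
        for op in ordt.ops.values {
            if case .insert = op.value, let cause = op.cause { hasChildren.insert(cause) }
        }
        // Drop a tombstone and its target only when both fall within the baseline and the
        // target has no surviving children. (Simplified: a real version would also account
        // for deeper causal chains and concurrent deletes.)
        for op in Array(ordt.ops.values) {
            guard case .delete = op.value, isBelowBaseline(op.id),
                  let target = op.cause, isBelowBaseline(target),
                  !hasChildren.contains(target) else { continue }
            ordt.ops.removeValue(forKey: op.id)
            ordt.ops.removeValue(forKey: target)
        }
    }
}
```

The point is that both the orphan exception and the baseline union can be computed locally and deterministically, so no coordination round is needed.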
Also, garbage collection does not necessarily have to mean that data is lost when an orphan is made on a remote site. In the case of a Causal Tree, you could mandate that when a subtree is orphaned, it gets deterministically grafted to some close relative operation. (For example, the closest ancestor that falls within the baseline.) Yes, this will cause merge weirdness (substrings out of place) and will require manual tweaking by the user; but at least you’ll be able to clean up old operations without ever having to lose data. You could even mark the grafted subtrees/substrings as "needing reconciliation" with an extra flag/operation and then highlight them in your editor.
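And a sketch of that grafting alternative, again on top of the same toy types; the "needing reconciliation" flag is just a set of IDs that the editor could highlight:

```swift
extension CompactedORDT {
    // Graft, don't drop: if an incoming operation's parent was compacted away and can't be
    // restored, reattach it to its closest surviving ancestor (walking the remote's causal
    // chain) and flag it so the editor can surface it as needing reconciliation.
    mutating func graftOrphans(from remote: StringORDT, flagging needsReview: inout Set<OpID>) {
        for op in remote.ops.values where ordt.ops[op.id] == nil {
            var cause = op.cause
            while let id = cause, ordt.ops[id] == nil {
                cause = remote.ops[id]?.cause   // climb toward an ancestor we still have
            }
            if cause != op.cause { needsReview.insert(op.id) }   // subtree was re-parented
            ordt.ops[op.id] = Op(id: op.id, cause: cause, value: op.value)
        }
    }
}
```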
Caveat: I haven’t thought through the details and consequences of either approach too carefully, so perhaps there are obvious fatal flaws that I can’t see.
@archagon To me the path to compaction is now clearer. Thank you for explaining this!
Hi! Any interest in including my recent article Data Laced with History: Causal Trees & Operational CRDTs in the discussion? My focus is mostly on CvRDTs, so I'm not sure if this is exactly synergetic with the goals of IPFS; but I do have a working (proof-of-concept) implementation for both macOS and iOS, and I intend to keep polishing the code and refining the interface as I build it into my next app.
I think this reframing of CRDTs makes it quite intuitive to devise convergent data structures that are a) flexible and expandable, b) fully-historied with past revision viewing support, c) garbage-collectible in a variety of ways, and d) trivially splittable along any vector clock (for delta updates). Among other things, I can use these "ORDTs" to support real-time collaboration over simple databases and file stores such as CloudKit or Dropbox.
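For a sense of (d): if operations are keyed by a (Lamport timestamp, site ID) pair, the delta for a peer is just a filter of the operation log against that peer's vector clock. A hypothetical, self-contained sketch (made-up names, not code from my implementation):

```swift
typealias SiteID = UInt16

struct DeltaOpID: Hashable {
    let timestamp: UInt64
    let site: SiteID
}

typealias DeltaVectorClock = [SiteID: UInt64]   // site -> latest timestamp the peer has seen

// The delta for a peer is every operation newer than what its vector clock says it has seen.
func delta<Payload>(of ops: [DeltaOpID: Payload], since clock: DeltaVectorClock) -> [DeltaOpID: Payload] {
    ops.filter { $0.key.timestamp > (clock[$0.key.site] ?? 0) }
}
```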