Open pgte opened 5 years ago
Hi @pgte. I don't know RGA, but the reason it needs to keep tombstones is because it's always possible to have a concurrent operation referencing the element that was deleted? If yes, maybe causal stability information could be used?
@vitorenesduarte yes, you're correct, that's a possibility. OTOH, a nice side-effect of having a type that uses a CC and a DotStore is that you could easily compose (like in the ORMap type)..
You mean composing sequences, or using the current implementations of dot stores to implement a sequence type?
The second, with the goals 1) a sequence to be embedded in any causal type and 2) to embed any causal type inside the sequence.
+1 for this as a super useful application primitive (I'm assuming this would be append-only, as well)
from a user perspective, what I often want is relatively weak/local ordering, i.e. updates that do not directly reference each can have an undefined order
I believe that's a different data type (let's call it Append-only Log), and if I understand correctly, there's no need to have conflict resolution in this case.
Consider a grow-only map mapping naturals to a set of operations (or whatever is to be kept in the log).
Initially we have an empty log {}
.
When adding an op
to the log, we take the highest natural in the log (let's say m
), and create an entry in the log mapping m + 1
to a set with op
.
If we have two operations, A1
and A2
, by node A
, we'll have this sequence of states:
{}
{1: {A1}}
{1: {A1}, 2: {A2}}
Let's say that node B
has seen the last state, and adds a new operation B1
:
{1: {A1}, 2: {A2}, 3: {B1}}
If concurrently, A
also adds operations A3
and A4
(without seeing the update from B
), we will have:
{1: {A1}, 2: {A2}, 3: {A3}, 4: {A4}}
When these two states are merged, we get:
{1: {A1}, 2: {A2}, 3: {A3, B1}, 4: {A4}}
This means that A
saw the order A1, A2, A3, A4
, and when the update from B
arrived, the order changed to A1, A2, A3, B1, A4
(if we order operations inside a set by the node identifier, and A < B
).
Local order is respected, but at any point, new operations may be inserted before your last operation.
Would something like this work in the use case you have in mind @parkan?
An append-only or a sorted container needs no tombstones, indeed. Arbitrary insertions need reliable location ids.
It would be helpful to apply the 5 whys technique here https://en.m.wikipedia.org/wiki/5_Whys What problem are we trying to solve?
I think @parkan is trying to solve a different problem than I am. I was looking for a entirely-mutable array-like type that uses a DotStore so that I can embed it in other causal types..
I am not fluent in the dot jargon, but this is how I understand it: you need a scheme where data absence is meaningful. Like, a store has its version clock, data is fed in causal order... if something is covered by the clock, but missing from the storage, we conclude it was deleted. That is a neat tidy scheme and it applies to delta-enabled, op-based and ORDTs, all alike (AFAICS).
I mostly focus on other tricks, because data drop has its issues:
From my younger days, I remember one rule of thumb. The Web grows exponentially, so the Web+history is only heavier than the last snapshot by some factor. Hence, pruning the history does not buy you much.
On the other hand, some use cases may be sensitive to that. So, having a forgetful RGA variant is a plus. AFAIK, the only way to do so is to put a limit on the offline time. Like, if you did not sync for a week, your replica is rotten. Then, older tombstones are prunable, if the tree structure stays intact for the remaining nodes.
P.S.Correction: the original tree structure is only needed for "recent" nodes (those younger than the offline time, e.g. <1 week old). The rest is stable, hence the exact tree structure is less relevant. We may see it as the "ground" new trees grow from :)
Replying to @pgte "I was looking for a entirely-mutable array-like type that uses a DotStore so that I can embed it in other causal types.."
Although my design of ORSeq is flawed I was trying to do such a thing and I think its doable. Here is what I recall of some of the ingredients:
A sequence can refer to a causal context to avoid the use of tombstones. Elements are added into a given position and they get 3 things: a insert position identifier (using some good algorithm for that), a new dot (that is also added to the causal context), and (for now) a immutable content (say, a letter).
If that element is removed we can remove everything and only the dot survives in the causal context. The usual rules that deal with causal contexts can be used to propagate that removal, while avoiding tombstones. This sequence can be embedded in map and share a common causal context. But this sequence is a leaf in terms of embedding, since the inserted elements are immutable.
It should be possible to allow for element mutability. Say, an operation could be applied at a given element position in the sequence. Mutable elements should be causal types that also support causal contexts, and will share them. One could build a sequence of add-win sets, for instance.
This also works with deltas.
I had the plan to try to implement this some years ago, but never did it.
Hope this helps.
ahh I see @pgte, taking my requirements elsewhere then :)
@parkan it's fine, I think you're talking more about something like ipfs-log or versidag: https://github.com/ipfs/dynamic-data-and-capabilities/issues/50
@CBaquero Thank you , that looks like it would work for these purposes! I'll try to prototype an immutable version to see where that leads. :)
I'm also interested. @pgte have there been any progress on that?
The only sequence type implemented so far in JS Delta CRDTs is the RGA. RGA has to keep the tombstones of the deleted elements around to be safe. I think it would be very beneficial to have an array-like type that is causal and somehow allows to compress history (much like the ORMap when using the causal context and the dot store). Is there any research done on such a (delta) state-based type?
/cc @gritzko @vitorenesduarte