peer-base / js-delta-crdts

Delta State-based CRDTs in Javascript
192 stars 16 forks source link

causal sequence type #27

Open pgte opened 5 years ago

pgte commented 5 years ago

The only sequence type implemented so far in JS Delta CRDTs is the RGA. RGA has to keep the tombstones of the deleted elements around to be safe. I think it would be very beneficial to have an array-like type that is causal and somehow allows to compress history (much like the ORMap when using the causal context and the dot store). Is there any research done on such a (delta) state-based type?

/cc @gritzko @vitorenesduarte

vitorenesduarte commented 5 years ago

Hi @pgte. I don't know RGA, but the reason it needs to keep tombstones is because it's always possible to have a concurrent operation referencing the element that was deleted? If yes, maybe causal stability information could be used?

pgte commented 5 years ago

@vitorenesduarte yes, you're correct, that's a possibility. OTOH, a nice side-effect of having a type that uses a CC and a DotStore is that you could easily compose (like in the ORMap type)..

vitorenesduarte commented 5 years ago

You mean composing sequences, or using the current implementations of dot stores to implement a sequence type?

pgte commented 5 years ago

The second, with the goals 1) a sequence to be embedded in any causal type and 2) to embed any causal type inside the sequence.

parkan commented 5 years ago

+1 for this as a super useful application primitive (I'm assuming this would be append-only, as well)

from a user perspective, what I often want is relatively weak/local ordering, i.e. updates that do not directly reference each can have an undefined order

vitorenesduarte commented 5 years ago

I believe that's a different data type (let's call it Append-only Log), and if I understand correctly, there's no need to have conflict resolution in this case.

Append-only Log

Consider a grow-only map mapping naturals to a set of operations (or whatever is to be kept in the log). Initially we have an empty log {}. When adding an op to the log, we take the highest natural in the log (let's say m), and create an entry in the log mapping m + 1 to a set with op.

Example

If we have two operations, A1 and A2, by node A, we'll have this sequence of states:

{}
{1: {A1}}
{1: {A1}, 2: {A2}}

Let's say that node B has seen the last state, and adds a new operation B1:

{1: {A1}, 2: {A2}, 3: {B1}}

If concurrently, A also adds operations A3 and A4 (without seeing the update from B), we will have:

{1: {A1}, 2: {A2}, 3: {A3}, 4: {A4}}

When these two states are merged, we get:

{1: {A1}, 2: {A2}, 3: {A3, B1}, 4: {A4}}

This means that A saw the order A1, A2, A3, A4, and when the update from B arrived, the order changed to A1, A2, A3, B1, A4 (if we order operations inside a set by the node identifier, and A < B). Local order is respected, but at any point, new operations may be inserted before your last operation. Would something like this work in the use case you have in mind @parkan?

gritzko commented 5 years ago

An append-only or a sorted container needs no tombstones, indeed. Arbitrary insertions need reliable location ids.

It would be helpful to apply the 5 whys technique here https://en.m.wikipedia.org/wiki/5_Whys What problem are we trying to solve?

pgte commented 5 years ago

I think @parkan is trying to solve a different problem than I am. I was looking for a entirely-mutable array-like type that uses a DotStore so that I can embed it in other causal types..

gritzko commented 5 years ago

I am not fluent in the dot jargon, but this is how I understand it: you need a scheme where data absence is meaningful. Like, a store has its version clock, data is fed in causal order... if something is covered by the clock, but missing from the storage, we conclude it was deleted. That is a neat tidy scheme and it applies to delta-enabled, op-based and ORDTs, all alike (AFAICS).

I mostly focus on other tricks, because data drop has its issues:

From my younger days, I remember one rule of thumb. The Web grows exponentially, so the Web+history is only heavier than the last snapshot by some factor. Hence, pruning the history does not buy you much.

On the other hand, some use cases may be sensitive to that. So, having a forgetful RGA variant is a plus. AFAIK, the only way to do so is to put a limit on the offline time. Like, if you did not sync for a week, your replica is rotten. Then, older tombstones are prunable, if the tree structure stays intact for the remaining nodes.

gritzko commented 5 years ago

P.S.Correction: the original tree structure is only needed for "recent" nodes (those younger than the offline time, e.g. <1 week old). The rest is stable, hence the exact tree structure is less relevant. We may see it as the "ground" new trees grow from :)

CBaquero commented 5 years ago

Replying to @pgte "I was looking for a entirely-mutable array-like type that uses a DotStore so that I can embed it in other causal types.."

Although my design of ORSeq is flawed I was trying to do such a thing and I think its doable. Here is what I recall of some of the ingredients:

A sequence can refer to a causal context to avoid the use of tombstones. Elements are added into a given position and they get 3 things: a insert position identifier (using some good algorithm for that), a new dot (that is also added to the causal context), and (for now) a immutable content (say, a letter).

If that element is removed we can remove everything and only the dot survives in the causal context. The usual rules that deal with causal contexts can be used to propagate that removal, while avoiding tombstones. This sequence can be embedded in map and share a common causal context. But this sequence is a leaf in terms of embedding, since the inserted elements are immutable.

It should be possible to allow for element mutability. Say, an operation could be applied at a given element position in the sequence. Mutable elements should be causal types that also support causal contexts, and will share them. One could build a sequence of add-win sets, for instance.

This also works with deltas.

I had the plan to try to implement this some years ago, but never did it.

Hope this helps.

parkan commented 5 years ago

ahh I see @pgte, taking my requirements elsewhere then :)

pgte commented 5 years ago

@parkan it's fine, I think you're talking more about something like ipfs-log or versidag: https://github.com/ipfs/dynamic-data-and-capabilities/issues/50

pgte commented 5 years ago

@CBaquero Thank you , that looks like it would work for these purposes! I'll try to prototype an immutable version to see where that leads. :)

RoeeShlomo commented 5 years ago

I'm also interested. @pgte have there been any progress on that?