ianopolous / crdt

A description of a new conflict free replicated data type (CRDT)

Criticisms of design of CRDT #1

Open cmeiklejohn opened 7 years ago

cmeiklejohn commented 7 years ago

I'd refer the author to the design of the AW-Set [1], if the desire is to have a set with space complexity as close to O(n) as possible.

[1] Almeida et al., Delta State Replicated Data Types, https://arxiv.org/abs/1603.01529

ianopolous commented 7 years ago

@cmeiklejohn Thank you for your feedback. I believe that all set CRDTs scale as O(number of elements) times some per-element factor? Certainly the 2P, PN, OR and LWW sets do. For this reason I found it clearer to single out where the per-element scaling is different. Otherwise I'd have to say something like

O(number of elements ever added) rather than O(number of elements ever added)*O(add operations on each element since the last remove)

You rightly point out that deleted elements are present forever, but I think this is true of all set CRDTs (I'm guessing it's not true for the one you've suggested)? I've updated the wording to try and make this and your previous comment clearer.

Thanks for the paper suggestion. I'll have a read.

cmeiklejohn commented 7 years ago

> You rightly point out that deleted elements are present forever, but I think this is true of all set CRDTs (I'm guessing it's not true for the one you've suggested)? I've updated the wording to try and make this and your previous comment clearer.

No, that's not true. The provided reference has several implementations that do not require storage of the entire removed element, just a fixed-overhead vector clock sized by the number of actors (referred to as a causal context).
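
For readers following along, here is a minimal sketch of that idea (my own illustration, not the paper's reference code): an observed-remove set whose state is a set of (element, dot) pairs plus a causal context of dots that have been seen. Removal simply drops the pairs; the dots stay in the causal context, so no per-element tombstone is kept. A production implementation compacts the causal context into a version vector with one entry per actor, which is the fixed overhead mentioned above.

```python
# Minimal sketch of an add-wins observed-remove set with a causal context,
# in the spirit of the optimized OR-set / AW-Set from the cited paper.
# Illustration only, not the paper's reference implementation.

class AWSet:
    def __init__(self, actor):
        self.actor = actor
        self.counter = 0
        self.entries = set()   # {(element, dot)} where dot = (actor, seq)
        self.context = set()   # all dots ever seen; a real implementation
                               # compacts this into a version vector (actor -> max seq)

    def add(self, element):
        self.counter += 1
        dot = (self.actor, self.counter)
        # replace any previously observed dots for this element
        self.entries = {(e, d) for (e, d) in self.entries if e != element}
        self.entries.add((element, dot))
        self.context.add(dot)

    def remove(self, element):
        # Drop the element's entries; their dots remain only in the causal
        # context, so nothing element-sized is kept for removed elements.
        self.entries = {(e, d) for (e, d) in self.entries if e != element}

    def merge(self, other):
        # Keep entries seen by both, plus entries the other side has not yet
        # observed (its context lacks the dot). Add wins over concurrent removes.
        keep = self.entries & other.entries
        keep |= {(e, d) for (e, d) in self.entries if d not in other.context}
        keep |= {(e, d) for (e, d) in other.entries if d not in self.context}
        self.entries = keep
        self.context |= other.context

    def value(self):
        return {e for (e, _) in self.entries}
```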

cmeiklejohn commented 7 years ago

We have reference implementations of all of the sets in that paper available here.

ianopolous commented 7 years ago

Thanks again! Very interesting!

ianopolous commented 7 years ago

@cmeiklejohn I've thought some more about these other potential CRDTs for sets, and being able to use a vector clock to remove the need to store deleted elements is elegant. However, the setting I'm considering involves many ephemeral writers - the set of nodes isn't well defined, they come and go at relatively high frequency (maybe only doing a single write, propagating it, and leaving the network). In this case the size of the vector clock itself becomes an issue, as it is O(writing nodes).
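
To make the concern concrete, here is a tiny illustration (my own, with made-up actor identifiers) of why a causal context or version vector grows with the number of distinct writers: it keeps one entry per actor that has ever written, so one-shot ephemeral writers each leave an entry behind.

```python
# Illustration: a version vector maps each actor id to the highest sequence
# number seen from that actor, so its size is O(distinct writers).

version_vector = {}

def record_write(actor, seq):
    version_vector[actor] = max(version_vector.get(actor, 0), seq)

# 10,000 ephemeral nodes that each perform a single write and leave:
for i in range(10_000):
    record_write(f"ephemeral-node-{i}", 1)

print(len(version_vector))   # 10000 entries, one per writer that ever existed
```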

In comparison, the Parity-Set doesn't have any scaling factor dependent on the number of writers.

In the case where the elements themselves are large and a large fraction of the elements are deleted at any given time, the deleted elements can each be replaced by a cryptographic hash, rather than the elements themselves to mitigate the storage cost. This is of course still O(total elements ever present), but has a much smaller scaling factor for deleted elements.
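
A minimal sketch of that mitigation, assuming SHA-256 as the hash (the helper names here are mine): when an element is deleted, keep only its digest, so the tombstone costs a fixed 32 bytes regardless of the element's size, and a later re-add can still be matched against it by hash.

```python
import hashlib

# Sketch: replace a deleted element's payload with its SHA-256 digest,
# so the tombstone is 32 bytes no matter how large the element was.

def tombstone(element: bytes) -> bytes:
    return hashlib.sha256(element).digest()

live = {b"a" * 1_000_000}                 # large live element, stored in full
deleted = {tombstone(b"b" * 1_000_000)}   # deleted element kept only as a hash

# A re-added element can be matched against the tombstone by hashing it:
candidate = b"b" * 1_000_000
print(tombstone(candidate) in deleted)    # True
```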

cmeiklejohn commented 7 years ago

> @cmeiklejohn I've thought some more about these other potential CRDTs for sets, and being able to use a vector clock to remove the need to store deleted elements is elegant. However, the setting I'm considering involves many ephemeral writers - the set of nodes isn't well defined, they come and go at relatively high frequency (maybe only doing a single write, propagating it, and leaving the network). In this case the size of the vector clock itself becomes an issue, as it is O(writing nodes).

If this is the use case, then I would look at the design of SwiftCloud where replicas are stored on clients, but you can minimize the number of actors by having them coordinate updates with a server that's in a stable set of membership.

> In comparison, the Parity-Set doesn't have any scaling factor dependent on the number of writers.

No, the trade-off you're making is keeping deleted elements (or some tombstone marker of the deleted element) forever. It's a game of whether you think the vector clock overhead is going to be more or less than the element churn -- one pathological client, either using different identifiers because it leaves/joins a lot, or inserting/deleting unique elements over time, will result in the same worst-case space complexity.

> In the case where the elements themselves are large and a large fraction of the elements are deleted at any given time, the deleted elements can each be replaced by a cryptographic hash, rather than the elements themselves to mitigate the storage cost. This is of course still O(total elements ever present), but has a much smaller scaling factor for deleted elements.

At a minimum, for this to work, you'd always have to store the hash (because of concurrent operations where the payload is dropped but the element is re-added) and ensure the hash function doesn't have collisions, else you risk a concurrency anomaly. I don't see how this approach is any better than the AW-Set approach where a dot is maintained for each element uniquely.

russelldb commented 7 years ago

I have a question about the semantics of the set. Most of the CRDT set designs I've seen favour "add wins". In your design, if site A adds x, site B adds x and then removes x, and then A and B synchronise, A's x is removed by the merge, even though it was added concurrently with/independently of B's remove. Is the aim to have a "remove wins" set when a pair of add/remove operations on the same element is concurrent?

ianopolous commented 7 years ago

@cmeiklejohn

> If this is the use case, then I would look at the design of SwiftCloud where replicas are stored on clients, but you can minimize the number of actors by having them coordinate updates with a server that's in a stable set of membership.

The use case I'm working on (pure IPFS based) doesn't have any room for central servers or centralized consensus.

> In comparison, the Parity-Set doesn't have any scaling factor dependent on the number of writers.

> No, the trade-off you're making is keeping deleted elements (or some tombstone marker of the deleted element) forever. It's a game of whether you think the vector clock overhead is going to be more or less than the element churn -- one pathological client, either using different identifiers because it leaves/joins a lot, or inserting/deleting unique elements over time, will result in the same worst-case space complexity.

My statement is correct: there is no scaling factor dependent on the number of writers. I'm not saying there are no trade-offs; there clearly are: O(deleted elements) versus O(number of writers). The right choice will depend on the application.

> At a minimum, for this to work, you'd always have to store the hash (because of concurrent operations where the payload is dropped but the element is re-added) and ensure the hash function doesn't have collisions, else you risk a concurrency anomaly.

That's exactly what I'm saying: store the (cryptographic) hash. If SHA-256 has a collision then we have far bigger problems, and the odds of finding one are comparable to one in the number of particles in the universe.

ianopolous commented 7 years ago

@russelldb My mental model is to compare it to the Bitcoin blockchain. If the Bitcoin blockchain forks, the system takes the chain with the most updates to "merge" them. In our case, the state with the most operations on a given element wins. This isn't add-wins or remove-wins, though: you can flip add and remove in your example and it is still valid. Whether or not this makes sense for a given application is another question.
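
A hedged sketch of the scenario, under my own reading of the rule stated above (each element carries an operation count, the locally applied last operation decides presence, and merge keeps whichever side has the higher count per element). This is a toy model of "most operations wins", not necessarily the repository's exact Parity-Set design, but it shows why neither add nor remove wins in general: the side with more operations prevails either way.

```python
# Toy model of "the state with the most operations on an element wins".
# Per element each replica keeps (op_count, present); merge keeps the pair
# with the higher count. Not the actual Parity-Set implementation.

def apply_op(state, elem, op):
    """Apply 'add' or 'remove' locally, bumping the element's op count."""
    count, _ = state.get(elem, (0, False))
    state[elem] = (count + 1, op == "add")

def merge(a, b):
    merged = {}
    for elem in set(a) | set(b):
        # higher op count wins; on a tie the tuple comparison keeps 'present'
        merged[elem] = max(a.get(elem, (0, False)), b.get(elem, (0, False)))
    return merged

def members(state):
    return {e for e, (_, present) in state.items() if present}

# russelldb's scenario: A adds x; B adds x then removes x.
A, B = {}, {}
apply_op(A, "x", "add")
apply_op(B, "x", "add")
apply_op(B, "x", "remove")
print(members(merge(A, B)))    # set() -- the remove prevails (B has more ops)

# Mirror image: now the side with more operations ends with x present,
# so the add prevails instead. Neither add-wins nor remove-wins in general.
A2, B2 = {}, {}
apply_op(A2, "x", "add")
apply_op(A2, "x", "remove")
apply_op(B2, "x", "add")
apply_op(B2, "x", "remove")
apply_op(B2, "x", "add")
print(members(merge(A2, B2)))  # {'x'}
```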

cmeiklejohn commented 7 years ago

> The use case I'm working on (pure IPFS based) doesn't have any room for central servers or centralized consensus.

My point here is that, more than likely, a pure IPFS system will have some centralized nodes that won't go away, serving as points for joining the network and ensuring durability under the high churn of dynamic clients.

If your network has high churn, where nodes can go away without notice, and you want to guarantee durability of values, you will need some form of agreement, at least done through a k-stability protocol, to ensure a write remains durable under churn. One practical solution to this is to have some centralized/data center nodes to ensure durability.

ianopolous commented 7 years ago

The IPFS network itself has bootstrap nodes, yes, but they are only a convenient way to join the DHT and can't be relied upon to store anything in particular. They are not strictly necessary; other node-discovery mechanisms are possible and are indeed being considered.

russelldb commented 7 years ago

@ianopolous thanks for the reply. I'm not sure that this thing is really a CRDT at all, but it is certainly an interesting design, thanks for sharing it.

ianopolous commented 7 years ago

@russelldb Correct me if I'm wrong, but I believe it meets the definition of a CRDT: the set of possible states forms a join semi-lattice, and the merge operation is commutative, associative and idempotent.
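
A quick way to sanity-check that claim for any concrete merge function is to property-test the three laws directly. This sketch is my own; the states and merge used here (frozensets under union) are stand-ins, and any CRDT's state type and merge can be plugged into the same check.

```python
import itertools

# Property check for the join-semilattice laws a CRDT merge must satisfy:
# commutativity, associativity and idempotence, over some sample states.

def merge(a, b):
    return a | b   # stand-in merge: set union

states = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]

for a, b, c in itertools.product(states, repeat=3):
    assert merge(a, b) == merge(b, a)                       # commutative
    assert merge(merge(a, b), c) == merge(a, merge(b, c))   # associative
    assert merge(a, a) == a                                 # idempotent

print("merge satisfies the semilattice laws on the sample states")
```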

russelldb commented 7 years ago

@ianopolous Maybe so, under that definition, but the "data type" part of CRDT does suggest a discernible semantic. I probably just don't understand the use case.