peer-base / js-delta-crdts

Delta State-based CRDTs in Javascript

Sets: support more than primitive types #15

Closed pgte closed 5 years ago

pgte commented 5 years ago

ORSets only support primitive JS types for uniqueness. Devise some custom way to define equality / uniqueness.

/cc @parkan and @satazor

satazor commented 5 years ago

To support non-primitive types we must have a way to identify each item canonically. We can do this implicitly by using tools like https://npms.io/search?q=hash-object, and also explicitly by adding the item id as a mutator argument.

Though, this will complicate the Sets CRDTs. For now, we should make it clear in the README that only primitive types are supported.

pgte commented 5 years ago

Internally we use a DotSet, where the key is the object. We can make it so that, if the object has an id property, we use that as the key instead. Otherwise, the current behaviour happens. @satazor does that make sense?

parkan commented 5 years ago

in JVM-land this sort of thing is handled by defining a .hashCode method; this is nice because you can say things like "hash all value fields but not timestamp" -- sounds like we can have an id that's a hash of the object by default, but can be overridden?

there are potential concerns with things like field order, so it'd be nice to have this handled robustly

pgte commented 5 years ago

@parkan that's a good analogy. I was going more towards defining distributed object identity, but I guess what we want is value equivalence. I think there's an opportunity to do that now in #17.

satazor commented 5 years ago

Regarding using .hashCode: I fear that the convention will force users to duplicate the "id" property most of them will already have into .hashCode. What if we just create an option to compute the hash code, which by default uses hash-it, but which the user could customize to something like (item) => item.id?

pgte commented 5 years ago

I've been giving this some thought and I see one problem: these objects are going to be distributed and instantiated eventually, without control from the user. For instance, a structure may have an ORMap where one key maps to an AWORSet. When receiving this update, the replica does not know this in advance and has no opportunity to define this.

If the user could define this customisation, any difference in configuration between replicas would lead to divergence (ouch!).

Thoughts?

satazor commented 5 years ago

> For instance, a structure may have an ORMap where one key maps to an AWORSet. When receiving this update, the replica does not know this in advance and has no opportunity to define this.

Could you elaborate on this point?

> If the user could define this customisation, any difference in configuration between replicas would lead to divergence (ouch!).

Indeed, but this is solvable by having the "hash code" be computed at insertion time instead of being calculated by each replica. This is similar to having the .hashCode property but it's hidden from the outside.

Note that still, in both strategies, there's a chance that replicas are running different versions of delta-crdts, which may use different versions of hash-it that could calculate different ids.

pgte commented 5 years ago

@satazor, yes, but it's a convention. The problem I'm describing goes something like this:

I'm replica A. I define a custom hash function. I add an element to a set. I replicate that element to replica B. When replica B receives this value, it doesn't know which hash function to use, thus losing coherency.

Makes sense?

satazor commented 5 years ago

@pgte I understand, but how is it different from having the hashCode property set manually when adding items in a replica? The other replicas won't be able to check if it's correct because they don't have the hashing function either, right?

pgte commented 5 years ago

It can differ. One value is explicit, and the other depends on the replica defining a function.

pgte commented 5 years ago

To clarify: It's not a problem of checking, it's a problem of the uniqueness being a function that is defined outside of the CRDT itself.

satazor commented 5 years ago

> To clarify: It's not a problem of checking, it's a problem of the uniqueness being a function that is defined outside of the CRDT itself.

Both strategies, hashCode via a property of the object or an option, live outside the scope of the CRDT. That's what I've been trying to say:

const replica = CRDT('my replica');

replica.add({
  id: 'abc', 
  foo: 'bar',
  hashCode: 'abc',
});

// vs

const replica = CRDT('my replica', {
  hashCode: (item) => item.id,  // Default being hashIt(item) for objects
});

// The `hashCode` function would be used to calculate the hash when adding items.
// The calculated hash-code would still be there on the object,
// but it will be hidden by using a non-enumerable symbol field
replica.add({
  id: 'abc', 
  foo: 'bar',
});

In both scenarios, the hashCode is being set explicitly and both share the same problem. What if the replicas are running different hashing functions?

While they share the same problem, by having it as an option we are giving delta-crdts the opportunity to calculate the hashCode of items being merged from other replicas. This would allow it to emit a warning when they differ, at least.


I've edited the comment to make it clearer.
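That warning check could look something like this (a sketch under the assumptions above; the symbol name and helper functions are hypothetical, not the library's API):

```javascript
// The hash computed at insertion time travels with the element on a
// hidden, non-enumerable symbol field; the receiving replica recomputes
// it locally to detect configuration drift between replicas.
const HASH = Symbol.for('delta-crdts/hash'); // hypothetical field name

function tagWithHash(element, hashCode) {
  // Non-enumerable by default, so it stays out of JSON and iteration.
  Object.defineProperty(element, HASH, { value: hashCode(element) });
  return element;
}

function checkIncoming(element, hashCode) {
  const remoteHash = element[HASH];
  const localHash = hashCode(element);
  if (remoteHash !== undefined && remoteHash !== localHash) {
    // Replicas disagree — likely different hash functions or versions.
    console.warn('delta-crdts: hash mismatch on incoming element');
  }
  // Trust the insertion-time hash so membership stays convergent.
  return remoteHash !== undefined ? remoteHash : localHash;
}
```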

pgte commented 5 years ago

OK, makes sense. If you want to define object equality, it's hard to do it outside of the scope of the app, but I can live with defining a custom hashing function in the options as you showed.

pgte commented 5 years ago

Fixed by e807a07031a703cb34366dbbdfc687b810261e58. Landed in v0.5.0.

Custom hash function discussed on #20