josephg / diamond-types

The world's fastest CRDT. WIP.
1.56k stars 29 forks source link

Shelf implementation #10

Closed nickdbush closed 2 years ago

nickdbush commented 2 years ago

An implementation of the Shelf structure from dglittle/shelf. This version is incompatible with the version by @dglittle as this version uses a different method for resolving conflicts between shelves with the same version. Rather than stringifying to JSON and then comparing the ordering of the strings, this version hashes the value and compares those hashes. This should produce a more deterministic result across architectures and platforms, as the order fields being hashed is well-defined, rather than reliant on the implementation of the stringifier. It also avoids the allocation of a new string for the stringified result.

I have opted against implementing the Braid API, the "fancy" local_update and remote_update functions, and the get_change and mask functions on the shelf object. If there's a consensus that they should make it into the library I'm happy to add them.

This PR is currently marked as a draft, as it represents the base implementation that uses a fixed T for the content of the shelves (which can either be a value of T, or a map of T). So, it's currently on the user of the library to define what exactly this T is (be it a counter, tuple, or single value). Other types have not yet been implemented because merging rules are not defined in the current shelf algorithm (which only allows for version difference, recursive map merging, or finally hashing as a method for merging shelves).

For more context on this PR, see the comments of issue #2.

josephg commented 2 years ago

Awesome!

Yeah good call leaving out the braid API functions for now. I haven't had a chance to look through the code yet - I'm ramping down work a bit this close to Christmas / new years'. I'll get to it soon! And I hope you don't mind if I make a bunch of changes to the code after I merge it in. Most lines in DT have been written and rewritten many times, and I expect this code will be no different.

As for the ordering, I'm a little concerned about bringing in a hashing function because I want to keep an eye on wasm bundle size. shelf doesn't care about the ordering of objects, because it just deeply merges their contents. What do you think about a simpler rule saying "all strings sort before all ints, which sort before all floats", or something like that? (And strings are sorted based on a simple lexographic order.) That approach also seems easier to port to other languages (compared to porting across a hash function.)

Also if you're interested, I'm hosting a meeting tomorrow (in about 17 hours if my math is right) to talk about the diamond types' file format. Come along - I'd love to have your perspective so we can start thinking about that aspect of shelf too. Date + zoom link here

nickdbush commented 2 years ago

Let me take a crack at that ordering stuff and I'll get something ready for when things wind back up again after the break. And by all means change away – I'm grateful for the chance to see other perspectives on the problem and learn from them!

Will see you tonight at the design review!

josephg commented 2 years ago

Awesome! And, see you then! :D

nickdbush commented 2 years ago

Also, re the hashing: this was included not to order objects, but to deterministically break ties between shelves with the same version when they cannot be recursively merged. An alternative would be to have T: PartialOrd (or maybe even T: PartialOrd + Ord, although this makes using floats in shelves harder), which removes the dependency on fxhash. That's what I've prototyped in the changes today and will neaten after the break.

josephg commented 2 years ago

Hey! Sorry I haven't gotten back to this. I've been essentially on vacation for the past few weeks. I'm at the moment hanging out in person with @dglittle (the author of shelf) and he's delighted by this and keen to take a look.

We'll get there! I haven't forgotten about this!

josephg commented 2 years ago

Thanks for taking a look Greg! I'll merge and start moving things around.

(I want to reuse the code for the oplog - though storing historical changes is optional for shelf. Need to figure out how I want to do that.)

nickdbush commented 2 years ago

That's all really exciting, thank you both @dglittle and @josephg!