Merkle-CRDTs paper (draft)

hsanjuan commented 5 years ago

Hi all,

In November/December I allocated time to dive into CRDTs because of upcoming features in IPFS Cluster[https://github.com/ipfs/ipfs-cluster]. I read a lot of the literature and this eventually led me to the current and past CRDTs efforts within our ecosystem. Unfortunately I could not find anything officially published or formalized, and I had some some ideas of my own.

The result is a paper on Merkle-CRDTs, that is, CRDTs backed by Merkle-DAGs. The paper formalizes what we called a Merkle-Clock (a logical clock) and goes on to describe Merkle-CRDTs as CRDTs on Merkle-Clocks. It describes implementation approaches, advantages and disadvantages, along with some of the easier optimization methods. It gives due mention to all the ecosystem projects that have worked on this direction and also includes a rather long background section so that interested readers can get up to speed in the notions relevant to digest the rest of the paper if they need to (but otherwise can be skipped).

This paper is meant to give an official, formal description of techniques used by Peerpad (in the past), OrbitDB and now IPFS Cluster. It is also meant to be a base for future -hopefully shorter- papers and research on the topic, for example on optimizations (I've seen we even have Research RFCs open for it).

Samuli (@haadcode) and Pedro (@pgte) were looped in very early in the process for their amazing work on Merkle-CRDTs (OrbitDB, Peerpad/peer-crdt) and provided very valuable feedback, comments and proofreading, thus they appear as additional authors and I'm extremely grateful to them.

We are now, finally, opening up the paper draft to a wider audience (Protocol Labs/IPFS Community). The repository for any contributions is at:

https://github.com/protocol/merkle-crdts-paper/ (if you don't have access, send me your @github handle)

An updated version of the PDF is at: http://hector.link/presentations/merkle-crdts/merkle-crdts.pdf

Feel free to forward this to any interested parties in our community.

Cheers!

gritzko commented 5 years ago

Very interesting. Right now, I am working on blending Causal (CRDT) and Merkle structures (cause they are essentially the same). Some preliminary texts are online at http://replicated.cc I would definitely like to read your paper.

gritzko commented 5 years ago

I think, we may even add RON 2.1 Merkle structure to the paper. It has some special features. RON 2.1 has a formally defined op format, including the so-called "nominal" (internal) fixed-width form. That allows to define a Merkle structure on the op graph itself. Hence, RON2.1 hashes are completely independent of any actual serialization. http://replicated.cc/specs/hash/ http://replicated.cc/specs/nominal/

pgte commented 5 years ago

@vitorenesduarte @CBaquero @gyounes This may interest you.

ianopolous commented 5 years ago

In Peergos we use a merkle CHAMP as a state based CRDT.

vitorenesduarte commented 5 years ago

Thanks @pgte. @hsanjuan Can you add us to the repository?

gritzko commented 5 years ago

I am on page 15 so far. The Merkle clock idea has its logic. Albeit, I would advise against using that term. You'll need a full event database to use those. Sequence numbers and Lamport clocks are directly comparable. Hashes are not. If any timestamp comparison requires multiple round-trips to the database, that's brutal. Hence, this type of clock is hardly distinguishable from the dataset itself. In big-O terms, it is the same thing. If you don't call it "clocks" or "timestamps" then it is perfectly OK.

gritzko commented 5 years ago

One way of mixing Merkle and Lamport identifiers is to de-collision logical time with hash bits. If we define such a "Merkport timestamp": 64 bits for a logical time counter, 64 bits for the truncated data hash. That is not cryptographically strong, of course. It is merely collision/corruption resistant. It is also directly comparable. This scheme has its issues too: hash bits are not compressible and you have to carry them around if you need order.

gritzko commented 5 years ago

It is possible to reduce the total size of the Merkle-CRDT DAG by including only a CID as payload, which can then be fetched separately.

With the op based approach, the op hash is often bigger than the op itself.

hsanjuan commented 5 years ago

@gritzko please let's discuss in issues the actual paper repo. Short answer is Merkle-Clocks are a full event datastore (the DAG). The storage space and comparison trade-off is a well known and highlighted limitation. But it's still a clock because it can provide ordering for events.

@vitorenesduarte done!

hsanjuan commented 5 years ago

@ianopolous where can I read more about it?

gyounes commented 5 years ago

@gritzko please let's discuss in issues the actual paper repo. Short answer is Merkle-Clocks are a full event datastore (the DAG). The storage space and comparison trade-off is a well known and highlighted limitation. But it's still a clock because it can provide ordering for events.

@vitorenesduarte done!

@hsanjuan thanks

duynht commented 5 years ago

@hsanjuan I am starting my research as an undergrad and would like to learn more about your paper. Could you please add me to the repo? Thank you very much

ianopolous commented 5 years ago

@hsanjuan We haven't written up any prose explanations of it, but here's a summary:

Merkle-champs have a property that makes them perfect for building CRDTs with - convergence. Unlike, say, a merkle-btree whose final root hash depends on the order you add the mappings, a CHAMP is insertion order independent. So if different nodes for example are adding different mappings/elements to a champ, then there is a well defined merge operation (commutative, associative and idempotent) on the champs (this assumes the mappings/values in the champ are themselves a CRDT). E.g. it is trivial to make a 2P-Set using a champ.

julianpistorius commented 5 years ago

@hsanjuan could you please add my handle to the repo?

geoah commented 5 years ago

@hsanjuan woa this sounds amazing!! :D would it be possible to either get added to the repo? I'd be really interested in this.

duynht commented 5 years ago

@hsanjuan Too bad I have been kicked out by github. Can you please add me in the repo again? Thank you very much.

doubtingben commented 5 years ago

could you please add my handle to the repo? :pray:

5310 commented 5 years ago

If possible I would like to read the draft as well.

olebedev commented 5 years ago

If possible I would like to read the draft too.

npfoss commented 5 years ago

Could you please add me as well? This sounds super cool

DePizzottri commented 5 years ago

@hsanjuan I'm doing some research on decreasing the dimension of logical clocks regarding CRDTs. It would be very interesting to take a look on your paper.

songjiayang commented 5 years ago

Sounds good, could you please add me? I really want to read and learn more about it, thank you very much.

ldub commented 5 years ago

Could you also add me? Would love to read this paper over the weekend.

mastercyb commented 5 years ago

+1 But I am novice and just started with CRDT

zhujo01 commented 5 years ago

could you please add my handle to the repo? Thanks.

rgrover commented 5 years ago

I'd like access as well, please.

kjzz commented 5 years ago

could you please add my handle to the repo? Thx a lot @hsanjuan

tg-x commented 5 years ago

@hsanjuan I'd like to have a look at the paper repo as well

alinz commented 5 years ago

@hsanjuan could you add me to the paper as well?

atomgardner commented 5 years ago

@hsanjuan could I also have a look at your preprint?

glesserd commented 5 years ago

Me too :) @hsanjuan

jsimnz commented 5 years ago

Would love to get access to the paper. Thanks!

llelf commented 5 years ago

@hsanjuan I’d love to look at it too

axefrog commented 5 years ago

@hsanjuan I'd love access to this too, please!

bonedaddy commented 5 years ago

Would love access as well :D

cyborgshead commented 5 years ago

@hsanjuan please, give me access to the paper. Appreciate

schrepfler commented 5 years ago

Hi @hsanjuan, can you share the access to this paper with me please?

hsanjuan commented 5 years ago

Hi all, after the last batch of revisions, I have put the latest version of the paper draft here: http://hector.link/presentations/merkle-crdts/merkle-crdts.pdf and I will update it regularly.

itsmeadi commented 2 years ago

Hi @hsanjuan, unable to access http://hector.link/presentations/merkle-crdts/merkle-crdts.pdf- timeout could you add me to the paper as well?

hsanjuan commented 2 years ago

hey @itsmeadi , the final paper was uploaded to https://arxiv.org/abs/2004.00107. Working on fixing my site too.

ipfs / notes

Merkle-CRDTs paper (draft) #403