Open hsanjuan opened 5 years ago
Very interesting. Right now, I am working on blending Causal (CRDT) and Merkle structures (cause they are essentially the same). Some preliminary texts are online at http://replicated.cc I would definitely like to read your paper.
I think, we may even add RON 2.1 Merkle structure to the paper. It has some special features. RON 2.1 has a formally defined op format, including the so-called "nominal" (internal) fixed-width form. That allows to define a Merkle structure on the op graph itself. Hence, RON2.1 hashes are completely independent of any actual serialization. http://replicated.cc/specs/hash/ http://replicated.cc/specs/nominal/
@vitorenesduarte @CBaquero @gyounes This may interest you.
In Peergos we use a merkle CHAMP as a state based CRDT.
Thanks @pgte. @hsanjuan Can you add us to the repository?
I am on page 15 so far. The Merkle clock idea has its logic. Albeit, I would advise against using that term. You'll need a full event database to use those. Sequence numbers and Lamport clocks are directly comparable. Hashes are not. If any timestamp comparison requires multiple round-trips to the database, that's brutal. Hence, this type of clock is hardly distinguishable from the dataset itself. In big-O terms, it is the same thing. If you don't call it "clocks" or "timestamps" then it is perfectly OK.
One way of mixing Merkle and Lamport identifiers is to de-collision logical time with hash bits. If we define such a "Merkport timestamp": 64 bits for a logical time counter, 64 bits for the truncated data hash. That is not cryptographically strong, of course. It is merely collision/corruption resistant. It is also directly comparable. This scheme has its issues too: hash bits are not compressible and you have to carry them around if you need order.
It is possible to reduce the total size of the Merkle-CRDT DAG by including only a CID as payload, which can then be fetched separately.
With the op based approach, the op hash is often bigger than the op itself.
@gritzko please let's discuss in issues the actual paper repo. Short answer is Merkle-Clocks are a full event datastore (the DAG). The storage space and comparison trade-off is a well known and highlighted limitation. But it's still a clock because it can provide ordering for events.
@vitorenesduarte done!
@ianopolous where can I read more about it?
@gritzko please let's discuss in issues the actual paper repo. Short answer is Merkle-Clocks are a full event datastore (the DAG). The storage space and comparison trade-off is a well known and highlighted limitation. But it's still a clock because it can provide ordering for events.
@vitorenesduarte done!
@hsanjuan thanks
@hsanjuan I am starting my research as an undergrad and would like to learn more about your paper. Could you please add me to the repo? Thank you very much
@hsanjuan We haven't written up any prose explanations of it, but here's a summary:
Merkle-champs have a property that makes them perfect for building CRDTs with - convergence. Unlike, say, a merkle-btree whose final root hash depends on the order you add the mappings, a CHAMP is insertion order independent. So if different nodes for example are adding different mappings/elements to a champ, then there is a well defined merge operation (commutative, associative and idempotent) on the champs (this assumes the mappings/values in the champ are themselves a CRDT). E.g. it is trivial to make a 2P-Set using a champ.
@hsanjuan could you please add my handle to the repo?
@hsanjuan woa this sounds amazing!! :D would it be possible to either get added to the repo? I'd be really interested in this.
@hsanjuan Too bad I have been kicked out by github. Can you please add me in the repo again? Thank you very much.
could you please add my handle to the repo? :pray:
If possible I would like to read the draft as well.
If possible I would like to read the draft too.
Could you please add me as well? This sounds super cool
@hsanjuan I'm doing some research on decreasing the dimension of logical clocks regarding CRDTs. It would be very interesting to take a look on your paper.
Sounds good, could you please add me? I really want to read and learn more about it, thank you very much.
Could you also add me? Would love to read this paper over the weekend.
+1 But I am novice and just started with CRDT
could you please add my handle to the repo? Thanks.
I'd like access as well, please.
could you please add my handle to the repo? Thx a lot @hsanjuan
@hsanjuan I'd like to have a look at the paper repo as well
@hsanjuan could you add me to the paper as well?
@hsanjuan could I also have a look at your preprint?
Me too :) @hsanjuan
Would love to get access to the paper. Thanks!
@hsanjuan I’d love to look at it too
@hsanjuan I'd love access to this too, please!
Would love access as well :D
@hsanjuan please, give me access to the paper. Appreciate
Hi @hsanjuan, can you share the access to this paper with me please?
Hi all, after the last batch of revisions, I have put the latest version of the paper draft here: http://hector.link/presentations/merkle-crdts/merkle-crdts.pdf and I will update it regularly.
Hi @hsanjuan, unable to access http://hector.link/presentations/merkle-crdts/merkle-crdts.pdf- timeout could you add me to the paper as well?
hey @itsmeadi , the final paper was uploaded to https://arxiv.org/abs/2004.00107. Working on fixing my site too.
Hi all,
In November/December I allocated time to dive into CRDTs because of upcoming features in IPFS Cluster[https://github.com/ipfs/ipfs-cluster]. I read a lot of the literature and this eventually led me to the current and past CRDTs efforts within our ecosystem. Unfortunately I could not find anything officially published or formalized, and I had some some ideas of my own.
The result is a paper on Merkle-CRDTs, that is, CRDTs backed by Merkle-DAGs. The paper formalizes what we called a Merkle-Clock (a logical clock) and goes on to describe Merkle-CRDTs as CRDTs on Merkle-Clocks. It describes implementation approaches, advantages and disadvantages, along with some of the easier optimization methods. It gives due mention to all the ecosystem projects that have worked on this direction and also includes a rather long background section so that interested readers can get up to speed in the notions relevant to digest the rest of the paper if they need to (but otherwise can be skipped).
This paper is meant to give an official, formal description of techniques used by Peerpad (in the past), OrbitDB and now IPFS Cluster. It is also meant to be a base for future -hopefully shorter- papers and research on the topic, for example on optimizations (I've seen we even have Research RFCs open for it).
Samuli (@haadcode) and Pedro (@pgte) were looped in very early in the process for their amazing work on Merkle-CRDTs (OrbitDB, Peerpad/peer-crdt) and provided very valuable feedback, comments and proofreading, thus they appear as additional authors and I'm extremely grateful to them.
We are now, finally, opening up the paper draft to a wider audience (Protocol Labs/IPFS Community). The repository for any contributions is at:
https://github.com/protocol/merkle-crdts-paper/ (if you don't have access, send me your @github handle)
An updated version of the PDF is at: http://hector.link/presentations/merkle-crdts/merkle-crdts.pdf
Feel free to forward this to any interested parties in our community.
Cheers!