ipfs-inactive / dynamic-data-and-capabilities

[ARCHIVED] Dynamic Data and Capabilities in IPFS Working Group
59 stars 6 forks source link

Question: how are op-based CRDTs being used in IPFS? #28

Closed pgte closed 5 years ago

pgte commented 6 years ago

Asked here.

pgte commented 6 years ago

IPFS does not use CRDTs at all underneath. It's more that some apps that use IPFS all want dynamic collaborative data, and hence the use of CRDTs on top of the IPFS network. IPFS has nice P2P primitives that allow peer discovery, peer routing, content routing, DHT, pubsub and other services. Also, it exposes an immutable DAG API. We're leveraging some of these to create CRDTs on top.

We've been iterating for around 1 year on this. Our first approach to this was to use an available CRDT library named Y.js and create an ipfs transport / discovery plug-in.

Next, we wanted to provide better persistence characteristics, and hence we created two libraries peer-crdt and peer-crdt-ipfs. The first is an abstract library of op-based CRDTs that can be used in any kind of immutable distributed store like IPFS. The other are the IPFS network and storage bindings.

(All these packages are in JS that runs in browsers)

Now we want to make this approach scale better, and we're on the verge of starting to develop a replication protocol that will build a plumtree-like topology at the app level: https://github.com/ipfs-shipyard/peer-crdt-ipfs/issues/5

@gyounes please feel free to post your questions / comments here below if yo have them :)

pgte commented 6 years ago

Ah, one more thing: in peer-crdt we're using the DAG API to represent the operations. Where the dependencies are represented as links between nodes. Also, a merge node is a content-less node with two parent links.

gyounes commented 6 years ago

This ^ is what I wanted to know more about. So you mean the DAG you are using is what represents the partial order log of operations that you wanted to make more compact by adding some garbage collection mechanism? Or is it a different one? It would be great if you can point me to any documentation, video, slides... that explain how it works. If those are not available I can check the code.

pgte commented 6 years ago

@gyounes unfortunately there is no reference to the internals of these specific DAG structures. Here is my current brain dump on this:

Speaking generally how the DAG API works in IPFS, here is the API: https://github.com/ipfs/interface-ipfs-core/blob/master/SPEC/DAG.md

In peer-crdt, the The IPFS DAG store is being used here:

Operation removal

One obvious problem here we'd have to solve first is having a primitive to discard nodes locally. Eventually the old nodes disappear from the IPFS network as all the nodes will eventually discard old operations. When discarding DAG nodes we either have to:

Faster bootstrapping

The main goal of being able to truncate operations here would be to the bootstrapping of new nodes.

Another problem to solve is, when bootstrapping a new node from a remote replica, to tell this node where to stop resolving (so far it stops resolving) when the parents nodes are local or there are no parent nodes.

Protocol

Now that we know that we can solve this fundamental problem (thanks to you!), we now need to decide how to use it.

One of the strategies we're considering is to ditch the IPFS DAG sync (because currently it's slow), replacing it with a vector-clock based protocol (which could be backed by a DAG, but would not use the native IPFS DAG sync). There is a (very recent) RFC for this protocol: https://github.com/ipfs-shipyard/peer-crdt-ipfs/issues/5

Does this answer your question?

amark commented 6 years ago

You guys should consider seriously integrating with GUN, we've solved most of the CRDT & cryptography questions in our research and implementation. I've spoken to Juan over the years (even warned him about the problems that would happen if everything was based on hashes alone, back in 2014 at #hashtheplanet squat conference), and David reached out to me more recently, Mikeal of course (from NodeJS in general), and recently been bumping into Kyle here in the Bay Area (he lives like literally 9 minutes away). We'd love to officially partner. Let us know!

pgte commented 5 years ago

Right now we're using ∂-CRDTs in peer-star.

amark commented 5 years ago

Of course 6 months later, conveniently you guys "now" have a CRDT, instead of having just integrated with GUN and becoming ahead of curve. Pretty lousy team culture.

Good luck trying to figure out one of the hidden tricky issues with ∂-CRDTs that will take you about ~2 years to find.

pgte commented 5 years ago

@amark I tried, but I never got a response from you in this thread: https://stackoverflow.com/questions/52094556/how-can-i-implement-a-lossless-crdt-using-gun

It would be amazing that you got to actually show me a lossless mutable sequence-like CRDT where we're able to safely garbage-collect and snapshot-sync built on top of Gun!

Good luck trying to figure out one of the hidden tricky issues with ∂-CRDTs that will take you about ~2 years to find.

From all the contacts I've had with academics researching this field I've never come under the impression that ∂-CRDTs have "issues", but I'm open to hear your thoughts about it.

amark commented 5 years ago

@pgte I pretty thoroughly provided answers and links, but shy of implementing it for you I'm not sure what further I could do.

I'm confused, what about using straight-up GUN (which is already compact, no need to garbage collect - just syncs snapshots, like you say)? You don't need to build another CRDT on top.

Take notabug.io (P2P Reddit) for example, it is has already handled a TB of daily traffic + 3K object-inserts/second (twitter does 6K tweets/second) on only $99 worth of hardware decentralized. How would you build a realtime updating Reddit with IPFS? I'm pretty sure you'd need to build something like GUN.

So rather than re-building yet another attempt, you guys could just officially support GUN (here is an example storage adapter in 20 LOC, as an async in-memory but should be easy to swap out with IPFS/IPNS calls), I discussed this with you guys several times but have just gotten consistent political responses and then a few months later seeing large large large sums of money being offered as rewards (when you could just use us for free) for solving work identical to where GUN was at 3 years before. So I just gave up, the team seems to care more about branding it as their own than anything else - that isn't necessarily a bad thing, just trying to call it out, cause I know others who feel the same way/frustrated.

pgte commented 5 years ago

Gun is a graph database, so it’s more ecquivalent to IPLD, although it’s mutable and not content-content addressable, and that’s not what I’m after. If you’re interested in showing me the code of a lossless mutable sequence-like CRDT, I’d be glad to investigate this further.