describe replication protocol?

dominictarr commented 8 years ago

I'm trying to understand how your replication protocol works. I looked at the code but there is quite a bit going on. this comment gives me the impression that hyperlog is a scuttlebutt? but other things have given me the impression that hyperlog can fork? is it just forks in a different sense? I also remember you where working on a thing that traces the graph out.

how does it handle different replication cases? how does it behave if an update was concurrent or not?

mafintosh commented 8 years ago

Every user has their own scuttlebutt like log (append-only log).
When a user insert a new node into the graph they also add this node to their own log. If their graph node has a link to another users log they'll record that in an index (lets call this index A).
When replicating you look at the heads of your graph (the nodes that no one points to). You'll send back a scuttlebutt like handshake for each of the logs that head nodes are stored in.
If the remote user doesn't have the head log, or if their latest node is this log is older than the seq returned in the handshake they'll send a request for the diff.
If you send back a node that has a reference to another feed in index A, send back the scuttlebutt handshake for that particular log.

So when a user does hyperlog.add(links, node) hyperlog will lookup the nodes referenced by the links array and see which logs they belong too. If any of those logs are different than our own local log that information is stored in an internal index.

mafintosh commented 8 years ago

In the future I wanna be able have hyperlog accept some sort of append-only log instance in the constructor (such as secure-scuttlebutt) since the replication protocol in theory should be able to replicate any merkle dag on top of a series of append-only logs

dominictarr commented 8 years ago

how does this integrate with signatures? is each log a pubkey and are log items signed?

mafintosh commented 8 years ago

per default logs aren't signed but there is an api for doing that. this is something i wanna change as well though in the future.

mafintosh commented 8 years ago

honestly the best approach would be to always accept some append-only log instance instead of a leveldb so i could move that responsibility to a different module.

paulkernfeld commented 8 years ago

@dominictarr I think you probably already figured this out, but I'll leave a note for anyone else who finds this thread.

With secure-scuttlebutt, each log is identified by a pubkey. When you ask for a feed, you want to retrieve all updates to that log that are signed by the pubkey, potentially even future updates that do not exist at the time that you received the ID/pubkey. In this case, only one user should update a particular stream; otherwise, you might get incompatible updates to a feed.

With hyperlog, there is no need for signing because hyperlog is content-addressable storage. So if I send you a link to a hyperdrive file, I'm sending you a link to a snapshot of a file at a point in time. So, you can efficiently verify the integrity of what you're receiving, but there's no mechanism for receiving future updates to that file. In this case, anyone can update any file at any time, because it's okay for there to be multiple copies of a file.

mafintosh commented 8 years ago

@paulkernfeld an addition to your post: while it is true that hyperlog uses content-addressable storage it also uses append-only logs to index the content-addressable storage for replication purposes

mafintosh / hyperlog

describe replication protocol? #13