
A consensus interface for libp2p #179

Open · whyrusleeping opened 8 years ago

whyrusleeping commented 8 years ago

@jbenet and I sat down and did some thinking about how to abstract consensus for use in libp2p and general ipfs applications.

Here are my notes:

We decided on two layers for the interface. The first layer of consensus is simply to agree on a single value across the cluster of nodes.

The interface for that looks roughly like:

type Consensus interface {
    // GetCurrentState returns the current agreed upon state of the system
    GetCurrentState() (State, error)

    // CommitState attempts to set the current state of the system. It returns
    // the new state of the system if the call succeeds.
    CommitState(State) (State, error)

    // SetActor sets the underlying actor that will actually be performing the
    // operations of setting and changing the state.
    SetActor(Actor)
}

type Actor interface {
    // SetState attempts to set the state of the cluster to the state
    // represented by the given Node
    SetState(State) (State, error)
}

All this does is allow a set of actors to agree that a given node is the 'current state'. This is a very simple interface, but it's also a very powerful one. Most applications that use 'consensus' are using exactly and only this.
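
For concreteness, here's a minimal sketch of what a trivial, single-process implementation of that layer could look like. It is purely illustrative: the State-as-empty-interface alias, the memConsensus name, and the mutex-based storage are all assumptions for the example (and it assumes import "sync"):

type State interface{} // assumed opaque for this sketch

type memConsensus struct {
    mu    sync.Mutex
    state State
    actor Actor
}

func (c *memConsensus) GetCurrentState() (State, error) {
    c.mu.Lock()
    defer c.mu.Unlock()
    return c.state, nil
}

func (c *memConsensus) CommitState(s State) (State, error) {
    // Delegate the actual agreement to the Actor; with a single
    // participant, whatever it returns becomes the new state.
    ns, err := c.actor.SetState(s)
    if err != nil {
        return nil, err
    }
    c.mu.Lock()
    c.state = ns
    c.mu.Unlock()
    return ns, nil
}

func (c *memConsensus) SetActor(a Actor) { c.actor = a }

A real implementation would put a paxos- or raft-style protocol behind the same three methods, without callers having to change.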

Beyond that, we define another interface that maintains an Op log:

type Op interface {
    ApplyTo(State) (State, error)
}

type OpLogConsensus interface {
    // CommitOp takes the current state and applies the given operation to it, then
    // sets the new state as the cluster's 'current' state.
    CommitOp(Op) (State, error)

    SetActor(Actor)

    // GetLogHead returns a node that represents the current state of the cluster
    GetLogHead() (State, error)

    // Rollback rolls the state of the cluster back to the given node. The
    // passed-in node should be one that was previously a state in the chain
    // of states. (Some discussion is still needed as to whether or not this
    // is truly necessary.)
    Rollback(State) error
}

Note: it's not currently clear (to me) whether the Actor interface needs to change to support the Op, or if the Op is applied to the state and then SetState is used to set the 'State' to a node that represents the state transition.

Having this log based consensus gives us a few really neat features:

  1. The ability to roll back the state to some previous state. Since the log contains every operation used to generate the existing state of the cluster, we can at any point replay the entire log from the beginning up to the desired rollback point (see the sketch after this list).
  2. Having a log allows us to easily and nicely build more advanced data structures on top of the generalized oplog consensus. For example, orbit-db could be (almost) trivially modeled on top of such an interface.
  3. Somewhat obviously, but still worth mentioning, the availability of the op log allows peers to browse through the history of the cluster's state.
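
To make point 1 concrete, here is a sketch of an Op over a simple key-value State, together with the replay loop a rollback could use. The kvState and putOp types and the replay helper are illustrative assumptions, not part of the proposal (and it assumes import "fmt"):

type kvState map[string]string

type putOp struct {
    Key, Value string
}

// ApplyTo returns a new state with the key set, leaving the input state
// untouched so that earlier log entries can be replayed safely.
func (o putOp) ApplyTo(s State) (State, error) {
    kv, ok := s.(kvState)
    if !ok {
        return nil, fmt.Errorf("unexpected state type %T", s)
    }
    next := make(kvState, len(kv)+1)
    for k, v := range kv {
        next[k] = v
    }
    next[o.Key] = o.Value
    return next, nil
}

// replay derives the state at any point in the log by applying the
// operations from the beginning, which is all a rollback needs.
func replay(initial State, log []Op) (State, error) {
    s := initial
    for _, op := range log {
        var err error
        if s, err = op.ApplyTo(s); err != nil {
            return nil, err
        }
    }
    return s, nil
}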
whyrusleeping commented 8 years ago

cc @jbenet @diasdavid @lgierth @hsanjuan

whyrusleeping commented 8 years ago

cc @haadcode

daviddias commented 8 years ago

ping @pgte who has implemented Raft in JS and might be interested or have some really good ideas with regards to the interface.

nicola commented 8 years ago

cc @nicola (I have implemented Raft once and studied different BFT protocols; @stebalien was my TA then)

pgte commented 8 years ago

@whyrusleeping @diasdavid my 2 cents:

From what I read, it's not clear to me how you plan to handle topology changes. For me that's the hardest part of managing a Raft cluster: making sure you use joint consensus when adding or removing nodes from a cluster, to avoid split-brain.

My second concern is modelling the known state of each remote node separately: maintaining the current log index of each remote node and making sure we minimize the amount of data transmitted.

The third concern is log compaction: we cannot let the log grow indefinitely, so we need to truncate it. As such, when updating a remote node, we may need to send it a snapshot (the InstallSnapshot RPC from the original paper) and then be able to resume log replication from that point on.

Not sure if / how these concerns translate to libp2p or ipfs, but these were my interface and implementation concerns while implementing Raft on Node.js.

hsanjuan commented 8 years ago

@pgte good points but I think this is an abstraction over different consensus mechanisms. I'm still trying to get my head around it though.

If I am not mistaken, Raft (for example) would be wrapped as an Actor, or more precisely, a set of Actors can choose to use Raft as a way to actually implement SetState(). Log compaction and cluster maintenance are problems for the specific consensus protocol used by the actors.
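
To illustrate that reading, here is a rough sketch. The raftNode interface and its Propose method are hypothetical stand-ins for whatever a real Raft library exposes, not the API of any particular implementation:

// raftNode stands in for a real Raft library's client handle.
type raftNode interface {
    // Propose submits a value to the cluster and blocks until it is
    // committed or rejected.
    Propose(v interface{}) error
}

type raftActor struct {
    raft raftNode
}

// SetState proposes the new state through Raft. Leader election, log
// compaction, and membership changes all stay inside the Raft layer,
// invisible behind the Actor interface.
func (a *raftActor) SetState(s State) (State, error) {
    if err := a.raft.Propose(s); err != nil {
        return nil, err
    }
    return s, nil
}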

I find the use of the word Node confusing. It seems to be an abstraction over a State/LogStore/LogEntry, but as a word it is easy to confuse with a peer, a remote node, etc.

@whyrusleeping am I on the right track, or have I misunderstood? I need to give this some thought; it usually takes a few days until my brain clears the fog.

pgte commented 8 years ago

@hsanjuan I experienced the same confusion over the term Node.

whyrusleeping commented 8 years ago

Yeah, I was using Node to refer to a log entry. I can rename it to Entry or something clearer. I also refrained from using Node to describe a cluster member, instead preferring the term Actor.

hsanjuan commented 8 years ago

Yeah, go ahead and rename it @whyrusleeping, but perhaps it should be State, since it's a parameter for CommitState and a distributed-log Entry is just a particular case of State in the end.

Also, note 'state of the system' vs 'state of the cluster'; these should probably be the same thing.

Stebalien commented 8 years ago

I'm having a little trouble understanding what's going on here.

First, can a Consensus change? Is a Consensus like a Paxos instance (immutable once decided)? Otherwise, if it can change, how is "current" defined? Time tends to be a nebulous concept in distributed systems. Based on the given interface, the only definition of "current" I can think of would be "a state that was valid at some point in the past".

Are your log entries operations (Op) or states (Node)? Basically, are you agreeing on states or on state changes (operations)?

"Commit" usually means that an operation has gone through the distributed state machine and all parties have already agreed on it. However, based on your interface, it looks like operations are directly submitted to OpLogConsensus.CommitOp, applied to the current state, and then the peer gets the cluster to agree on the new state. Is that the case?

whyrusleeping commented 8 years ago

@Stebalien

First, can a Consensus change?

Probably not, unless your cluster changes. The Consensus interface represents the equivalent of a Paxos or Raft cluster; it represents the whole system. An Actor is the abstraction over the local machine.

Are your log entries operations (Op) or states (nodes?). Basically, are you agreeing on states or state changes (operations)?

That feels like an implementation detail to me. Given a log consisting entirely of Ops, I can compute a final state. Alternatively, I could simply set the state at each point in the log. Either way, you would be able to implement the interface; the system would just have different characteristics depending on how you implemented it. There might be an argument for solidifying the spec on this point though, so let me know what you think.

it looks like operations are directly submitted to OpLogConsensus.CommitOp, applied to the current state, and then the peer gets the cluster to agree on the new state. Is that the case?

I don't specify the actual behaviour beyond the interface here. The implementation may choose to apply the Op to its 'current' state and then 'SetState' with the state it created, or it may choose to push the Op out to the network and return when it receives a new State update that contains its Op. Again, as above, there may be merit in deciding how that behaviour should work for this spec. Please let me know what you think.
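
For what it's worth, the first of those two behaviours could look roughly like the sketch below. The opLog type and its fields are assumptions for illustration, and a real implementation would need retry logic for racing commits:

type opLog struct {
    c   Consensus
    log []Op
}

func (l *opLog) CommitOp(op Op) (State, error) {
    // Apply the Op to our current view of the state...
    cur, err := l.c.GetCurrentState()
    if err != nil {
        return nil, err
    }
    next, err := op.ApplyTo(cur)
    if err != nil {
        return nil, err
    }
    // ...then try to commit the result through the single-value layer.
    // If another actor commits first, this call fails (or returns a
    // different state) and the caller can retry against the fresh state.
    agreed, err := l.c.CommitState(next)
    if err != nil {
        return nil, err
    }
    l.log = append(l.log, op)
    return agreed, nil
}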

hsanjuan commented 7 years ago

FYI: I have created the go-libp2p-consensus repository and the proposal lives there now: https://github.com/libp2p/go-libp2p-consensus

hsanjuan commented 7 years ago

The Rollback() documentation acknowledges that there needs to be a discussion about it. I have started this discussion at https://github.com/libp2p/go-libp2p-consensus/issues/1 . Would appreciate your input!