whyrusleeping opened this issue 8 years ago
cc @jbenet @diasdavid @lgierth @hsanjuan
cc @haadcode
ping @pgte who has implemented Raft in JS and might be interested or have some really good ideas with regards to the interface.
cc @nicola (I have implemented Raft once and studied different BFT protocols; @stebalien was my TA then)
@whyrusleeping @diasdavid my 2 cents:
From what I read, it's not clear to me how you plan to handle topology changes. For me that's the hardest part of managing a Raft cluster: making sure you use joint consensus when adding or removing nodes from a cluster, to avoid split brains.
My second concern is modelling the known state of each remote node separately: maintaining the current log index of each remote node and making sure we minimize the amount of data that is transmitted.
The third concern is log compaction: we cannot let the log grow indefinitely, so we need to truncate it. As such, when updating a remote node, we may need to send it a snapshot (the original `InstallSnapshot` RPC from the paper, see the sketch below), and then be able to resume log replication from that point on.
Not sure if / how these concerns translate to libp2p or ipfs, but these were my interface and implementation concerns while implementing Raft on Node.js.
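For context, the `InstallSnapshot` RPC in the Raft paper carries roughly the fields below. The Go structs are only an illustration of the paper's argument list, not code from any particular implementation:

```go
// InstallSnapshotArgs mirrors the arguments of the InstallSnapshot RPC
// described in the Raft paper: the leader streams its snapshot to a
// follower whose log is behind the compaction point.
type InstallSnapshotArgs struct {
	Term              uint64 // leader's current term
	LeaderID          string // so the follower can redirect clients
	LastIncludedIndex uint64 // snapshot replaces all log entries up to and including this index
	LastIncludedTerm  uint64 // term of the entry at LastIncludedIndex
	Offset            uint64 // byte offset of this chunk within the snapshot file
	Data              []byte // raw bytes of the snapshot chunk, starting at Offset
	Done              bool   // true if this is the last chunk
}

// InstallSnapshotReply returns the follower's current term so a stale
// leader can step down.
type InstallSnapshotReply struct {
	Term uint64
}
```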
@pgte good points but I think this is an abstraction over different consensus mechanisms. I'm still trying to get my head around it though.
If I am not mistaken, Raft (for example) would be wrapped as an `Actor`, or more precisely, a set of `Actor`s can choose to use Raft as a way to actually implement `SetState()`. Log compaction and cluster maintenance are problems for the specific consensus protocol used by the actors.
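To make that concrete, here is a rough sketch (not part of the proposal) of what wrapping a Raft implementation as an `Actor` could look like. `proposer` is a hypothetical facade over whatever Raft library is used, and `State` stands in for the opaque state type from the proposal:

```go
// State stands in for the opaque state type from the proposal.
type State interface{}

// proposer is a hypothetical facade over a Raft implementation:
// Propose appends a value to the replicated log and blocks until it
// has been committed (or the proposal fails).
type proposer interface {
	Propose(value interface{}) (interface{}, error)
}

// raftActor plays the proposed Actor role by delegating SetState to
// Raft; the returned State is whatever the cluster actually committed.
type raftActor struct {
	raft proposer
}

func (a *raftActor) SetState(s State) (State, error) {
	committed, err := a.raft.Propose(s)
	if err != nil {
		return nil, err
	}
	return committed, nil
}
```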
I find the use of the word `Node` confusing. It seems to be an abstraction over a State/LogStore/LogEntry, but as a word it is easy to confuse with a peer, a remote node, etc.
@whyrusleeping am I on the right lines, or have I misunderstood? I need to give this some thought; it usually takes a few days until my brain clears the fog.
@hsanjuan I experienced the same confusion over the term `Node`.
Yeah, I was using `Node` to refer to a log entry. I can rename it to `Entry` or something clearer. I also refrained from using `Node` to describe a cluster member, instead preferring the term `Actor`.
Yeah, go ahead and rename it @whyrusleeping, but perhaps it should be `State`, since it's a parameter for `CommitState` and a distributed-log `Entry` is just a particular case of `State` in the end.
Also, note 'state of the system' vs 'state of the cluster'; these should probably be the same thing.
I'm having a little trouble understanding what's going on here.
First, can a `Consensus` change? Is a `Consensus` like a Paxos instance (immutable once decided)? Otherwise, if it can change, how is "current" defined? Time tends to be a nebulous concept in distributed systems. Based on the given interface, the only definition of "current" I can think of would be "a state that was valid at some point in the past".
Are your log entries operations (`Op`s) or states (nodes)? Basically, are you agreeing on states or on state changes (operations)?
"Commit" usually means that an operation has gone through the distributed state machine and all parties have already agreed on it. However, based on your interface, it looks like operations are directly submitted to `OpLogConsensus.CommitOp`, applied to the current state, and then the peer gets the cluster to agree on the new state. Is that the case?
@Stebalien
> First, can a `Consensus` change?
Probably not, unless your cluster changes. The `Consensus` interface represents the equivalent of a Paxos or Raft cluster; it represents the whole system. An `Actor` is the abstraction over the local machine.
> Are your log entries operations (`Op`s) or states (nodes)? Basically, are you agreeing on states or on state changes (operations)?
That feels like an implementation detail to me. Given a log consisting entirely of `Op`s, I can compute a final state. Alternatively, I could simply set the state at each point in the log. Either way, you would be able to implement the interface; the system would just have different characteristics depending on how you implemented it. There might be an argument to be had here about solidifying the spec with regard to this point, though, so let me know what you think.
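Purely as an illustration of the difference (using the `State` and `Op` shapes discussed above, not a fixed spec), the two approaches might look like:

```go
// State and Op stand in for the types discussed in the proposal: an Op
// maps one State to the next.
type State interface{}

type Op interface {
	ApplyTo(s State) (State, error)
}

// stateFromOps recovers the final state from a log of operations by
// replaying them on top of an initial state.
func stateFromOps(initial State, log []Op) (State, error) {
	cur := initial
	for _, op := range log {
		next, err := op.ApplyTo(cur)
		if err != nil {
			return nil, err
		}
		cur = next
	}
	return cur, nil
}

// stateFromStates is the alternative: each log entry is already a full
// state, so the final state is simply the last entry.
func stateFromStates(log []State) State {
	if len(log) == 0 {
		return nil
	}
	return log[len(log)-1]
}
```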
> it looks like operations are directly submitted to `OpLogConsensus.CommitOp`, applied to the current state, and then the peer gets the cluster to agree on the new state. Is that the case?
I don't specify the actual behaviour beyond the interface here. The implementation may choose to apply the `Op` to its 'current' state and then try to `SetState` with the state it created, or it may choose to push the `Op` out to the network and return when it receives a new `State` update that contains its `Op`. Again, as above, there may be merit in deciding how that behaviour should work for this spec. Please let me know what you think.
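As a sketch of the first of those behaviours (hypothetical; `opLog` and its fields are my invention, and the `State`/`Op`/`Actor` shapes are assumed from the discussion, not fixed by the spec):

```go
// State, Op, and Actor follow the shapes discussed earlier in the thread.
type (
	State interface{}
	Op    interface{ ApplyTo(State) (State, error) }
	Actor interface{ SetState(State) (State, error) }
)

// opLog is a hypothetical implementation of CommitOp that follows the
// first behaviour: apply the Op to the local view of the state, then
// ask the Actor to get the cluster to agree on the resulting State.
type opLog struct {
	actor   Actor
	current State
}

func (l *opLog) CommitOp(op Op) (State, error) {
	// Apply the operation to our local notion of the current state.
	next, err := op.ApplyTo(l.current)
	if err != nil {
		return nil, err
	}
	// Run the actual consensus round; the agreed state may differ from
	// `next` if other actors committed concurrently.
	agreed, err := l.actor.SetState(next)
	if err != nil {
		return nil, err
	}
	l.current = agreed
	return agreed, nil
}
```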
FYI: I have created the go-libp2p-consensus repository and the proposal lives there now: https://github.com/libp2p/go-libp2p-consensus
The `Rollback()` documentation acknowledges that there needs to be a discussion about it. I have started this discussion at https://github.com/libp2p/go-libp2p-consensus/issues/1. Would appreciate your input!
@jbenet and I sat down and did some thinking about how to abstract consensus for use in libp2p and general IPFS applications.
Here are my notes:
We decided to have two layers of the interface. The first layer of consensus is simply to agree on a single value across the cluster of nodes.
The interface for that looks roughly like:
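Here is a sketch of that first layer, following the shape of the interfaces in the go-libp2p-consensus proposal (the exact method names and signatures are an approximation and may change):

```go
// State is an opaque value representing the shared state of the
// system; implementations decide what it actually contains.
type State interface{}

// Consensus represents the cluster as a whole: reading the agreed
// state and rolling it back to a previous one.
type Consensus interface {
	// GetLogHead returns the latest state the cluster has agreed on.
	GetLogHead() (State, error)
	// Rollback reverts the system to a previously agreed state.
	Rollback(s State) error
}

// Actor is the local member of the cluster: SetState proposes a new
// state and returns the state the cluster actually agreed on.
type Actor interface {
	SetState(s State) (State, error)
}
```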
All this does is allow a set of actors to agree that a given node is the 'current state'. This is a very simple interface, but it's also a very powerful one. Most applications that use 'consensus' are using exactly and only this.
Beyond that, we define another interface that maintains an Op log:
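A sketch of that second layer, with the same caveats (`State` as in the previous sketch; the `Op` shape follows the go-libp2p-consensus proposal):

```go
// Op represents a state transition: applying it to a State yields the
// next State.
type Op interface {
	ApplyTo(s State) (State, error)
}

// OpLogConsensus extends the single-value layer with an operation log:
// actors commit operations rather than whole states, and the cluster
// agrees on the state that results.
type OpLogConsensus interface {
	// CommitOp submits an operation and returns the agreed state that
	// results from applying it.
	CommitOp(op Op) (State, error)
	// GetLogHead returns the state at the head of the operation log.
	GetLogHead() (State, error)
	// Rollback reverts the log to a previously agreed state.
	Rollback(s State) error
}
```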
Note: it's not currently clear (to me) whether the `Actor` interface needs to change to support the `Op`, or if the `Op` is applied to the state and then `SetState` is used to set the 'State' to a node that represents the state transition.
Having this log-based consensus gives us a few really neat features: