Implement block and state synchronization protcol

srderson commented 8 years ago

Need to implement the block and state synchronization protocol.

Every peer needs to broadcast each time a new block is added to their chain.

@kchristidis Can we link into consensus here? Does this broadcast already take place? Is there a built in mechanism to detect falling behind in consensus

If it is determined that a peer has fallen behind

Path A) If the peer is at block 0

Broadcast a request for the full state
Download the full state from a responding peer

Path B) Else if the peer is at block 0 + X where X is > 0

Broadcast a request for the state delta of the peer's latest block number + 1
Download the state delta from a responding peer and repeat for each missing block. If no peer has a state delta available, the full state must be requested.

For both paths

Broadcast a request for the latest block and download from f +1 peers
Confirm all state hashes in the blocks match
Hash downloaded state and confirm it matches that in the block
Broadcast a request for all older blocks
Download blocks from a responding peer
Confirm hashes

@binh - let me know if you have any comments. I think we should meet Monday to discuss.

kchristidis commented 8 years ago

Can we link into consensus here? Does this broadcast already take place?

PBFT has this notion of checkpointing every K blocks. You're effectively broadcasting a new checkpoint message right after executing the K transactions that says, "here's my global hash right after I executed transaction number X". If you get 2f+1 of those messages with you have a so-called stable certificate and you know that this checkpoint can be trusted.

So by setting K to 1 we can make this work for every block.

Refer to Section 4.4 (pg. 409-410) of the paper for more info.

Is there a built in mechanism to detect falling behind in consensus?

I find at least two such references in the paper.

One, when the network is switching to a new view (elect a new leader when it's suspected to be faulty). A reconfiguration of sorts take place and the primary (leader) or the backups (the rest of the validating peers) may find out that they're missing certain checkpoints that are needed ("New-View Message Processing", pg. 413-414). In that case, they need to initiate a state transfer (#291).

Two, a replica will generally initiate a state transfer (again, #291) when it learns about a stable checkpoint with sequence number greater than "the high water mark in its log" (i.e. the sequence numbers it is currently processing or set to process -- do note that this is a rough and almost certainly incorrect explanation). See Section 6.2.2 "State Transfer", pg. 427-429 for more. I think this is the closest that you'll get to a "yes" to your question.

I also find a third reference: If they fall behind, they will be unable to execute new requests that are being broadcasted. According to the paper, they will timeout and request for a view change ("Liveness", pg. 415). But this view change request shouldn't always be granted so I'm not sure a state transfer will happen then.

@vukolic is the authoritative voice on this and will correct me if where I'm wrong.

binhn commented 8 years ago

A couple of things that we need to keep in mind to implement this protocol:

1) Sync protocol should be independent of consensus. Specific consensus will add more according to its working assumptions. Example for PBFT is specified in #291

2) NVP sync is different from VP. While VP sync is consensus dependence, NVP sync is more of a direct copy from the VP or NVP that it is connected to because NVP must have (and authorized to) a trusted relationship with the VP on the network

So at the minimum, when a peer (VP or NVP) needs to sync, it will broadcast to all connected peers its current block number, b(i). It then may pick a peer who responded to request for data, which will transfer b(i+1 to j) where j is the current block number. It will also receive the current state (delta or full as @srderson stated in the description).

Each consensus plugin will override the above behavior, which the current message handler framework already provides a method for the plugin to intercept the message and process accordingly.

For an NVP (or client) to learn about the current block, each VP when adding a block, broadcasts OpenchainMessage.CHAIN_NEW_BLOCK notification to all connected NVP streams, not VP streams since each VP already knows this via consensus.

Related topic #32

corecode commented 8 years ago

The consensus layer can provide a global hash that corresponds to the state the peer/validator needs to reach. This hash would be the f+1 accepted state. Does this sound like a good separation of concerns? Consensus asks for state to be established (by supplying target global hash), peer uses this information to synchronize.

kchristidis commented 8 years ago

Based on the discussion we had with @corecode earlier: we'll need the working consensus-independent sync protocol before we can do meaningful progress with #291. (We'll deviate from the corresponding section of the paper as it's too implementation specific, and this will also affect the timers that we need to put in place to ensure liveness.)

Posting this so that @ghaskins and @jyellick are in the loop, so that we minimize the time needed to plug things together.

binhn commented 8 years ago

As discussed today, the messages that we currently have are defined in openchain.proto OpenchainMessage with the following types:

CHAIN_SYNC_REQUEST: Starting sync protocol by broadcasting this message with payload contains the block height of the requester.
CHAIN_SYNC: Responding to CHAIN_SYNC_REQUEST with payload contains the block height of the responder, the current block number.
CHAIN_GET_BLOCKS: Picking from those responded CHAIN_SYNC, and request for blocks with this message. The payload contains a range (inclusive) of the first and last blocks to get.
CHAIN_BLOCKS: Responding to CHAIN_GET_BLOCKS message with payload contains the requested blocks or up to the current block that the responder has.
CHAIN_NEW_BLOCK: When a new block is committed, broadcast this message to the connected NVPs. The payload contains the delta of the state.
CHAIN_GET_STATE: An NVP requests for the current state.
CHAIN_STATE: Responding to CHAIN_GET_STATE message. The payload contains the snapshot of the current state.

srderson commented 8 years ago

In Ledger.go, to read the blocks/state the following functions can be used

GetBlockchainSize GetBlockByNumber GetStateSnapshot

I am currently working to expose raw PUT functions.

srderson commented 8 years ago

I created PR #335 for adding a raw block to the chain. The API in Ledger.go is PutRawBlock(block *protos.Block, blockNumber uint64) but I'm wondering if it should be PutRawBlock(block []byte, blockNumber uint64) ?

If we receive the blocks over the wire in raw bytes and they don't need to be converted to block structs, that would be faster, but I don't know the internals of gRPC well enough to know if it needs to be converted to a block struct first. @jeffgarratt maybe you have some input here?

srderson commented 8 years ago

Chatted with @jeffgarratt on slack. He suggested I stick with PutRawBlock(block *protos.Block, blockNumber uint64)

corecode commented 8 years ago

Will it be possible to request a specific global state hash to be synchronized to?

kchristidis commented 8 years ago

My question as well. As far as I can tell, GetStateSnapshot() returns the current (as of the time of calling) state.

srderson commented 8 years ago

Yes, it will only be possible to request the current state. We don't store the entire state for previous hashes. We do have deltas available using GetStateDelta(blockNumber uint64) in Ledger.go.

corecode commented 8 years ago

Would it be possible to keep around certain checkpoint states, or have a way to roll back to them? This is a requirement for PBFT.

srderson commented 8 years ago

Did you want to keep them on the blockchain or is it enough for a peer to store them off chain?

kchristidis commented 8 years ago

We do not care about the blockchain I think; storing them off chain should work.

srderson commented 8 years ago

Could you give me a scenario in which a checkpoint would be needed in PBFT so I can better understand its purpose? Is it just for auditing or is it necessary to rollback due to a fork or failed consensus?

corecode commented 8 years ago

We are implementing the TOCS version of PBFT. See Fig. 14 RECEIVE(<STATE,h,s>).

Periodic stable checkpoints (as confirmed by a quorum) are used to bring a fallen-behind replica up to speed. I don't know how we could select an arbitrary point in time (= current global state) and guarantee that this state is shared by a quorum.

kchristidis commented 8 years ago

But, to play devil's advocate for Sheehan: if the global hash is included the block, when you're returning the state that corresponds to this hash you're effectively returning a state that is shared by a quorum. (It wouldn't be on the blockchain otherwise.)

Is there a leap in the logic here?

corecode commented 8 years ago

How do you know you can trust the state/blocks returned by some other replica?

srderson commented 8 years ago

Here is my thought

Get the state snapshot from any peer. It will include the block number to which it corresponds. You don't need to trust this peer.
Get that block that corresponds to that state snapshot from 2f +1 peers. Look at the state hash in the block.
Calculate the state hash from the state retrieved in step 1. Assuming it matches the state hash from step 2, you know it is correct.

binhn commented 8 years ago

The way it is implemented, checkpoint = block, so the currently committed state is the checkpoint, and no past checkpoints kept around; however, that means a fallen behind validator would not have any other base to start from but the current state. It would either exec every TX from where it is at to the current or take the current state with all the blocks created in between because that is the current checkpoint.

I am not sure if it is necessary for us to keep past checkpoints around (including states and signatures) given that the blocks are cryptographically chained. On the other hand, I see 1 advantage of having some checkpoints around with a snapshot of the state: This would allow pruning block(1) up to a checkpoint (we can't prune block(0) as that's the genesis block).

In production, a checkpoint could be once a quarter or fiscal year.

We have an option for a new VP joining the chain to either start from block(1) or a checkpoint or the current block+state.

1 problem that I see with starting at a checkpoint or having pruned chain is that some stats would no longer be accurate, such as number of transactions, so I think we need to keep track of the cumulative total on the block

corecode commented 8 years ago

Thanks for the larger perspective view. Just to clarify: when I was talking about checkpoints, I was referring to PBFT checkpoints, which happen every K (e.g. 128) requests (which are Transactions at the moment).

corecode commented 8 years ago

Hope to get some input from @vukolic

vukolic commented 8 years ago

Before I can give any meaningful input - is there a design document which spells out how is state management implemented (and ideally, why)? We need to align consensus and state management design here (BTW, consensus design document is available to all and is the PBFT TOCS paper, with minor changes that are not relevant here).

srderson commented 8 years ago

Moved to sprint 3 based on discussion with Binh.

srderson commented 8 years ago

We don't have a ready to share design document describing how state management is implemented.

To give a very high level overview, the state is a key/value database. Chaincodes can only manage their own keys/values. If two different chaincodes use the same key, they are actually different keys in the underlying DB, but this is transparent to the chaincode. The state is modified by chaincodes when a transaction is invoked.

A block contains a series of transactions. After running all transactions, a hash of the state is taken and stored in the block. The hash could be used as part of consensus to ensure that all peers have the same state.

srderson commented 8 years ago

Moving to @jeffgarratt so he can implement SYNC_GET_BLOCKS and SYNC_BLOCKS as discussed today.

jeffgarratt commented 8 years ago

Latest PR for review by Binh includes processing of the SYNC_GET_BLOCKS and SYNC_BLOCKS. Currently the implementation is per Handler reference, and would need to be consumed by the Coordinator (TBD)

srderson commented 8 years ago

@jeffgarratt can this be closed now given commit 566a143d521430397eb5a568a7292ba346c4c03c or does that only handle snapshots but not individual delta requests?

jyellick commented 8 years ago

@jeffgarratt I'm not seeing a GetStateDelta function in the peer handler, is this work still outstanding?

kchristidis commented 8 years ago

I take it that #488 (now merged with the master branch) closes this issue? @jeffgarratt @jyellick

jeffgarratt commented 8 years ago

@kchristidis I believe this may be close to closing after commit 8492dc87e2e4c8aabb2738822dae905cd48203bd. @jyellick let me know if there are any remaining issues.

jyellick commented 8 years ago

@jeffgarratt @kchristidis

Pull request #493 completes this from the perspective of #291 .

Good to close for me.

hyperledger-archives / fabric

Implement block and state synchronization protcol #299