parallelchain-io / hotstuff_rs

Rust implementation of the HotStuff consensus algorithm.

Put message containing “QC from the future” into the message buffer #12

Closed: lyulka closed this issue 6 months ago

lyulka commented 10 months ago

Background

@AlvinHon observes that a ParallelChain protocol validator often reaches a steady state in which it lags 1-2 blocks behind the other validators and never gets to participate in consensus decisions (it never votes).

Problem

After some sequence diagram analysis, we identified that this is caused by a scenario where:

  1. A validator’s (the “lagging validator’s”) blockchain and cur_view lag behind those of the quorum.
  2. It receives a proposal with block.justify.view > cur_view, causing ProgressMessageStub::recv to return ReceivedQCFromFuture and the lagging validator to go into sync.
  3. The lagging validator sends a SyncRequest to an up-to-date validator. However, at the point of receiving the request, the up-to-date validator is still executing validate_block on the same proposal and therefore block has not been inserted into its block tree. Thus, the SyncResponse it sends only includes the chain up to block’s parent (parent_block).
  4. The lagging validator exits sync and re-enters progress mode at view parent_block.justify.view + 1, which in the steady state is the view parent_block was proposed in.
  5. An up-to-date validator finishes validating block, moves on to the next view, becomes its leader, and sends out a proposal containing child_block.
  6. The lagging validator receives this new proposal, but since child_block.justify.view > parent_block.justify.view + 1 (its cur_view), it returns ReceivedQCFromFuture and re-enters sync mode, and the cycle repeats.
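The cycle in steps 1 to 6 can be reproduced with a small toy model. This is a minimal sketch, not hotstuff_rs code: `Proposal`, `steady_state_proposal`, and `votes_under_old_rules` are hypothetical names, the model assumes that in the steady state the block proposed in view v carries a QC with view v - 1, that the sync response ends at the parent of the proposal that triggered sync, and that a proposal accepted in the validator's current view counts as a vote.

```rust
// Hypothetical toy model of the lagging-validator cycle under the OLD rules.
// Not the hotstuff_rs API.

/// In the steady state, the block proposed in view `v` carries a QC
/// (`justify`) for its parent, and that QC's view is `v - 1`.
#[derive(Clone, Copy, Debug)]
struct Proposal {
    view: u64,
    justify_view: u64,
}

fn steady_state_proposal(view: u64) -> Proposal {
    Proposal { view, justify_view: view - 1 }
}

/// Feeds one steady-state proposal per view, starting at `first_view`, to a
/// validator whose cur_view starts at `start_cur_view`. Returns the number of
/// proposals it votes for.
fn votes_under_old_rules(start_cur_view: u64, first_view: u64, rounds: u64) -> u64 {
    let mut cur_view = start_cur_view;
    let mut votes = 0;
    for v in first_view..first_view + rounds {
        let proposal = steady_state_proposal(v);
        if proposal.justify_view > cur_view {
            // ReceivedQCFromFuture: the proposal is discarded and the validator
            // syncs. The up-to-date peer is still validating `proposal`, so the
            // synced chain ends at its parent (proposed in view v - 1), whose
            // own QC has view v - 2. That QC becomes highest_qc().
            let highest_qc_view = proposal.justify_view - 1;
            // Old rule: re-enter progress mode at highest_qc().view + 1.
            cur_view = highest_qc_view + 1;
        } else if proposal.view == cur_view {
            // A proposal for the current view: vote and advance.
            votes += 1;
            cur_view += 1;
        }
        // Proposals for older views are ignored.
    }
    votes
}

fn main() {
    // The validator starts a few views behind the proposals it receives.
    let votes = votes_under_old_rules(5, 10, 20);
    println!("votes under old rules: {votes}");
}
```

In this model the validator lands one view short after every sync, so each new proposal triggers ReceivedQCFromFuture again and it never votes.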

Proposed solution

The proposed solution makes three changes:

  1. Enter progress mode at highest_qc().view + 2 instead of highest_qc().view + 1. This makes more sense because, in the steady state, highest_qc().view is the view the highest known block’s parent was proposed in, highest_qc().view + 1 is the view the highest known block was proposed in, and highest_qc().view + 2 is the view the next block should be proposed in.
  2. Put the message that triggers a ReceivedQCFromFuture into the ProgressMessageStub’s msg_buffer instead of discarding it by returning an error immediately.
  3. Also return ReceivedQCFromFuture if block.justify.view == cur_view, so that the check becomes block.justify.view >= cur_view.
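The three changes can be sketched against a stripped-down stand-in for ProgressMessageStub. This is an illustration under stated assumptions, not the HotStuff-rs v0.3 implementation: the `Proposal` and `RecvError` types, the `recv(cur_view, proposal)` signature, and `progress_mode_entry_view` are hypothetical, and only the triggering proposal is buffered.

```rust
// Hypothetical sketch of the three proposed changes. Not the hotstuff_rs API.
use std::collections::VecDeque;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Proposal {
    view: u64,
    justify_view: u64,
}

#[derive(Debug, PartialEq)]
enum RecvError {
    ReceivedQCFromFuture,
}

/// Change 1: the view to enter progress mode in after a sync.
fn progress_mode_entry_view(highest_qc_view: u64) -> u64 {
    // highest_qc().view     : view the highest known block's parent was proposed in
    // highest_qc().view + 1 : view the highest known block was proposed in
    // highest_qc().view + 2 : view the next block should be proposed in
    highest_qc_view + 2
}

/// Hypothetical, simplified stand-in for ProgressMessageStub.
struct ProgressMessageStub {
    msg_buffer: VecDeque<Proposal>,
}

impl ProgressMessageStub {
    fn recv(&mut self, cur_view: u64, proposal: Proposal) -> Result<Proposal, RecvError> {
        // Change 3: `>=` instead of `>`. A QC whose view equals cur_view means
        // a consensus decision has already been reached in cur_view.
        if proposal.justify_view >= cur_view {
            // Change 2: keep the message so it can be processed after sync,
            // instead of discarding it.
            self.msg_buffer.push_back(proposal);
            return Err(RecvError::ReceivedQCFromFuture);
        }
        Ok(proposal)
    }
}

fn main() {
    let mut stub = ProgressMessageStub { msg_buffer: VecDeque::new() };
    // A lagging validator in view 5 receives `block` (view 10, QC from view 9).
    let block = Proposal { view: 10, justify_view: 9 };
    assert_eq!(stub.recv(5, block), Err(RecvError::ReceivedQCFromFuture));
    assert_eq!(stub.msg_buffer.front(), Some(&block));
    // After syncing up to block's parent (whose QC has view 8), it re-enters
    // progress mode in view 10, the view `block` was proposed in.
    assert_eq!(progress_mode_entry_view(8), 10);
}
```

Buffering rather than dropping means the proposal that revealed the lag is still available once the validator reaches the view it belongs to.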

The first change causes the lagging validator to re-enter progress mode in the view block was proposed in. The second change ensures that the lagging validator will receive the buffered proposal containing block in this view and insert block into its block tree. The cumulative effect is that when the lagging validator receives child_block, it will pass safe_block and vote for it.
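Replaying the scenario under the three changes with a toy model shows the lagging validator catching up. This is a hypothetical sketch: the names are illustrative, the sync response is assumed to end at the parent of the proposal that triggered sync, and any proposal accepted in the current view counts as a vote. The model therefore counts a vote for block as well as for child_block, whereas the issue itself only states that block is inserted and child_block is voted for.

```rust
// Hypothetical toy replay of the scenario under the PROPOSED rules.
// Not the hotstuff_rs API.
use std::collections::VecDeque;

/// In the steady state, the block proposed in view `v` carries a QC with view `v - 1`.
#[derive(Clone, Copy)]
struct Proposal {
    view: u64,
    justify_view: u64,
}

fn votes_under_new_rules(start_cur_view: u64, first_view: u64, rounds: u64) -> u64 {
    let mut cur_view = start_cur_view;
    let mut msg_buffer: VecDeque<Proposal> = VecDeque::new();
    let mut votes = 0;
    for v in first_view..first_view + rounds {
        msg_buffer.push_back(Proposal { view: v, justify_view: v - 1 });
        while let Some(proposal) = msg_buffer.pop_front() {
            if proposal.justify_view >= cur_view {
                // ReceivedQCFromFuture (change 3 uses `>=`). Change 2: keep the
                // message in the buffer so it is retried after sync.
                msg_buffer.push_front(proposal);
                // The synced chain ends at the proposal's parent, whose own QC
                // has view justify_view - 1.
                let highest_qc_view = proposal.justify_view - 1;
                // Change 1: re-enter progress mode at highest_qc().view + 2.
                // This makes cur_view == justify_view + 1, so the retry of the
                // buffered message no longer triggers sync.
                cur_view = highest_qc_view + 2;
            } else if proposal.view == cur_view {
                // Accepted in the current view: vote and advance.
                votes += 1;
                cur_view += 1;
            }
            // Proposals for older views are dropped.
        }
    }
    votes
}

fn main() {
    let votes = votes_under_new_rules(5, 10, 20);
    println!("votes under new rules: {votes}");
}
```

With the validator starting in view 5 and proposals arriving from view 10, the first proposal is buffered, the validator syncs into view 10, processes the buffered proposal there, and from then on votes on every proposal.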

The third change is not strictly necessary for this fix, but is reasonable because block.justify.view == cur_view if and only if a consensus decision has already been reached in cur_view.

lyulka commented 6 months ago

Implemented in HotStuff-rs v0.3.