HotStuff extended for dynamic validator sets

karolinagrzeszkiewicz commented 6 months ago

This PR contains an implementation of our extension of the HotStuff protocol which can accommodate validator set updates. Listed below are some major features of the consensus protocol for validator-set-updating blocks which enable safe and live validator set updates.

Phased (non-pipelined) consensus with extra `Decide` phase

The consensus phases for a validator-set-updating block B proceed as follows:

"Prepare" phase: the leader broadcasts a proposal for B, replicas send Phase::Prepare votes.
"Precommit" phase: the leader broadcasts a Nudge with a Phase::Prepare QC for B, replicas send Phase::Precommit votes.
"Commit" phase: the leader broadcasts a Nudge with a Phase::Precommit QC for B, replicas send Phase::Commit votes.
"Decide" phase: the leader broadcasts a Nudge with a Phase::Commit QC for B, replicas send Phase::Decide votes.

The "decide" phase is special as it enforces a liveness-preserving transition between the two validator sets. Concretely, on seeing a commitQC new validator set becomes committed and its members vote "decide" for the block, with the goal of producing a decideQC. However, in case they fail to do so, the resigning validator set is still active and ready to re-initiate the "decide" phase by broadcasting the commitQC if needed - the resigning validators only completely de-activate themselves on seeing a decideQC for the block. This guarantees the invariant that if a decideQC exists, a quorum from the new validator set has committed the validator set update and is ready to make progress.

`lockedQC` to replace `locked_view`

As per the "safe QC" predicate from the HotStuff paper, a QC is safe if (1) it extends the lockedQC, or (2) it has a higher view than the lockedQC. (1) guarantees safety, and (2) guarantees liveness in case a quorum of replicas have switched to another branch.

For pipelined hotstuff, this is effectively equivalent to "a QC is safe if its view is higher or equal to the view of the lockedQC", hence previously we used locked_view = view of the lockedQC, without the need to keep track of the entire QC.

However, for non-pipelined HotStuff, a QC might be locked, but the consecutive view requirement may fail. In such case, we should not drop the lock (this could violate safety) but rather recognise a re-proposal of the block as "extending the lock". In essence, we are locking the block stored in the QC, but we store a QC in order to keep track of the view in which a QC for the block was collected.

Consecutive views requirement

The QCs for a validator-set-updating block must be collected in consecutive views. This extends the requirement from pipelined HotStuff to non-pipelined HotStuff. Otherwise, we may have branch switching that could lead to safety violations.

An implementation can be found in the safety module.

Keeping track of the validator set update status

We store a mapping from a block to its validator set update status. This is necessary for checking if the block has associated validator set updates, and if they have already been applied (useful among others for validating a QC, or for deciding whether updates need to be applied).

pub enum ValidatorSetUpdatesStatus {
    None,
    Pending(ValidatorSetUpdates),
    Committed,
}

Storing the history of validator set updates

At any point, we only store the committed validator set, and the previous validator set, together with the information on whether the update has been completed, and the height of the validator-set-updating block. An update is completed when a decideQC for the validator-set-updating block is seen.

When update is not completed, the previous validator set is partially active, ii.e., can particvipate in timeout voting, and nudge the commitQC. This ensures the liveness of the protocol in case the new validator set is not active yet.

pub struct ValidatorSetState {
    committed_validator_set: ValidatorSet,
    previous_validator_set: ValidatorSet,
    update_height: Option<BlockHeight>,
    update_completed: bool,
}

Storing the previous validator set, and the update height, even after the update is completed, is important for validating QCs. Concretely, given a QC we need to know which validator set to verify the signatures against. This seems to require a mapping from a block to the validator set in charge of validating it. However, we claim that in a correct execution of the protocol, a replica will never need to validate QCs older than those that are meant to be signed by the previous validator set, since older Qcs should already be committed. Hence, the replica only needs to remember the current (committed) and the previous validator set, rather than an entire history of validator sets.

karolinagrzeszkiewicz commented 6 months ago

Regarding the two cases where a precondition needs to hold for a method to be called:

on_receive_view_info: I kept the method implementation as it is, but renamed it to enter_view to avoid confusion with the methods that are triggered on receiving a message. I also updated the precondition to be a a method call is_view_outdated returning true. This is clearly described as a precondition for calling on_enter_view in the doc for enter_view.
Collectors::update_validator_sets: now includes a check equivalent to what used to be Collectors::should_update_validator_sets, and returns true if the collectors have been updated on calling the method, false otherwise.

lyulka commented 4 months ago

TODO for tomorrow:

Write better one-sentence descriptions for the kv_store and write_batch modules.
Write "internal procedure" itemdoc sections for on_receive_proposal, on_receive_nudge, on_receive_vote, and on_receive_new_view.

parallelchain-io / hotstuff_rs