parallelchain-io / hotstuff_rs

Rust implementation of the HotStuff consensus algorithm.
34 stars 4 forks source link

Safety and liveness problems stemming from deleting pending validator set updates on commit #16

Open lyulka opened 9 months ago

lyulka commented 9 months ago

Background

The current implementation deletes pending validator set updates on commit. This ultimately causes BlockTree::safe_qc and/or QuorumCertificate::is_correct to evaluate to true when we really want it to evaluate to false, and to evaluate to false when we want it to evaluate to true.

Safe QC

pub fn safe_qc(&self, qc: &QuorumCertificate) -> bool {
    /* 1 */
    (self.contains(&qc.block) || qc.is_genesis_qc()) &&
    /* 2 */ 
    qc.view >= self.locked_view() &&
    /* 3 (Problem 1) */ 
    (((qc.phase.is_prepare() || qc.phase.is_precommit() || qc.phase.is_commit()) && self.pending_validator_set_updates(&qc.block).is_some()) ||
    /* 4 (Problem 2) */
    (qc.phase.is_generic() && self.pending_validator_set_updates(&qc.block).is_none()))
}

QuorumCertificate::is_correct

pub fn is_correct(&self, validator_set: /* (Problem 3) */ &ValidatorSet) -> bool {
    self.hash == Block::hash(self.height, &self.justify, &self.data_hash)
        && self.justify.is_correct(validator_set)
}

Problems

Problem 1 (Liveness)

Acceptable nudges on a validator-set-changing block will fail safe_qc predicate 3 if the block has been committed, since the pending validator set changes of a block are.

Problem 2 (Safety)

Proposals containing a generic QC that extend a validator-set-changing block will pass predicate 4 of safe_qc and be accepted if the block has been committed, again, since the pending validator set changes of a block are deleted on commit.

Problem 3 (Safety and Liveness)

QCs in blocks that extend validator-set-changing blocks are formed by the previous validator set.

Consider a scenario in a replica that has just inserted a block containing a commit QC, that is, the block this block extends is committed by the insertion.

Imagine that after this, the replica receives a proposal containing a new block but which contains the same commit QC. In this case, The replica will try to validate the commit QC on its current committed validator set, which is the new validator set instead of the previous validator set because it has committed the parent block. A similar scenario could also happen with nudges.

This mistake opens the door to safety and liveness violating scenarios:

  1. Liveness: because proposals and nudges containing QCs produced (correctly) by the previous validator may fail to pass BlockTree::is_correct evaluated on the new validator set.
  2. Safety: because proposals and nudges containing QCs produced (wrongly) by the new validator set may pass BlockTree::is_correct.
karolinagrzeszkiewicz commented 4 months ago

More on problem 1 (pending_validator_set_updates)

First, a part that is easier to fix: in this line block should be replaced with b. Note that this line is incorrect with respect to our intention, but in reality it should not cause much trouble, since among the blocks that are committed with a single call to commit_block, only block can have associated validator set updates.

Secondly, even with the fix mentioned above we can face a possibly liveness-threatening situation. On committing a block b, the pending validator set updates associated with b are deleted from the BlockTree. When a child block of b is proposed, safe_qc returns true, since the justify is a commit QC and there are validator set updates associated with b in the BlockTree. Then b is committed and its pending validator set updates deleted. However, if another child block of b is proposed, the safe_qc (and hence safe_block) check will fail because the validator set updates associated with b have already been deleted.

For reference:

pub fn safe_qc(&self, qc: &QuorumCertificate, chain_id: ChainID) -> bool {
        /* 1 */
        (qc.chain_id == chain_id || qc.is_genesis_qc()) &&
        /* 2 */
        (self.contains(&qc.block) || qc.is_genesis_qc()) &&
        /* 3 */ qc.view >= self.locked_view() &&
        /* 4 */ (((qc.phase.is_prepare() || qc.phase.is_precommit() || qc.phase.is_commit()) && self.pending_validator_set_updates(&qc.block).is_some()) ||
        /* 5 */ (qc.phase.is_generic() && self.pending_validator_set_updates(&qc.block).is_none()))
    }

In case the first child of b that was proposed cannot be committed, for instance because the proposal was only broadcasted to selected validators by a byzantine node, the validators that have inserted this block cannot insert any other conflicting block, hence this is a potential liveness threat.

Suggested solution to problem 1

Introduce a new type, such that instead of Option<ValidatorSetUpdates>, a block hash should map to this type (or Option of this type):

enum ValidatorSetUpdatesInBlock {
    NoUpdates,
    Pending(ValidatorSetUpdates),
    Committed(ValidatorSetUpdates),
    Forgotten,
}

Once we no longer need to verify QCs for a given validator-set-updating block, its associated validator set updates can be Forgotten.

Note on problems 2 and 3

Problems 2 and 3 shall be solved with the new consensus protocol for dynamic validator sets (hotstuff-rs 0.4).