parallelchain-io / hotstuff_rs

Rust implementation of the HotStuff consensus algorithm.

Tracking Issue: HotStuff-rs v0.4 #25

Open lyulka opened 7 months ago

lyulka commented 7 months ago

Changelog

HackMD (Replication Section).

Related issue

parallelchain-io/parallelchain-protocol#10

lyulka commented 6 months ago

Summary of design meeting (December 21, 2023)

Validator Set Speculation

We discussed potential designs for validator set speculation (a solution to issue #16). Our potential designs are essentially variations of adding extra phase(s) to the "phased protocol" to disseminate CommitQC to a quorum of the old validator set and the new validator set before proceeding to produce blocks using the new validator set.

A couple questions were brought up about the potential design:

  1. Do we need an all-to-all broadcast to disseminate DecideQC, or could we get by without an all-to-all broadcast if we do not forget the Resigning Validator Set? Alternatively, can we replace the all-to-all broadcast with another phase (called "finalize")?
  2. Do we really need votes from the previous validator set in DecideQC (in addition to votes from the new validator set) for safety and liveness? Or can we make do with a DecideQC containing votes only from the new validator set?

Byzantine View Synchronization

Finally, we agreed that in HotStuff-rs v0.4, any block should be able to change the validator set (i.e., we don't have to wait for an "epoch" boundary to change the validator set). This introduces challenges for view synchronization.

Agenda for meeting tomorrow

Present revisions to View Synchronization Design.

lyulka commented 6 months ago

Summary of design meeting (December 22, 2023)

Byzantine View Synchronization

  1. We decided in previous meetings to use the Lewis-Pye algorithm as a basis for our implementation of Byzantine View Synchronization.
  2. However, the Lewis-Pye algorithm assumes fixed validator sets, and HotStuff-rs offers dynamic validator sets, so we've discussed making two changes to the Lewis-Pye algorithm to make it work with dynamic validator sets:
    1. Epoch length is to be a constant, configurable parameter instead of "F + 1" (see the sketch after this list). Since F can change, replicas whose local views of the current validator set update at different times would compute different local values of "F", and therefore different epoch lengths, which would prevent synchronization.
    2. At the "Epoch Change View" replicas are to broadcast votes/new-view messages to all replicas, instead of just the F + 1 leaders of the epoch.
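
As a minimal sketch of the first change (all names are hypothetical and the constant's value is arbitrary), epoch arithmetic can be made to depend only on a configured constant, never on F, so that all replicas agree on epoch boundaries:

```rust
// Minimal sketch: epoch arithmetic that depends only on a configured
// constant, never on F. Names are hypothetical, not hotstuff_rs APIs.

/// Configurable at startup; must be identical across all replicas.
const EPOCH_LENGTH: u64 = 50;

/// The epoch a view belongs to.
fn epoch_of(view: u64) -> u64 {
    view / EPOCH_LENGTH
}

/// Whether `view` is the "Epoch Change View", i.e., the last view of its
/// epoch, in which votes/new-view messages are broadcast to all replicas.
fn is_epoch_change_view(view: u64) -> bool {
    view % EPOCH_LENGTH == EPOCH_LENGTH - 1
}

fn main() {
    assert_eq!(epoch_of(49), 0);
    assert_eq!(epoch_of(50), 1);
    assert!(is_epoch_change_view(49));
}
```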

Implementation of Validator Set Speculation

  1. We identified three scenarios of how a replica could act in "speculation periods":
    1. It is a leader in the previous validator set and a leader in the next validator set.
    2. It is a replica in the previous validator set and a replica in the next validator set.
    3. It is a leader in the previous validator set and a replica in the next validator set or vice versa.
  2. We note that in each scenario, a leader only needs to produce at most one proposal, and a replica only needs to validate at most one proposal. This is an improvement over "general speculation", where leaders and replicas may have to produce or validate multiple proposals in a single view, which is infeasible time-wise.

lyulka commented 6 months ago

Summary of design meeting (January 3, 2024)

Validator Set Speculation

We revisited the question of whether the Decide QC needs to contain a quorum of votes from both the resigning validator set and the incoming validator set. We narrowed down the requirements to be the following:

  1. In order for a validator set update to be safe it needs to be agreed upon by the previous validator set.
  2. In order for a validator set update to be live it needs to be agreed upon by the incoming validator set.
  3. In order for all replicas, including listeners, to be able to grow their local copies of the blockchain, they need to be able to validate, in sync, the blocks that follow validator-set-changing blocks.

In light of these requirements, we considered a new alternative to having "double-sized" DecideQCs: Commit QCs will contain votes from only the resigning validator set, Decide QCs will contain votes from only the incoming validator set, but blocks following validator set changing blocks will contain a "Merged QC", which includes both a Commit QC and a Decide QC.

This design has a couple of benefits over requiring "double-sized" QCs, both of which arise from the fact that the resigning validator set does not have to participate in the Decide phase:

  1. It reduces message complexity.
  2. A replica can end its own speculation period immediately upon seeing a Merged QC, instead of having to wait for one more QC.
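
As a rough sketch (with simplified placeholder types, not the actual hotstuff_rs definitions), the Merged QC described above might look like:

```rust
// Rough sketch of the "Merged QC" alternative. Types are simplified
// stand-ins, not the actual hotstuff_rs types.

struct SignatureSet; // placeholder for a set of validator signatures

struct QuorumCertificate {
    view: u64,
    block: [u8; 32],
    signatures: SignatureSet,
}

/// Carried by the block that follows a validator-set-updating block: it
/// proves agreement by the resigning validator set (via the Commit QC) and
/// by the incoming validator set (via the Decide QC), without either QC
/// having to be "double-sized".
struct MergedQC {
    commit_qc: QuorumCertificate, // votes from the resigning validator set
    decide_qc: QuorumCertificate, // votes from the incoming validator set
}
```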

lyulka commented 5 months ago

Summary of design meeting (January 8, 2024)

Validator Set Speculation

We identified a liveness problem with the Merged QC design: consider the scenario where a quorum of replicas in vs' receive MergedQC but only one replica in vs receives both CommitQC and MergedQC. This replica in vs will immediately "resign" (i.e., stop nudging CommitQC and voting Decide). If f replicas in vs are byzantine, this leaves 2f < 2f+1 replicas in vs that have not yet resigned, which is not enough to produce a new Commit QC. Therefore, in order for the 2f replicas in vs to sync to the head of the blockchain (which is now being grown by vs'), they need to sync with a replica in vs'. However, because the sync server is only selected from inside the committed validator set, this is not possible.

The Double-Decide QC design avoids this problem by making it such that DecideQC guarantees that a quorum of replicas in vs have committed vs'. So even the at most f replicas in vs that fail to receive CommitQC (and therefore fail to commit vs') can still reliably select a sync server from inside vs that is aware of CommitQC.

We identified ways we could modify the Merged QC design to sidestep the liveness problem, e.g., keeping track of a Locked Validator Set and selecting sync servers from both cvs and lvs, but the current thinking is that this is more complex than just going with the Double-Decide QC design.

Byzantine View Synchronization

We discussed adding a highest_qc_or_tc field to the EpochTimeout type. This helps guarantee liveness in the case where more than f but fewer than 2f+1 replicas have advanced to an epoch e + 1 while more than f but fewer than 2f+1 replicas are "left behind" in epoch e: the field gives the replicas in epoch e + 1 a means of bringing the left-behind replicas up to epoch e + 1.
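
A placeholder sketch of the discussed change (the real EpochTimeout definition may differ):

```rust
// Placeholder sketch of EpochTimeout with the discussed highest_qc_or_tc
// field. Types here are simplified stand-ins.

struct QuorumCertificate { view: u64 }
struct TimeoutCertificate { view: u64 }

/// The sender's highest QC or TC, whichever has the higher view.
enum QcOrTc {
    Qc(QuorumCertificate),
    Tc(TimeoutCertificate),
}

/// Broadcast on timing out at an epoch boundary. Carrying `highest_qc_or_tc`
/// gives replicas still in epoch e the evidence they need to advance to
/// epoch e + 1 when more than f (but fewer than 2f+1) replicas are ahead.
struct EpochTimeout {
    epoch: u64,
    highest_qc_or_tc: QcOrTc,
}
```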

Action points

@karolinagrzeszkiewicz will update the ParallelChain Protocol v0.6 changelog to reflect the current design.

karolinagrzeszkiewicz commented 5 months ago

Summary of Design Meeting (January 15, 2024)

Liveness issue with commit rule for validator-set-updating blocks

We discussed a liveness issue with applying the "three consecutive views" commit rule to the phased consensus mechanism for validator-set-updating blocks. In short, if a replica rejects a commitQC for a block because it does not satisfy the commit rule, it will be unable to broadcast an acceptable nudge or proposal moving forward, since it holds a precommitQC as its highest_qc and its view has advanced beyond precommitQC.view + 1. Even if a conflicting block is proposed, it will be rejected because of the locked_view.

We discussed the following solutions to the above-mentioned issue:

  1. "1 leader, 4 phases, 1 view" consensus for validator-set-updating blocks, like in the traditional phased HotStuff SMR.
  2. Introducing the option to extend the branch led by a validator-set-updating block even if the block does not satisfy the commit rule, and possibly committing the entire branch later if a higher block in the branch satisfies the commit rule (analogously to what we do in the chained HotStuff implementation for non-validator-set-updating blocks).
  3. Liveness fallback mechanism in case a proposal for a validator-set-updating block fails to satisfy the commit rule: broadcast another (or the same) proposal with the highest_block_justify as the proposed block's justify.

We chose 3: 1 may complicate view sync (views in which a validator-set-updating block is proposed require more time), and 2 fails to guarantee immediacy.

We also discussed how the traditional HotStuff, whether pipelined or not, does not guarantee immediacy, presumably because traditional SMR concerns state updates that have to do with the application, rather than the protocol (like validator set updates). We think that solution 3, which effectively blocks on a certain block height until either (1) a non-validator-set-updating block obtains a genericQC, or (2) a validator-set-updating block obtains a prepareQC, precommitQC, and a commitQC, may offer a safe and live alternative to the traditional flow of the HotStuff consensus protocol.
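
An illustrative sketch of what solution 3's fallback could look like. `Block` and `fallback_proposal` are hypothetical stand-ins for the real types; only `highest_block_justify` comes from the notes above:

```rust
// Illustrative-only sketch of the liveness fallback (solution 3):
// re-propose with highest_block_justify as the proposed block's justify.

#[derive(Clone)]
struct QuorumCertificate {
    view: u64,
}

#[derive(Clone)]
struct Block {
    justify: QuorumCertificate,
    // payload, data hash, etc. omitted
}

/// If a proposal for a validator-set-updating block fails the commit rule,
/// the leader proposes again (the same block or a new one), but justified by
/// `highest_block_justify`, restarting the phased run from a fresh QC.
fn fallback_proposal(rejected: &Block, highest_block_justify: &QuorumCertificate) -> Block {
    Block {
        justify: highest_block_justify.clone(),
        ..rejected.clone()
    }
}
```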

Action Points

I will develop the idea behind solution 3 further, thoroughly analyze its safety and liveness features, and work on adapting it to our Validator Set Speculation Design.

lyulka commented 5 months ago

Summary of Design Meeting (February 2, 2024)

Validator Set Speculation

  1. We discussed the protocol specification draft of HotStuff-rs with validator set speculation.
  2. We discussed the possibility of combining the state variables locked_vsu_block and highest_view_in_which_locked_vsu_block_was_proposed into a single state variable: locked_prepare_qc.
  3. Further, we discussed the possibility of combining locked_prepare_qc and locked_view into a single state variable: locked_qc (see the sketch after this list).
  4. We noted that currently, determining whether a block is validator-set-updating requires handing the block over to the app for validation, which happens after the block is subjected to safety-checking logic. But because the safety-checking logic in on_receive_proposal in the draft depends on whether or not the block is validator-set-updating, in the new version of the protocol the block needs to be validated before it is subjected to safety-checking logic.
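
A minimal sketch of the consolidation discussed in points 2 and 3, with placeholder types and a deliberately simplified check (the real safe-node rule has more conditions):

```rust
// Sketch: a single locked_qc subsuming locked_view and the locked
// validator-set-updating block. Placeholder types, simplified logic.

struct QuorumCertificate {
    view: u64,
    block: [u8; 32],
}

/// `locked_qc` subsumes both `locked_view` (via `locked_qc.view`) and the
/// locked validator-set-updating block (via `locked_qc.block`).
struct SafetyState {
    locked_qc: Option<QuorumCertificate>,
}

impl SafetyState {
    /// Simplified stand-in for the check formerly done against `locked_view`.
    fn justify_view_is_safe(&self, justify_view: u64) -> bool {
        match &self.locked_qc {
            Some(locked) => justify_view >= locked.view,
            None => true,
        }
    }
}
```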

lyulka commented 4 months ago

Summary of Design Meeting (February 19, 2024)

  1. We considered the choice between a single-Decide QC design and a double-Decide QC design again. We had initially leaned towards a double-Decide QC design (see the design meeting summary from January 8, 2024) because it mitigates limitations of the current mechanism for choosing a replica to use as a sync server and prevents those limitations from hurting liveness too much. We discussed other ways we could mitigate this limitation while sticking with a single-Decide QC design, which we deem more efficient and elegant than the double-Decide QC design:
    • E.g., @karolinagrzeszkiewicz proposed that we could add a highest_commit_qc field to the AdvanceView message type.
    • Alternatively, if we really want to have a single-Decide QC design, we could design a new protocol for choosing a replica to use as a sync server that makes it so that replicas other than those in our committed validator set can be selected as a sync server.
  2. We agreed that a single-Decide QC design is safe, and the main challenge with this design is its liveness characteristics.

lyulka commented 4 months ago

Summary of Design Meeting (February 23, 2024)

This was an overview meeting where we collected ourselves and summarized everything we've agreed to about the design of HotStuff-rs v0.4. These points are summarized in the three sections below.

HotStuff

  1. Validator set speculation:
    1. New kind of QC: Decide QC.
    2. Decide QC contains votes only from the new validator set.
    3. New state variable: Resigning Validator Set. This state variable is deleted upon receiving a Decide QC.
    4. Members of the Resigning Validator Set only have three responsibilities:
      1. If they become the leader in a view, they should broadcast Nudge(CommitQC).
      2. At the last view in an Epoch, they should broadcast a TimeoutVote.
      3. At the last view in an Epoch, they should collect TimeoutVotes to form TimeoutCertificates and broadcast AdvanceView(TimeoutCertificate).
  2. Adapting HotStuff to enable consensus on validator sets:
    1. New state variable: Locked QC.
    2. Removed state variable: Locked View.

Pacemaker

  1. New message types (see the sketch after this list):
    1. TimeoutVote.
    2. AdvanceView.
  2. New certificate type: TimeoutCertificate.
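
A placeholder sketch (not the actual definitions in pacemaker::types) of how these types might relate:

```rust
// Placeholder sketch of the new Pacemaker types; the actual definitions
// may differ.

type ViewNumber = u64;
struct Signature;    // placeholder for a single validator signature
struct SignatureSet; // placeholder for a quorum of signatures

/// Broadcast by a replica when it times out in the last view of an epoch.
struct TimeoutVote {
    view: ViewNumber,
    signature: Signature,
}

/// A quorum of TimeoutVotes for the same view.
struct TimeoutCertificate {
    view: ViewNumber,
    signatures: SignatureSet,
}

/// Tells lagging replicas to advance past the current view, carrying the
/// TimeoutCertificate as evidence that a quorum timed out.
struct AdvanceView {
    timeout_certificate: TimeoutCertificate,
}
```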

Sync

  1. Sync Trigger:
    1. On seeing a correct QC from the future for an unknown block.
    2. On entering the epoch view (can be removed if we include "on lack of progress" sync).
    3. On lack of progress (to enable a resigning validator set to sync?).
  2. Sync:
    1. TODO: a sync mechanism that enables syncing with a random peer (not just a validator) but is not vulnerable to DoS.

Modularity

TODO: a hotstuff-rs design that decouples its separate components as much as is possible and reasonable. The three components are the HotStuff implementation, the Pacemaker, and the Sync Trigger. The decoupling can be trait-based or thread-based.

lyulka commented 4 months ago

Summary of design meeting (February 27, 2024)

  1. We compared and contrasted the two implementation sketches we've coded and published in parallelchain-io/hotstuff_rs_v04_sketches, listing the points of contention.
  2. We reached a conclusion for all of the points of contention, enabling us to start the implementation phase proper for HotStuff-rs v0.4.
  3. We agreed on a set of milestones for the implementation phase of HotStuff-rs v0.4. Each milestone will correspond to a PR and achieve specific goals.

Implementation sketches

Points of contention

  1. Major: we prefer the implementation of execute in the "/alice" crate because, unlike in the other implementation sketch, its execute function calls the Pacemaker's tick only once every iteration.
  2. Minor: whether Algorithm should be aware of the specific message variants of each of the protocols, or wrap the variants in HotStuffMessage, BlockSyncMessage, and PacemakerMessage (agreed).
  3. Major: recv should only be called in execute, and not in any of the protocol-specific methods. The protocol-specific methods should only send (agreed).
  4. Major: all "on_*" methods should only modify the internal state of the specific protocol and should not return any values. In particular, Pacemaker's on_receive_timeout_vote and on_receive_advance_view should not have return values. The Algorithm thread will find out about changes to the Pacemaker struct by calling some kind of query method (agreed).
  5. Minor: all methods on HotStuff, Pacemaker, and BlockSyncClient that receive messages and mutate the internal state of their respective protocols should have names that start with "on_" (agreed; see the sketch after this list).
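
A minimal sketch of the conventions agreed in points 3-5; all names and fields here are hypothetical:

```rust
// Minimal sketch of the agreed conventions: `on_*` methods mutate internal
// state and return nothing; the Algorithm thread observes changes via a
// query method. Names and fields are hypothetical.

struct TimeoutVote {
    view: u64,
}

struct Pacemaker {
    cur_view: u64,
    timeout_votes_collected: usize,
}

impl Pacemaker {
    /// "on_*" convention: consume a message and mutate internal state only.
    fn on_receive_timeout_vote(&mut self, vote: TimeoutVote) {
        if vote.view == self.cur_view {
            self.timeout_votes_collected += 1;
        }
    }

    /// Query method: polled by the Algorithm thread's `execute` loop to
    /// find out about changes to the Pacemaker's state.
    fn query_cur_view(&self) -> u64 {
        self.cur_view
    }
}
```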

Further Ideas

  1. If a leader in a view proposes to us a block that is invalid, we should not validate any more proposals from that leader in this view. This prevents byzantine leaders (especially byzantine leaders of epoch-change views) from DoS-ing replicas by spamming them with proposals.
    • This obviates the need for a view_sync_mode state variable in the Algorithm struct.
  2. We could remove the coupling between Pacemaker and Block Sync Trigger by triggering block sync upon seeing an AdvertiseBlock with a Block containing a correct QC "from the future", instead of triggering block sync upon seeing an AdvanceView message containing a correct QC from the future.

Unknowns

  1. Besides the event-based block sync trigger described in "Further Ideas no. 2", we also need a timeout-based sync trigger to fall back on, particularly because we have validator set updates, and a replica with a lagging view of the validator set will not be able to validate QCs from the future. We initially intended to trigger block sync at the end of every (Pacemaker) Epoch, but a timeout-based trigger that fires upon "lack of progress" would be preferable and would further decouple Pacemaker and Block Sync. The details of how this can be implemented are still not fully fleshed out, but we imagine that this would entail adding a tick method to the BlockSyncClient struct that checks "progress" (see the sketch after this list).
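
A sketch of what such a tick method might look like, under the assumption that "progress" is tracked as a timestamp (all names are hypothetical):

```rust
// Hypothetical sketch of a timeout-based fallback trigger: a `tick` method
// on the BlockSyncClient that fires on "lack of progress".

use std::time::{Duration, Instant};

struct BlockSyncClient {
    // Updated whenever a sync attempt happens or the client's internal
    // highest_qc is updated.
    last_progress: Instant,
    // How long we tolerate a lack of progress before triggering sync.
    progress_timeout: Duration,
}

impl BlockSyncClient {
    /// Called once per iteration of the Algorithm thread's event loop.
    fn tick(&mut self) {
        if self.last_progress.elapsed() >= self.progress_timeout {
            self.trigger_sync();
            self.last_progress = Instant::now();
        }
    }

    fn trigger_sync(&mut self) {
        // Select a sync server and begin syncing (omitted).
    }
}
```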

Schedule

We will implement HotStuff-rs v0.4 in a new branch (called "hotstuff_rs_v04"). Development will be split into at least 4 milestones, each corresponding to a PR into the "hotstuff_rs_v04" branch. These milestones will be tracked using the "HotStuff-rs v0.4" GitHub Project.

  1. Milestone 1: Define Types and Signatures; Empty Bodies.
  2. Milestone 2: Write Bodies of Pacemaker Methods.
  3. Milestone 3: Write Bodies of HotStuff Methods.
  4. Milestone 4: Write Bodies of Block Sync Methods.

At the end of development, we will write a user-readable changelog for the v0.4 release.

karolinagrzeszkiewicz commented 3 months ago

Summary of design meeting (March 7, 2024)

We discussed the following design sketch for new block sync.

We identified the following 3 components of the block sync client design:

  1. List of sync servers: How to maintain a list of available sync servers? (new component)
  2. Sync mechanism: How to sync, once sync is triggered and given the list of available servers?
  3. Sync trigger mechanism: When to trigger sync?

The key idea that links components 1 and 2 is that an AdvertiseBlock message broadcasted periodically by the sync server serves as a commitment to providing blocks (at least) up to the advertised block, and a sync server can be punished for breaking that commitment by being blacklisted.

In short, we would like to punish servers for sending incorrect blocks and for not sending enough blocks. Ideally, we would like to avoid punishing them for not responding within a given amount of time since this can be due to asynchrony or a benign fault. This is why the design above sends a sync request to all peers, and syncs with the first server that responds, avoiding the problem altogether.

The following issues, questions, and suggestions were raised (and should be addressed by the updated design):

  1. Should the height of the newest block (i.e., the most recently inserted block), highest_qc.block, the highest block, or the highest committed block be advertised by a sync server? Newest block height and highest_qc.block.height are not monotonically increasing because of branch switches. Hence, if an honest server advertises a block, but the quorum then switches to a lower branch and the server provides the blocks in the lower branch via BlockSyncResponse, the server will break its commitment and be incorrectly penalized.
  2. Perhaps AdvertiseBlock should be split into AdvertiseBlock and AdvertiseQC, the latter containing the highest_qc required for sync trigger. This is to avoid overloading AdvertiseBlock.
  3. There is a possibility of uncontrolled growth of available_servers and black_list. We need to make sure that their sizes are bounded. The associated risks include overflow and byzantine peers overpopulating the available_servers.
  4. Perhaps the blacklist should not be permanent. We can store the timestamp associated with blacklisting a peer in the blacklist, and once the timestamp expires we can remove the peer from the blacklist (see the sketch after this list).
  5. sync_request_limit may differ across sync servers and clients. Hence, in the current design sketch, a sync server can be incorrectly punished for not sending as many blocks as it promised when in fact its sync_request_limit is simply lower than the client's.
  6. Consider changing the meaning of block_sync_timeout: there should not be a limit on how long the syncing process can take, since new validators or listeners may need to sync millions of blocks. Instead, block_sync_timeout should mean the maximum time the client waits for the server's response.
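
A sketch of points 3 and 4 together (bounded size plus expiry); the names and eviction policy are assumptions, not decided design:

```rust
// Sketch of a bounded, expiring blacklist. Hypothetical names and a
// simple eviction policy.

use std::collections::HashMap;
use std::time::{Duration, Instant};

type PeerId = [u8; 32];

struct Blacklist {
    entries: HashMap<PeerId, Instant>, // peer -> time of blacklisting
    ttl: Duration,                     // how long a blacklisting lasts
    max_len: usize,                    // bound to prevent unbounded growth
}

impl Blacklist {
    fn insert(&mut self, peer: PeerId) {
        self.evict_expired();
        if self.entries.len() < self.max_len {
            self.entries.insert(peer, Instant::now());
        }
    }

    fn contains(&self, peer: &PeerId) -> bool {
        self.entries
            .get(peer)
            .map_or(false, |t| t.elapsed() < self.ttl)
    }

    fn evict_expired(&mut self) {
        let ttl = self.ttl;
        self.entries.retain(|_, t| t.elapsed() < ttl);
    }
}
```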

karolinagrzeszkiewicz commented 3 months ago

Summary of design meeting (March 8, 2024)

Sync server selection

We discussed the updated design for new block sync, where the newest_block_height field of the AdvertiseBlock message is replaced with highest_committed_block_height (which is monotonically increasing), and selecting a new server for every sync request is replaced with selecting the first peer that responds to the first sync request as the sync peer for all subsequent requests. The commitment is also redefined: by sending highest_committed_block_height in AdvertiseBlock, a sync server commits to providing at least advertise_block.highest_committed_block_height - self.highest_committed_block_height blocks.
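
In code, the redefined commitment might be computed as follows (illustrative names mirroring the notes, with simplified stand-in types):

```rust
// Illustrative computation of the redefined commitment.

struct AdvertiseBlock {
    highest_committed_block_height: u64,
}

/// The minimum number of blocks the advertising server has committed to
/// providing: its highest committed height minus the client's.
fn committed_block_count(advertise: &AdvertiseBlock, clients_highest_committed: u64) -> u64 {
    advertise
        .highest_committed_block_height
        .saturating_sub(clients_highest_committed)
}
```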

Regarding blacklisting, we concluded that blacklisting servers that do not respond in time may be desirable, since slow servers (not just byzantine ones) are not the best sync peers either.

However, we also found that blacklisting may be of little use in protecting against an attacker that can keep adopting new identities (a sybil attack), and currently there is nothing in the protocol that prevents them from doing that.

We also discussed the choice between selecting the available sync server that responds first vs. selecting a sync server at random. We found that both approaches expose a significant attack surface:

  1. Sync server that responds first: means that attackers that respond with very few or fake blocks have a higher chance of being selected, since their response message will get to the client faster.

    • To address this, we could try to find a way to make it more expensive to produce responses, and/or to reject such responses faster, i.e., without the need to validate the blocks.
    • Even if we retry selecting another server on discovering an attacker server, we will face the same problem when selecting the fallback sync server, and trying to find a fallback multiple times might take too much time and impede the progress of the validator node.
    • Perhaps we could consider using a slow hashing function, and validating the hashes before calling the app to validate the blocks.
  2. Random sync server: means that an attacker can control a majority of available_servers by regularly sending AdvertiseBlock from tons of fake addresses (a sybil attack), and hence increase its chance of being selected.

    • As above, resolving this requires making it expensive to advertise blocks.

To address this, we considered the following ideas, of which 1 is clearly the safest:

  1. Select servers from the candidate, committed, and old validators, i.e., add valid sync peers on executing a block with validator set updates.
    • Clearly prevents sybil attacks.
    • Yet requires adding a manual sync trigger. This is because, on starting a new replica, the committed validator set may be very outdated and not overlap with the committed validator set according to the head of the chain.
    • Note that when a replica does not know the most recent validator-set-updating block that just got committed, its ability to sync is contingent on whether the old validators are still active.
  2. App API for obtaining identities of all participants in the network that have a stake (or a pool) in the protocol, and selecting available sync servers exclusively from those.
    • Issue: same as above, by being behind on the blockchain state, a replica's App is also behind on the status of the participants in the network, so it may not know of the participants valid for the head of the blockchain.
    • Hence, also requires manual sync trigger.
    • Yet allows listeners to act as sync servers.
  3. Hybrid approach: make sure that a fixed percentage of the available servers are from the past/current/candidate validators.
    • Complicated, is it worth it?

The above concerns bring up a broader question of who should be able to act as a sync server. Is it fair to let only past/current/candidate validators act as sync servers? Or should we let any reliable participant in the network act as a sync server?

Block sync trigger

We agreed on the following block sync trigger mechanism:

  1. Event-based sync trigger: on receiving, via an AdvertiseQC{qc} message from some sync server, a cryptographically correct qc for an unknown block with qc.view >= cur_view (see the sketch after this list).
  2. Timeout-based sync trigger (as a fallback): on not making progress for a sufficiently long time, where progress = "a sync attempt or an update to the sync client's internal highest_qc" (the same rules for highest_qc updates apply as in the HotStuff protocol).
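
A sketch of the event-based trigger condition, with placeholder types and the correctness/known-block checks abstracted as closures:

```rust
// Sketch of the event-based trigger condition: a cryptographically correct
// QC for an unknown block with qc.view >= cur_view. Hypothetical types.

struct QuorumCertificate {
    view: u64,
    block: [u8; 32],
}

struct AdvertiseQC {
    qc: QuorumCertificate,
}

fn should_trigger_sync(
    msg: &AdvertiseQC,
    cur_view: u64,
    qc_is_correct: impl Fn(&QuorumCertificate) -> bool,
    block_is_known: impl Fn(&[u8; 32]) -> bool,
) -> bool {
    qc_is_correct(&msg.qc) && !block_is_known(&msg.qc.block) && msg.qc.view >= cur_view
}
```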

karolinagrzeszkiewicz commented 3 months ago

Summary of design meeting (March 13, 2024)

We talked about PR #31, which introduces the new code organisation for hotstuff-rs 0.4. We reached the following conclusions regarding changes to the PR:

  1. We will not add a blanket implementation for Collector::collect, and the Collector trait shall be removed since it doesn't really do anything useful.
  2. Certificate::quorum shall be a method of the ValidatorSet instead (ValidatorSet::quorum), and the Certificate trait shall be removed too.
  3. TimeoutCertificate and QuorumCertificate should be moved to pacemaker::types and hotstuff::types respectively.
  4. We should introduce newtypes to replace type aliasing, and make conversions and referencing less cumbersome by implementing traits such as TryFrom and AsRef for these newtypes (see the sketch after this list).
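
For illustration, a newtype along the agreed lines; the concrete underlying type and error handling here are assumptions:

```rust
// Sketch of the newtype pattern replacing a type alias like
// `type CryptoHash = [u8; 32];`. Names are illustrative.

struct CryptoHash([u8; 32]);

impl AsRef<[u8]> for CryptoHash {
    fn as_ref(&self) -> &[u8] {
        &self.0
    }
}

impl TryFrom<&[u8]> for CryptoHash {
    type Error = ();

    fn try_from(bytes: &[u8]) -> Result<Self, Self::Error> {
        // Fails if the slice is not exactly 32 bytes long.
        <[u8; 32]>::try_from(bytes).map(CryptoHash).map_err(|_| ())
    }
}
```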

We also agreed that block_tree shall be passed to HotStuff, BlockSyncClient, and Pacemaker methods (as it is now), rather than being stored as a field of these structs. This is because the only smart pointers that could be stored in those fields are thread-safe pointers like Arc, since spawning requires the moved data to implement Send, and Arc is expensive and would not actually be shared between different threads in our case.

karolinagrzeszkiewicz commented 3 months ago

Summary of design meeting (March 14, 2024)

In this meeting we finalized the high-level design of the new block sync mechanism. Concretely, we decided that:

  1. Definition of "trustworthy sync peers": At any point in the protocol, the set of "trustworthy peers" from which we accept AdvertiseBlock and AdvertiseQC messages shall consist of: (1) the committed validator set, and (2) new validators associated with any speculative block in the block tree.
  2. Sync server selection: the trustworthy peers that send correctly signed AdvertiseBlock messages shall be stored in available_sync_servers (until some expiry time after the message was sent). This way, we only consider the peers that notify us about their availability to be available (see the sketch after this list). As discussed earlier, blacklisted peers should be periodically removed from this set.
  3. Blacklisting: blacklisting shall not be permanent, every blacklisting should be associated with an expiry date. This is necessary since we blacklist peers for not responding to sync requests.
  4. Selecting a peer to sync with: sample a random peer from available_sync_servers, instead of syncing with the first peer to respond.
  5. AdvertiseBlock vs AdvertiseQC: we need both, the former for notifying about server availability and committing to providing blocks up to a given height, and the latter for the block sync trigger.
  6. Manual sync trigger: required for starting a new node after there have been updates to the validator set of the blockchain. We don't have the details yet, but we imagine a start_with_sync API for the Replica in addition to the start method, or a sync function which just like initialize shall be called before start.
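
A sketch combining points 2 and 4 (expiring availability entries plus random selection); it uses only std, so the sampling here is a crude stand-in for a proper RNG, and all names are hypothetical:

```rust
// Sketch: expiring available_sync_servers entries plus random-ish
// selection of a sync peer. Hypothetical names; std-only sampling.

use std::collections::HashMap;
use std::time::{Duration, Instant};

type PeerId = [u8; 32];

struct AvailableSyncServers {
    servers: HashMap<PeerId, Instant>, // peer -> when it last advertised
    advertisement_ttl: Duration,
}

impl AvailableSyncServers {
    /// Called on receiving a correctly signed AdvertiseBlock from a
    /// trustworthy peer.
    fn on_advertise_block(&mut self, peer: PeerId) {
        self.servers.insert(peer, Instant::now());
    }

    /// Pick a peer among non-expired servers. `seed` stands in for a
    /// random number; HashMap iteration order makes this only roughly
    /// uniform, which suffices for a sketch.
    fn pick_sync_peer(&mut self, seed: usize) -> Option<PeerId> {
        let ttl = self.advertisement_ttl;
        self.servers.retain(|_, t| t.elapsed() < ttl);
        let n = self.servers.len();
        if n == 0 {
            return None;
        }
        self.servers.keys().nth(seed % n).copied()
    }
}
```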

Unrelated to sync, but Alice raised a point that ReplicaSpec::initialize should be a method of KVStore rather than ReplicaSpec, since it initializes the key value store.

I will also update the design in the HackMD doc to reflect the above changes.