parallelchain-io / hotstuff_rs

Rust implementation of the HotStuff consensus algorithm.

hotstuff-rs-0.3 #18

Closed karolinagrzeszkiewicz closed 7 months ago

karolinagrzeszkiewicz commented 9 months ago

This commit includes the following:

  1. Upgrade to ed25519-dalek version 2.0.0, and the related shift from exposing PublicKeyBytes as validator identifiers to exposing PublicKey (ed25519-dalek::VerifyingKey) instead
  2. A non-stack-overflowing version of commit_block (in response to Issue #2)
  3. Fixed pending_validator_set_updates (Issue #15)
  4. Minor bug fix in logging.rs; the bug appeared because PublicKeyBytes and CryptoHash are type aliases for the same type

Comment on (2): The previous version of commit_block was non-tail-recursive, which means potential stack overflow as more and more stack frames are added, and there is nothing the compiler can do about it. In some other languages, a tail-recursive version would avoid the stack overflow problem and be optimised by the compiler to performance equivalent to that of a loop (Tail Call Optimisation). However, Rust does not implement Tail Call Optimisation. It does have iterators (and closures), which are marginally faster than loops, and folding over an iterator is logically equivalent to tail recursion. Hence, my solution folds over an iterator with a closure that updates the state, committing the corresponding blocks from oldest to newest.
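
As a rough illustration (not the actual hotstuff_rs code; ChainState and the block representation below are simplified stand-ins), the shape of the solution is:

struct Block { height: u64 }
struct ChainState { highest_committed_height: u64 }

// Commit blocks from oldest to newest by folding over an iterator instead of recursing,
// so no stack frame is pushed per committed block.
fn commit_blocks(state: ChainState, uncommitted_oldest_to_newest: Vec<Block>) -> ChainState {
    uncommitted_oldest_to_newest
        .into_iter()
        .fold(state, |mut state, block| {
            // "Committing" here just records the height; the real implementation applies
            // the block's app state updates and validator set updates to the block tree.
            state.highest_committed_height = block.height;
            state
        })
}

fn main() {
    let state = ChainState { highest_committed_height: 0 };
    let blocks = vec![Block { height: 1 }, Block { height: 2 }, Block { height: 3 }];
    assert_eq!(commit_blocks(state, blocks).highest_committed_height, 3);
}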

karolinagrzeszkiewicz commented 8 months ago

This merged PR introduces the event handling feature for hotstuff-rs, as well as logging rewritten as a set of possible default event handlers. The major contributions include:

  1. events.rs: definitions of key blockchain and protocol events
  2. event_bus.rs: an EventHandlers struct for storing handlers for each event type (including a method to add default logging handlers), and an event_bus thread for receiving events from the algorithm and sync_server threads and triggering the execution of the appropriate handlers stored in its instance of the EventHandlers struct (a minimal sketch of this dispatch loop follows this list)
  3. logging.rs: Logger trait with a required method get_logger() which returns an object's default handler for logging, implementations of the Logger trait for all event types from events.rs
  4. replica.rs: log_events: bool and a vector of handlers for each event type are passed as parameters to Replica.start(), which creates an EventHandlers struct from the provided handlers and log_events; if the struct instance is not empty, Replica.start() launches an extra event_bus thread that triggers the execution of event handlers upon receiving events
  5. algorithm.rs, state.rs, sync_server.rs: relevant functions now publish events by sending them on the event_publisher end of the channel. The events are published after the corresponding action is completed; in particular, events that modify the block tree are only published after the change takes effect, i.e., it is written to the block tree (rather than to a write batch)
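
A minimal, self-contained sketch of the event_bus idea (using a toy Event enum and println! handlers rather than the real events.rs types and EventHandlers struct): worker threads send events on a channel, and a dedicated thread runs every registered handler for each event it receives.

use std::sync::mpsc;
use std::thread;

enum Event { InsertBlock(u64), CommitBlock(u64) }

type Handler = Box<dyn Fn(&Event) + Send>;

fn main() {
    let (event_publisher, event_subscriber) = mpsc::channel::<Event>();

    // Handlers registered up-front; the real EventHandlers struct keeps one vector per event type.
    let handlers: Vec<Handler> = vec![Box::new(|event: &Event| match event {
        Event::InsertBlock(height) => println!("inserted block at height {height}"),
        Event::CommitBlock(height) => println!("committed block at height {height}"),
    })];

    // The event_bus thread owns the handlers and dispatches each received event to all of them.
    let event_bus = thread::spawn(move || {
        for event in event_subscriber {
            for handler in &handlers {
                handler(&event);
            }
        }
    });

    // In hotstuff_rs this sending happens from the algorithm and sync_server threads.
    event_publisher.send(Event::InsertBlock(1)).unwrap();
    event_publisher.send(Event::CommitBlock(1)).unwrap();
    drop(event_publisher); // closing the channel lets the event_bus thread exit
    event_bus.join().unwrap();
}
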
karolinagrzeszkiewicz commented 8 months ago

The PR merged above implements a builder pattern for the replica; a replica can now be started as follows:

let replica = 
            ReplicaBuilder::builder()
            .app(app)
            .pacemaker(pacemaker)
            .network(network)
            .kv_store(kv_store)
            .configuration(configuration)
            .on_insert_block(on_insert_block)
            .on_commit_block(on_commit_block)
            //...
            .build()
            .start();

Where the on_<event> setters are optional, but if used they should be passed arguments of type HandlerPtr<event_type>, e.g. the on_insert_block setter takes an argument of type HandlerPtr<InsertBlockEvent>, which is the same as Box<dyn Fn(&InsertBlockEvent) + Send>. Therefore, the argument must be a pointer to the closure, rather than the closure itself (the pointer can be created with Box::new(handler), where handler is a closure that implements Fn(&event_type)). This is because Fn(&T) is a trait which can be implemented by different types, whose sizes are unknown at compile time, so a value of such a type cannot be stored directly in a struct field, which must have a size known at compile time.

Note that even though we pass only one closure pointer per event, the closure can in fact be defined to combine multiple different closures. Hence, passing one closure can be equivalent to passing multiple closures.
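
For example (a sketch only; InsertBlockEvent below is a stand-in struct, while the HandlerPtr alias matches the description above):

struct InsertBlockEvent { height: u64 }

type HandlerPtr<T> = Box<dyn Fn(&T) + Send>;

fn main() {
    // A single handler, boxed as the on_insert_block setter expects.
    let single: HandlerPtr<InsertBlockEvent> =
        Box::new(|event: &InsertBlockEvent| println!("inserted block at height {}", event.height));

    // Two closures combined into one closure, then boxed once.
    let metrics = |event: &InsertBlockEvent| println!("metrics: height = {}", event.height);
    let audit = |event: &InsertBlockEvent| println!("audit: height = {}", event.height);
    let combined: HandlerPtr<InsertBlockEvent> = Box::new(move |event: &InsertBlockEvent| {
        metrics(event);
        audit(event);
    });

    let event = InsertBlockEvent { height: 7 };
    single(&event);
    combined(&event);
}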

However, internally the handlers for each event are stored as a vector Vec<HandlerPtr<event_type>>, which can include (1) (optionally) a custom handler passed to the replica builder, and (2) (optionally, if log_events is set to true in the configuration) a predefined log handler for the event (from logging.rs). This avoids having to dereference ("unbox") the user-provided closure just to combine it with the logging closure and then "box" the resulting closure again.

Other changes:

  1. Modified tests to make them compatible with the ed25519-dalek update, event handlers and the builder pattern for replica (this required including the rand_core feature for ed25519-dalek in Cargo.toml, and adding rand_core to dependencies)
  2. Minor fixes, switch to using log::info in all loggers in logging.rs
  3. Removed chain_id method from App (now this is passed via Configuration)
karolinagrzeszkiewicz commented 8 months ago

Implemented the program behavior for the following parameters, now passed via Configuration:

  1. sync_request_limit, sync_response_timeout: now the config parameters replace the corresponding methods of the Pacemaker trait
  2. progress_msg_buffer_capacity: this config parameter is now stored as a field of ProgressMessageStub (as msg_buffer_capacity), along with msg_buffer_size, which keeps track of the number of bytes of the values stored in the msg_buffer. To avoid overloading the buffer, the recv method of ProgressMessageStub now checks whether adding a given message to the buffer would cause an overload; if so, it either drops the message (if its view is higher than or equal to that of the highest-viewed message in the current buffer) or, otherwise, removes just as many of the highest-viewed messages in the buffer as are needed to make space for the new message.
karolinagrzeszkiewicz commented 8 months ago
  1. Added a validate_block_for_sync method to App, so that apps can distinguish between validate-block requests in sync vs. progress mode
  2. Added chain_id to Configuration, removed from App
  3. Now safe_qc takes replica's chain_id as an extra parameter and checks whether qc.chain_id == chain_id
karolinagrzeszkiewicz commented 8 months ago

Some thoughts on implementing sync_trigger_timeout:

If no "progress" happens for the duration of time defined by t = sync_trigger_timeout, we'd like the replica to trigger sync. For this we need to define progress, or find an event such that if the event doesn't happen for t then this means no progress happened for t.

Claim: "progress" during t implies updating highest_qc (at least once) during t (.i.e. highest_qc not updated during t implies no progress during t, in which case we should trigger sync)

Proof sketch: If I (a replica) haven't updated my highest_qc for a big enough t, then one of the following must be the case:

  1. I was offline, or malfunctioning, or not receiving messages, in which case there was no progress
  2. My highest_qc is for a higher view than all of the highest_qcs I received or collected during t
  3. I obtained QC's via either collecting votes as a leader or receiving them from other replicas, but I rejected those QCs, because:

    • They didn't have enough signatures from my current validator set (possibly I have an outdated validator set in which case I need to sync)
    • The signatures were invalid, in which case someone's trying to fool me
    • The blocks in the QC's were unknown to me, i.e., I don't have them stored in my block tree

    In all of these cases, I was not only unable to update my highest_qc, but also to insert a block or vote for a proposal or a nudge, meaning that all proposals and nudges during t were unacceptable to me, so I made no progress.

Note: t should be big enough, in particular t >> view_timeout

Issue: In case (2), clearly no progress was made, but does it mean I need to sync? I am somehow far ahead of everyone else, so do they need to catch up with me, or do I need to catch up with them? Yet my highest_qc is valid, which means the majority voted for its block; but perhaps the validator set was updated in the meantime, all other replicas are new, and they don't know of this qc. However, they must have received it via new_view or sync, therefore for long enough t, (2) should not happen.

I think this, combined with not triggering sync upon receiving a QC from the future, solves the problem from issue #12. Now even if a lagging replica enters sync with a peer that is in the process of validating a block b, and hence only receives the blocks up to b.parent, it will not trigger sync when it receives a "future" QC for b because other replicas are ahead. Instead it will update its highest QC to the QC for b (because "progress" was made), and insert b into its block tree as soon as the proposal message for b is returned from the cache and the proposal is validated.

karolinagrzeszkiewicz commented 8 months ago

There are multiple issues with the approach described above:

  1. It has liveness issues. E.g., in a minimal example with only two validators, where the second one started later and is hence a few views behind the first one, both might fail to obtain a QC and sync with each other (this was the case when running default_pacemaker_view_sync_integration_test) instead of waiting until their views are in sync (which is guaranteed by the current DefaultPacemaker's view synchronization mechanism)

  2. It is too strict and doesn't take into account the case when a quorum is only about to be formed: when most validators are just starting, they might not be updating their highest_qc because they are out of sync with each other, but they might be making progress in the sense of inserting valid blocks that might be committed later once there is a quorum in sync

  3. *(EDIT: not sure if this is an issue): Our previous sync mechanism, which relied on triggering sync when a replica sees a QC from the future, enforced some monotonicity between the replica that syncs and the replicas it should sync with, i.e. the latter are ahead in terms of view number. However, with failing to update highest_qc as a marker of "lagging behind", we fail to obtain that monotonicity: e.g., if 1/2 of the replicas are failing to update their highest_qc, and so is the other 1/2, we don't know which replicas should sync with which.

karolinagrzeszkiewicz commented 8 months ago

This commit implements the following solution for sync_trigger_timeout:

Progress =

  1. Receiving an acceptable proposal (i.e. a proposal a replica can vote for and a block it can insert), or
  2. Receiving an acceptable nudge, or
  3. Receiving an acceptable block via sync

Indeed, progress implies one of the three, since progress involves extending the blockchain with new blocks in a safe manner. At least, this is the kind of "progress" we are concerned with for the sake of the sync mechanism, which involves a replica catching up with the state of the blockchain (and the highest_qc, but this is of secondary concern since, now that we do not discard QC's from higher views, it is easier for a replica to stay up-to-date with highest_qc's even when it is behind on view number or blocks).

Nudge needs to be considered too, since validator-set-updating blocks take 3 voting rounds to commit during which no other blocks might be proposed but progress on the validator-set-updating block is made.

Lack of progress as defined above would indicate that received proposals/nudges/blocks are not acceptable, i.e., they:

  1. Are not supported by valid QC's given the replica's known validator set
  2. Reference blocks that are not known to the replica
  3. For any other reason are invalid given the replica's knowledge of the state of the protocol

Note: unlike the old sync mechanism based on receiving a QC from the future, this makes it possible for a replica that's ahead in terms of view number to sync with the majority of replicas making progress at lower view numbers.

Note: this approach also solves issue #12

Implementation:

Whenever progress is made, the algorithm thread's SyncTriggerTimer is updated

SyncTriggerTimer.last_update is checked via the timeout() method after every iteration of the loop in execute_view. If timeout() returns true, we return Err(ShouldSync), which triggers the sync; but we do not update the timer, since if there was actually a need to sync then the timer will be updated during sync (case 3 of the definition of "progress").

The decision to call the timeout() method after every iteration of the loop in execute_view is motivated by the fact that after sync we'd like the replica to get a chance to update the timer (if an acceptable proposal or nudge is received) and hence check whether there's still a need to sync. This is useful in cases where sync is triggered, e.g., because of a lack of progress after starting the protocol, but no blocks are received by the syncing replica and hence the timer is not updated via sync. Then, we give the replica the chance to update the timer by returning to progress-protocol execution, rather than triggering a cycle of syncs, which would violate liveness.

However, it is not the best design for cases where the sync server of the sync peer is unresponsive or malicious and hence the replica fails to insert any new blocks via sync - in this case we'd ideally want to trigger a sync with a new peer immediately afterwards. Realistically, though, a replica can't distinguish between not inserting new blocks (1) because peers made no progress either, and (2) because the peer is malicious/unresponsive. So all things considered, I think this is the best possible solution.
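
A rough sketch of the timer idea (field and method names follow the description above; this is not the actual hotstuff_rs implementation):

use std::time::{Duration, Instant};

struct SyncTriggerTimer {
    last_update: Instant,
    sync_trigger_timeout: Duration,
}

impl SyncTriggerTimer {
    fn new(sync_trigger_timeout: Duration) -> SyncTriggerTimer {
        SyncTriggerTimer { last_update: Instant::now(), sync_trigger_timeout }
    }

    // Called whenever "progress" is made: an acceptable proposal, nudge, or synced block.
    fn update(&mut self) {
        self.last_update = Instant::now();
    }

    // Checked after every iteration of the loop in execute_view; if true, the algorithm
    // thread returns Err(ShouldSync) to trigger sync.
    fn timeout(&self) -> bool {
        self.last_update.elapsed() >= self.sync_trigger_timeout
    }
}

fn main() {
    let mut timer = SyncTriggerTimer::new(Duration::from_secs(40));
    timer.update(); // progress was made
    if timer.timeout() {
        println!("should sync");
    }
}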

TODO:

karolinagrzeszkiewicz commented 8 months ago

hotstuff-rs 0.3 changes

New features

Configuration

Single interface for the user to set the values of parameters and group them in a single struct, without having to pass an excessive number of arguments to public functions

pub struct Configuration {
    pub me: SigningKey,
    pub chain_id: ChainID,
    pub sync_trigger_timeout: Duration,
    pub sync_request_limit: u32,
    pub sync_response_timeout: Duration,
    pub progress_msg_buffer_capacity: u64,
    pub log_events: bool,
}
  1. Parameters moved from App: chain_id
  2. Parameters moved from Pacemaker: sync_request_limit, sync_response_timeout
  3. Parameters previously passed directly to Replica's start method: me
  4. New parameters: sync_trigger_timeout, progress_message_buffer_capacity, log_events

Event handlers

This feature allows the user to register event handlers for a pre-defined set of events upon starting the replica, such that whenever an event is published, appropriate handlers are executed.

Implementation:

Pre-defined events

Events, defined as an enum in events.rs:

pub enum Event {
    // Events that change persistent state.
    InsertBlock(InsertBlockEvent),
    CommitBlock(CommitBlockEvent),
    PruneBlock(PruneBlockEvent),
    UpdateHighestQC(UpdateHighestQCEvent),
    UpdateLockedView(UpdateLockedViewEvent),
    UpdateValidatorSet(UpdateValidatorSetEvent),
    // Events that involve broadcasting/sending a Progress Message.
    Propose(ProposeEvent),
    Nudge(NudgeEvent),
    Vote(VoteEvent),
    NewView(NewViewEvent),
    // Events that involve receiving a Progress Message.
    ReceiveProposal(ReceiveProposalEvent),
    ReceiveNudge(ReceiveNudgeEvent),
    ReceiveVote(ReceiveVoteEvent),
    ReceiveNewView(ReceiveNewViewEvent),
    // Progress mode events.
    StartView(StartViewEvent),
    ViewTimeout(ViewTimeoutEvent),
    CollectQC(CollectQCEvent),
    // Sync mode events.
    StartSync(StartSyncEvent),
    EndSync(EndSyncEvent),
    ReceiveSyncRequest(ReceiveSyncRequestEvent),
    SendSyncResponse(SendSyncResponseEvent),
}

Event subscriber implemented as an optional separate thread

Justification:

Event handlers defined as pointers to closures

i.e., type HandlerPtr<T> = Box<dyn Fn(&T) + Send>, where T = one of Event struct types defined in events.rs

Justification:

It is necessary to store pointers (boxed trait objects), since Fn(&T) is a trait and the size of a concrete type implementing it is not known statically. Moreover, the Send trait is required so that the closure pointers can be passed from the thread that takes user input and starts the replica to the event_bus thread.

EventHandlers struct for storing handlers as vectors for each event type

The following methods are defined for the struct:

Justification:

The events are published after the corresponding action is completed

In particular, events that modify the block tree are only published after the change takes effect, i.e., it is written to the block tree (rather than to a write batch).

Consequence:

insert_block and commit_block had to be refactored to enable publishing events after the changes in the write batch are written to the BlockTree.

Justification:

Logging

With this feature logging becomes optional and logs implemented as event handlers. The user can register pre-defined logging handlers simply by setting log_events = true in Configuration.

Implementation:

All logs are log::info

Justification:

All pre-defined events, and hence logs, provide information on the progress of the protocol, i.e., events expected from a safe and live execution of the hotstuff-rs protocol.

Logging handlers defined by implementing a Logger trait for each event type

In logging.rs we define:

pub(crate) trait Logger {
    fn get_logger() -> Box<dyn Fn(&Self) + Send>;
}

and implement the trait for each event type.

Justification:

This gives us a uniform and clean way of obtaining a logging handler for each event with <event>::get_logger(). It gives us more type safety and makes refactoring or future extensions of the event set much easier and safer.
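
For illustration, an implementation for one event type might look roughly like this (a sketch: InsertBlockEvent is a stand-in struct, and the log crate is assumed as a dependency):

struct InsertBlockEvent { height: u64 }

trait Logger {
    fn get_logger() -> Box<dyn Fn(&Self) + Send>;
}

impl Logger for InsertBlockEvent {
    fn get_logger() -> Box<dyn Fn(&Self) + Send> {
        // The real handlers log the fields relevant to each event.
        Box::new(|event: &InsertBlockEvent| log::info!("Inserted block at height {}", event.height))
    }
}

fn main() {
    // Without a logging backend initialised, log::info! is a no-op, but this compiles and runs.
    let handler = InsertBlockEvent::get_logger();
    handler(&InsertBlockEvent { height: 1 });
}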

Builder pattern for replica

Now users can start the replica with a user-friendly builder pattern, and likewise define the configuration with a builder pattern, for example in tests/test.rs:

let configuration = 
            Configuration::builder()
            .me(keypair)
            .chain_id(0)
            .sync_request_limit(10)
            .sync_trigger_timeout(Duration::new(40, 0))
            .sync_response_timeout(Duration::new(3, 0))
            .progress_msg_buffer_capacity(10000)
            .log_events(true)
            .build();

let replica = 
            ReplicaSpec::builder()
            .app(NumberApp::new(tx_queue.clone()))
            .pacemaker(pacemaker)
            .network(network)
            .kv_store(kv_store)
            .configuration(configuration)
            .on_insert_block(insert_block_handler)
            .build()
            .start();

Note:

Implementation:

Extra ReplicaSpec struct to hold all parameters required to start a replica with:

  1. TypedBuilder crate macros which define how to set the values of ReplicaSpec fields from the setters and their arguments, and
  2. a start method which returns Replica<K>

Why TypedBuilder?

ReplicaSpec's on_<event> fields store vectors of handler pointers

on_<event> setters set the corresponding fields to a singleton vector containing a pointer to the handler passed as argument

Justification:

Extra validate_block_for_sync method of the App trait

With this feature the user can define different methods for validating a block in progress mode (validate_block) and in sync mode (validate_block_for_sync). Both validate_block and validate_block_for_sync are now methods of the App trait, with the same type signature.

Enhancements

ed25519_dalek upgrade for improved security of Public Key Cryptography and type-safety:

Context:

ed25519_dalek 2.0.0 comes with improved security of Public Key cryptography, and for us also improved type safety, as the new implementation of VerifyingKey allows us to use it in public interfaces. Previously, we used PublicKeyBytes, an alias for [u8; 32], which might lead to type errors that cannot be detected by the compiler. For example, CryptoHash is an alias for the same type.

Implementation:

Updated ed25519_dalek to version 2.0.0 (latest version)

Consequence:

This also required enabling the rand_core feature for ed25519-dalek and adding the rand_core crate in Cargo.toml, because they are required to generate ed25519-dalek keypairs in tests/test.rs

Replaced PublicKeyBytes by ed25519-dalek::VerifyingKey (aliased as PublicKey) in all public interfaces

In particular, this meant replacing the key type in hashmaps and hashsets by PublicKey, concretely in ValidatorSetUpdates and ValidatorSet. This is possible because ed25519-dalek::VerifyingKey implements the Hash trait.

Key design decision:

Defined ValidatorSetUpdatesBytes and ValidatorSetBytes, visible only within the crate, in addition to the public ValidatorSetUpdates and ValidatorSet, and defined Into and TryFrom implementations to allow conversion between the structs using PublicKey and the structs using PublicKeyBytes.

Justification:

This is necessary because PublicKey does not implement BorshSerialize and BorshDeserialize, so e.g., upon reading ValidatorSet from BlockTree, we need to first deserialize it into ValidatorSetBytes, and then convert it to ValidatorSet using try_from. Likewise, when writing to the BlockTree we first call into, then serialize.
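
A sketch of the conversion idea, with simplified stand-in structs (the real ValidatorSet/ValidatorSetBytes carry more data than a set of keys), written against ed25519-dalek 2:

use ed25519_dalek::{SignatureError, VerifyingKey};
use std::collections::HashSet;

type PublicKeyBytes = [u8; 32];

struct ValidatorSetBytes(HashSet<PublicKeyBytes>);
struct ValidatorSet(HashSet<VerifyingKey>);

impl TryFrom<ValidatorSetBytes> for ValidatorSet {
    type Error = SignatureError;

    // Fails with SignatureError if any 32-byte array is not a valid ed25519 public key,
    // so only cryptographically valid keys are kept in memory.
    fn try_from(bytes: ValidatorSetBytes) -> Result<ValidatorSet, SignatureError> {
        let keys = bytes
            .0
            .into_iter()
            .map(|b| VerifyingKey::from_bytes(&b))
            .collect::<Result<HashSet<_>, _>>()?;
        Ok(ValidatorSet(keys))
    }
}

impl From<ValidatorSet> for ValidatorSetBytes {
    // Used before Borsh-serializing and writing to the block tree.
    fn from(validator_set: ValidatorSet) -> ValidatorSetBytes {
        ValidatorSetBytes(validator_set.0.into_iter().map(|key| key.to_bytes()).collect())
    }
}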

Consequence:

Improved type-safety. try_from can return ed25519-dalek::SignatureError when the PublicKeyBytes do not correspond to a valid ed25519-dalek::VerifyingKey, which is different from the std::io errors possibly returned when deserializing and ensures that replicas only store cryptographically valid PublicKeys in their memory.

Timeout-based sync trigger mechanism

Solution:

Progress =

1. Receiving an acceptable proposal (i.e. a proposal a replica can vote for and a block it can insert), or
2. Receiving an acceptable nudge, or
3. Receiving an acceptable block via sync

Indeed, progress implies one of the three, since progress involves extending the blockchain with new blocks in a safe manner. At least, this is the kind of "progress" we are concerned with for the sake of the sync mechanism, which involves a replica catching up with the state of the blockchain (and the highest_qc, but this is of secondary concern since, now that we do not discard QC's from higher views, it is easier for a replica to stay up-to-date with highest_qc's even when it is behind on view number or blocks).

Nudge needs to be considered too, since validator-set-updating blocks take 3 voting rounds to commit during which no other blocks might be proposed but progress on the validator-set-updating block is made.

Lack of progress as defined above would indicate that received proposals/nudges/blocks are not acceptable, i.e., they:

  1. Are not supported by valid QC's given the replica's known validator set
  2. Reference blocks that are not known to the replica
  3. For any other reason are invalid given the replica's knowledge of the state of the protocol

Assuming no Byzantine attack happened, each of the above implies that the replica's knowledge of the protocol execution is outdated and thus, the replica needs to sync.

Note:

Unlike the old sync mechanism based on receiving a QC from the future, this makes it possible for a replica that's ahead in terms of view number to sync with the majority of replicas making progress at lower view numbers.

Implementation:

Why call the timeout() method after every iteration of the loop in execute view?

Limitation:

This is not the best design for cases when the sync server of the sync peer is not responding or malicious and hence the replica fails to insert any new blocks via sync - in this case we'd ideally want to trigger a sync with a new peer immediately after failed sync.

However, realistically a replica can't distinguish between the cases of not inserting new blocks (1) because peers made no progress either, and (2) because the peer is malicious/not responsive. So all things considered, I think this is the best possible solution.

_Why not "progress = updating highest_qc"?_

  1. In theory: having an up-to-date highest_qc is not a precondition for accepting proposals, but having up-to-date blocks in the BlockTree is
  2. In practice: the "progress = updating highest_qc" approach leads to triggering unnecessary sync cycles at the start of the protocol, when replicas' views are not synchronised and no QC's are collected or sent (this is in fact what happens in one of our test cases)

Usage:

Deciding how big sync_trigger_timeout should be is important and needs to take into account the Pacemaker's view synchronization mechanism via the view_timeout method. In general, it should be bigger than the expected view_timeout.

For our DefaultPacemaker, view_timeout = $2^{curView - highestQCView}$ seconds, so it is biggest when a replica hasn't updated its highest_qc for a while. Say, for 5 views: then view_timeout = $2^{5} = 32$ seconds for the 5th view after the highest_qc update.
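
As a quick illustration of the formula quoted above (a toy computation, not the DefaultPacemaker code):

use std::time::Duration;

// view_timeout doubles for every view elapsed since the last highest_qc update.
fn view_timeout(cur_view: u64, highest_qc_view: u64) -> Duration {
    Duration::from_secs(2u64.pow((cur_view - highest_qc_view) as u32))
}

fn main() {
    // 5 views after the last highest_qc update: 2^5 = 32 seconds.
    assert_eq!(view_timeout(10, 5), Duration::from_secs(32));
}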

Message buffer capacity

Context:

Previously, our message buffer could overflow, as described in issue #1

Solution:

Make sure that the buffer doesn't grow beyond the user-defined message_buffer_capacity by checking whether adding a given message to the buffer would result in overload and, if so, either dropping the message or removing the highest-viewed buffered messages to make space for it.

Implementation:

progress_msg_buffer_capacity: now this config parameter is stored as a field of ProgressMessageStub (as msg_buffer_capacity), along with msg_buffer_size which keeps track of the number of bytes of the values stored in the msg_buffer

Two extra fields for ProgressMessageStub (in addition to msg_buffer):

  1. msg_buffer_capacity: user-defined progress_message_buffer_capacity
  2. msg_buffer_size: keeps track of the number of bytes of the values (but not keys) stored in the msg_buffer, i.e., for every (sender, msg) added to the buffer the msg_buffer_size is incremented by mem::size_of::<PublicKey>() as u64 + Self::size_of_msg(&msg)

Two extra methods for ProgressMessageStub:

  1. remove_from_overloaded_buffer: given a buffer and number of bytes to remove, it removes highest-viewed messages from the buffer until at least the required number of bytes is removed
  2. size_of_progress_msg: computes the number of bytes required to store a particular ProgressMessage, by pattern-matching on the enum and calling mem::size_of

Now ProgressMessageStub's recv method, upon receiving a message of higher view than current view, also checks if messages need to be removed from the buffer to make space for the new message and performs the removal if needed.
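
A self-contained sketch of this eviction rule (the real ProgressMessageStub buffers (sender, message) pairs and computes sizes with mem::size_of; here a message is modelled as just a view number and a byte size):

use std::collections::BTreeMap;

struct MsgBuffer {
    capacity: u64,            // user-defined progress_msg_buffer_capacity
    size: u64,                // total bytes currently buffered
    msgs: BTreeMap<u64, u64>, // view number -> size in bytes of the message buffered for that view
}

impl MsgBuffer {
    // Returns true if the message was buffered, false if it was dropped.
    fn try_insert(&mut self, view: u64, msg_size: u64) -> bool {
        if self.size + msg_size > self.capacity {
            // If the new message's view is at least as high as everything buffered,
            // drop the new message instead of evicting lower-viewed ones.
            let highest_buffered = self.msgs.keys().next_back().copied();
            if highest_buffered.map_or(true, |highest| view >= highest) {
                return false;
            }
            // Otherwise remove just enough of the highest-viewed messages to make space.
            while self.size + msg_size > self.capacity {
                let highest = match self.msgs.keys().next_back().copied() {
                    Some(v) => v,
                    None => break,
                };
                if let Some(removed) = self.msgs.remove(&highest) {
                    self.size -= removed;
                }
            }
        }
        self.msgs.insert(view, msg_size);
        self.size += msg_size;
        true
    }
}

fn main() {
    let mut buffer = MsgBuffer { capacity: 100, size: 0, msgs: BTreeMap::new() };
    assert!(buffer.try_insert(3, 60));  // fits
    assert!(buffer.try_insert(2, 60));  // evicts the view-3 message to make space
    assert!(!buffer.try_insert(5, 60)); // higher view than everything buffered: dropped
}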

Non-stack-overflowing commit_block

Reason:

Previous version is non-tail-recursive and hence vulnerable to stack overflow, see issue #2

Implementation:

Implemented commit_block with iterators and closures, concretely by defining an iterator over blocks and folding over it, which is logically equivalent to tail recursion (though not computationally equivalent, because the Rust compiler does not perform tail-call optimisation)

Justification:

In Rust iterators are "the way". They are:

Limitation:

Because we need to commit blocks from oldest to newest (to apply app and validator set updates in the correct order), and because we can only obtain the uncommitted blocks from newest to oldest (by walking from the inserted block to each block's parent), we need to reverse the latter.

This requires storing a vector of uncommitted_blocks (since single-ended iterators cannot be reversed), but since it is stored on the heap and goes out of scope after the function returns, this is fine. We then iterate over uncommitted_blocks, which only stores pointers to the heap memory on the stack, so it should avoid stack overflow.

Checking chain_id

Context:

Upon receiving a QC, replicas do not check if it matches their chain_id

Solution:

Now safe_qc takes replica's chain_id as an extra parameter and checks whether qc.chain_id == chain_id

Bug fixes

Updating locked_view after receiving a nudge with prepareQC (issue #9)

Context:

Whenever we receive a prepareQC for a validator-set-updating block b, b.justify should be locked, but previously it wasn't.

The reasons for updating the lockedQC (which we think is equivalent to our locked_view) are described in the HotStuff paper, but in short locks guard safety of the commit and guarantee liveness. The invariant is that if a value is locked it will be committed eventually, and also honest replicas know of a QC for the value.

Solution and Implementation:

Whenever we receive a prepareQC for a validator-set-updating block b, set locked_view = b.justify.view if b.justify.view > locked_view. This applies to (1) receiving a nudge with prepare phase QC, (2) receiving new view with a highest_qc from the prepare phase, and (3) collecting a QC for the prepare phase.
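
In sketch form (stand-in types; in the real code this check happens wherever such a prepare-phase QC is processed):

struct QuorumCertificate { view: u64 }
struct Block { justify: QuorumCertificate }

// On seeing a prepare-phase QC for a validator-set-updating block b, lock b.justify
// by raising locked_view to b.justify.view if it is higher.
fn lock_on_prepare_qc(locked_view: &mut u64, b: &Block) {
    if b.justify.view > *locked_view {
        *locked_view = b.justify.view;
    }
}

fn main() {
    let mut locked_view = 3;
    let b = Block { justify: QuorumCertificate { view: 5 } };
    lock_on_prepare_qc(&mut locked_view, &b);
    assert_eq!(locked_view, 5);
}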

pending_validator_set_updates (issue #15)

Context:

Previously, pending_validator_set_updates, called on a block's hash, did not return None when deserializing failed; instead it panicked, and in case of success it returned Some(...).

Solution:

Because of how is_some() and is_none() are called on pending_validator_set_updates(&qc.block) in safe_qc, the expected behaviour is that pending_validator_set_updates returns None when there are no updates and the deserializing fails.

Hence, if deserializing fails we return None.

Other

Updated tests in tests/test.rs to account for the updates described above

Did not implement block_exists since the BlockTree already has a contains(&self, block: &CryptoHash) -> bool method

lyulka commented 8 months ago

Summary of review meeting (Oct 19)

I raised one protocol-significant comment on the pull request, as well as a few style comments.

Protocol-significant comment

The definition of “progress” used in this pull request is “receiving a proposal, nudge, or sync response that causes a block to be inserted into the block tree, or a QC to be set as the highest QC”.

I suggested that this definition could be made stricter to only admit blocks that have a block height higher than the previously highest block.

Style comments

karolinagrzeszkiewicz commented 8 months ago

Thoughts on the sync trigger mechanism (after discussion with @lyulka)

Issue with the current definition of "progress" (proposed above): two scenarios when replicas should sync but they do not

Scenario 1:

Scenario 2:

"Issue with the issue": two scenarios that are indistinguishable from the scenarios above from the perspective of a syncing replica, where ideally replicas should not sync

Scenario 3 (indistinguishable from scenario 1):

Scenario 4 (indistinguishable from scenario 2):

Proposed solution

That solves the issue with scenarios 1 and 2, and maybe we can make it work for 3 and 4?

Progress = 

1. Receiving an acceptable proposal for a block **that extends the blockchain**
2. Receiving an acceptable nudge **with a new QC**
3. Receiving an acceptable block **that extends the blockchain via sync**

where:

Issues with the proposed solution

Implies that in scenarios 3 and 4 replicas will also try to sync, in particular at the start of the protocol when replicas are generally out of sync

To avoid sync in scenarios 3 and 4 altogether, we need to answer the question: What is the difference between scenarios 1 and 3 (or 2 and 4) from the perspective of the "minority replica"?

karolinagrzeszkiewicz commented 8 months ago

More thoughts on sync trigger mechanism

Issue (with all approaches described above)

As described above, we check whether there is a sync trigger timeout after every iteration of the loop in execute_view. However, the implication of this is that in the scenario where replicas' views are out of sync and they eventually synchronise and vote on a proposal (on_receive_proposal), the leader will not get a chance to collect these votes (via on_receive_vote); instead it will go into sync mode, and hence never collect the votes.

Receiving the above-mentioned proposal will not update the timer, since the proposed block does not extend the blockchain, being one of multiple children of the parent block (in the beginning of the protocol, the genesis block), but the first one which can obtain a QC because the validators are finally in sync. This is why sync is triggered again.

Solution

Check whether there is a sync trigger timeout upon every view timeout.

Unfortunately this means checking for a sync trigger timeout quite rarely, but it seems to be necessary if we want to go with this definition of "progress" without compromising liveness?

Alternatively, we can treat the children of the genesis block differently (i.e. consider any valid proposal that extends genesis as progress), although in theory this scenario can happen for other blocks too.

karolinagrzeszkiewicz commented 8 months ago

Sync trigger mechanism - solution

Implementation

Define progress as:

Progress = 

1. Receiving an acceptable proposal for a block **that extends the blockchain**
2. Receiving an acceptable nudge **with a new QC**
3. Receiving an acceptable block **that extends the blockchain via sync**

where a block block extends the blockchain if and only if:

block_tree.children(&block.justify.block).is_none() || 
(block_tree.children(&block.justify.block).is_some() && block_tree.children(&block.justify.block).unwrap().is_empty())

and a QC qc is new if and only if a replica's highest_qc becomes updated to qc in response to receiving the nudge.

On each of these events we update the sync trigger timer.

We check the value of the timer upon every view timeout, and trigger sync if the value exceeds sync_trigger_timeout.

Justification

Definition of progress:

Checking the timer on every view timeout:

As described here, if we check the timer (and potentially trigger sync) after every iteration of the loop in execute_view, in other words after processing each progress message, then:

Moreover, since sync_trigger_timeout should be bigger than the expected view_timeout (as described here), it's okay for the timer to be checked as rarely as after every view timeout.

Conceptually, this solution means that we are giving each view a chance to bring us progress, and if that doesn't happen for at least 1 view (or more views, depending on sync_trigger_timeout and on how many views a replica has gone without updating its highest_qc), then we trigger sync (perhaps a number-of-views-based, rather than timeout-based, sync trigger mechanism could work too).

Conclusion

Overall, I think this timeout-based sync trigger mechanism is a bit slower than the previous event-based one, based on receiving QC from the future, but I believe that, unlike the previous sync trigger mechanism, it should not lead to violations of liveness.

karolinagrzeszkiewicz commented 8 months ago

Those are very valid concerns, especially the concern that sync_trigger_timeout is meaningless as a parameter if we only trigger sync after view_timeout, which might be much bigger. I agree that it is better to move this change to hotstuff-rs 0.4 - in the hope that Cogsworth might resolve the problems with the timeout-based sync trigger, but also to gain more time to make sure that our new sync trigger mechanism doesn't violate liveness.

One more concern I have about the solution above is that it makes liveness dependent on the config parameters set by the user. However, in general we would like the protocol to satisfy liveness regardless of the config parameters set by the user - the values of these parameters can affect the speed at which the protocol execution moves forward, but eventual progress should be ensured regardless of their values. And if we can't achieve that, then we should enforce the desired relations between the config parameters through the code.

Also, in case we really want to resolve issue #12 in hotstuff-rs 0.3, one very quick (but not very neat) solution might be to change the definition of "future", as far as receiving a QC from the future is concerned, from highest_qc.view > cur_view to highest_qc.view > cur_view + 1 here. I haven't really thought it through though.

lyulka commented 8 months ago

FYI: chain_id of Genesis QC always being 0 regardless of what’s provided by the user is intended behavior.

karolinagrzeszkiewicz commented 8 months ago

Ok, got it, I thought it was unintended. I guess the reason for that is to avoid passing chain_id as an argument to genesis_qc() or Replica's initialize method?

The reason I did this is that the fullnode tests were failing: with hotstuff-rs 0.3, safe_qc checks whether the replica's chain_id and the chain_id of the qc match - with the genesis chain_id hardcoded as 0, they don't, and all proposals for the first block fail. However, the alternative solution to this problem is not to check the chain_id of the genesis_qc in safe_qc, so I can do this instead.

karolinagrzeszkiewicz commented 8 months ago

Conclusive remarks about non-tail-recursive commit_block and delete_branch

Context

commit_block and delete_branch are non-tail-recursive and hence, potentially prone to stack overflow, as described in issue #2 and issue #23

Decision

Re-write commit_block and delete_branch in iterative style.

Justification

Clearly, recursive commit_block and delete_branch are cleaner and easier to reason about. Moreover, by claim (1) below commit_block can only commit a maximum of 3 blocks at a time, and by claim (2) delete_branch can only be called on branches of a maximum depth of 3, giving us reasons to believe that stack overflow is unlikely.

However, the maximum breadth of a branch extending from a sibling of the pruned block is unbounded. This is because a replica is free to insert arbitrarily many children for a block supported by a QC, as long as they are distinct, and this is enabled by the user's App::produce_block implementation. When the replicas are honest and their views are in sync, this should not happen. Nevertheless, if a replica or a group of replicas lacks the quorum, present in the same view, required to make progress, they will keep inserting children (with the last block supported by a QC as the parent) until the quorum is formed. I think we should account for this scenario - even though it's our responsibility to design the protocol to avoid such long periods of stagnation, I think we should hope for the best and design for the worst (protocol performance).

This affects both commit_block and delete_branch, because commit_block calls delete_siblings and delete_siblings calls delete_branch. Even if we keep commit_block non-tail-recursive, and rewrite delete_branch in iterative style, a stack overflow from commit_block might still happen because of the accumulation of stack frames due to the number of operations required to delete a branch that has a lot of children.

Even though I prefer clean non-tail-recursive code, I think performance should come first, even if the risk is small...

For future reference

Claim 1: By the protocol specification commit_block can commit a maximum of 3 blocks

Proof:

By induction on the block height of the inserted block (since commit_block is only called from insert_block).

Base case: If we are inserting a block at height $\leq 4$ this is trivial.

Inductive step: Suppose we are inserting a block, call it $b_5$, at some height $h$. WLOG assume its parent is $b_4$, etc., so that we get a subchain $b_5 \rightarrow b_4 \rightarrow b_3 \rightarrow b_2 \rightarrow b_1$. Then:

Case 1: $b_{5}.justify$ is in generic phase

Then commit_block is called on $b_{2}$, and:

Case 2: $b_{5}.justify$ is in commit phase

Then commit_block is called on $b_{4}$, and:

Q.E.D

Claim 2: By the protocol specification any "speculative" branch can have a depth of maximum 3.

speculative branch = branch which has a sibling of a pruned block as its root

Proof:

For the sake of contradiction, assume that $b_4 \rightarrow b_3 \rightarrow b_2 \rightarrow b_1$ is a subchain of a branch of depth 4 with $b_1$, a sibling of a committed and pruned block $b$ (i.e., about to be pruned), as its root.

Then on inserting $b_{4}$ either:

Q.E.D

karolinagrzeszkiewicz commented 8 months ago

The last commit reverts the fix for pending_validator_set_updates. As explained here, pending_validator_set_updates was mistakenly labelled as exhibiting incorrect behavior. In short, the old version correctly returns None when getting the value for the respective key from the key-value store fails, and throws an error when deserializing the obtained value fails.

lyulka commented 8 months ago

This affects both commit_block and delete_branch, because commit_block calls delete_siblings and delete_siblings calls delete_branch. Even if we keep commit_block non-tail-recursive, and rewrite delete_branch in iterative style, a stack overflow from commit_block might still happen because of the accumulation of stack frames due to the number of operations required to delete a branch that has a lot of children.

Quick comment from me: delete_siblings calls delete_branch inside a for loop, and a single iteration of the for loop completes only after the delete_branch call completes. Similarly, delete_branch also calls itself from within a for loop. In other words, we will completely delete a single branch (which has a max depth of 2) before “moving on” to deleting another branch, so the stack frames will not accumulate, no?

lyulka commented 8 months ago

Don’t forget documentation.

karolinagrzeszkiewicz commented 7 months ago

This affects both commit_block and delete_branch, because commit_block calls delete_siblings and delete_siblings calls delete_branch. Even if we keep commit_block non-tail-recursive, and rewrite delete_branch in iterative style, a stack overflow from commit_block might still happen because of the accumulation of stack frames due to the number of operations required to delete a branch that has a lot of children.

Quick comment from me: delete_siblings calls delete_branch inside a for loop, and a single iteration of the for loop completes only after the delete_branch call completes. Similarly, delete_branch also calls itself from within a for loop. In other words, we will completely delete a single branch (which has a max depth of 2) before “moving on” to deleting another branch, so the stack frames will not accumulate, no?

From my understanding, this would have been the case if delete_siblings were not called by a recursive function, for the reasons you have mentioned. However, if delete_siblings is called on the result of a recursive call to commit_block within commit_block, then all computation performed by delete_siblings, even if it's a for loop, is accumulated on the stack. Basically, the threat of stack overflow in commit_block might arise not because of the depth of recursion (which is at most 3), but because of the number of stack allocations on each recursion level before we get to the "base case" and stack frames start getting popped. If we call commit_block on $b_3$ where the uncommitted subchain is $b_3 \rightarrow b_2 \rightarrow b_1$, then until commit_block is called on $b_1$, all computation associated with calling commit_block on $b_1$ and $b_2$ will be pushed to the stack.

So I guess if our reasoning here is correct there are two solutions:

  1. Make both iterative.
  2. Make only commit_block iterative.*

*This is fine assuming that a branch extending from a sibling of a pruned block has a limited breadth; otherwise, by the same logic as above, the stack frames of the for loop will accumulate because the for loop takes the result of the recursive call. I think it's very unlikely that the breadth at this level is very big, but it is possible; I'd need to think about this.