hotstuff-rs 0.4 new module organisation and types

This PR proposes a new code organisation that is expected to enable a smooth integration of the new components we intend to introduce with hotstuff-rs 0.4. In particular, this enables a Pacemaker protocol, with associated networking functionality, to replace what used to be a Pacemaker trait with two associated methods.

The scope of this PR is as follows:

New, more granular, modular structure,
New HotStuff, Pacemaker, BlockSyncClient, BlockSyncServer types,
New Message type hierarchy,
New generics to generalize some functionalities shared across the different protocols, e.g., Collector or SignedMessage traits,
New Algorithm struct which provides handles to its sub-protocols, and its execute method, which defines how the sub-protocols interact.
A fix for #30.

Note:

The block_sync::client and block_sync::servers type and method signature is very tentative, it will likely change after we finalize the implementation details.
This does not include the changes to the type signature that may be required to adapt dynamic validator sets (consensus and speculation). This is to enable me to test the new Pacemaker against the current HotStuff implementation before I update the implementation.
This does not include updates to state (for this PR the module is exactly the same as in 0.3). I plan to make relevant updates incrementally, i.e., the Pacemaker-related updates when implementing the Pacemaker, and then Dynamic-Validator-Sets-related updates when implementing them.
Documentation polishing and events updates are also left for later stages of implementation.

Some changes that I have in mind as of this moment:

Rename Algorithm to ConsensusEngine, ConsensusMaker, or ProtocolExecutor,
Use one of the smart pointers for storing a pointer to block_tree as a field of HotStuff, Pacemaker, and SyncClient, instead of passing a mutable reference to the block tree every time we call their methods. EDIT: thread-unsafe pointers not compatible within the current code structure, because the pointer is passed to the spawned algorithm thread so it must implement Send. Unless we refactor to create Pacemaker, HotStuff, and SyncClient after the algorithm thread is spawned in the start method. Thread-safe pointers have an associated overhead, so not worth it here.

I skimmed through the PR a bit on my phone but will have to continue tomorrow. Two points I want to look at (will definitely look at other things, too):

There are some interesting traits in the PR like Collector, SignedMessage, and Certificate. Each of these link together a couple/few different types in HotStuff-rs based on their common functionalities. This seems nice, but I wonder concretely what sort of generality/succinctness these traits enable in the code.
I wonder whether we can make the creation of Nudges a bit more type-safe by limiting the kinds of QCs that a Nudge can contain at a type level, instead of panicking in the constructor like we currently do (behavior which this PR doesn’t change).

I skimmed through the PR a bit on my phone but will have to continue tomorrow. Two points I want to look at (will definitely look at other things, too):

* There are some interesting traits in the PR like Collector, SignedMessage, and Certificate. Each of these link together a couple/few different types in HotStuff-rs based on their common functionalities. This seems nice, but I wonder concretely what sort of generality/succinctness these traits enable in the code.

* I wonder whether we can make the creation of Nudges a bit more type-safe by limiting the kinds of QCs that a Nudge can contain at a type level, instead of panicking in the constructor like we currently do (behavior which this PR doesn’t change).

Traits

The traits introduce more generality and extensibility, but they don't enable succinctness as much as I'd want to. It mostly serves to generalize types, e.g, that QC and TC are really instances of the same thing. More generally they point to the fact that HotStuff and Pacemaker operate based on the some shared principles of consensus, such as having certificates and signed messages serve as evidence. They also enforce a shared signature among instances. It comes with some provided methods too, e.g., SignedMessage::is_correct and Certificate::quorum.

Ideally I'd like the simple, small methods to be required, e.g., SignedMessage::signature_bytes, and all "big" methods to be provided given the required methods. Concretely, I'd like Collector::collect and Certificate::is_correct to be provided. The reason why I haven't done it is that it requires adding more required methods like Certificate::value_bytes and Certificate::is_genesis (always false for a TC), but now that I think about it adding those methods shouldn't big a big deal. What do you think?

Type safety and nudges

Yes, I've had this in mind for the past few weeks, but I didn't include it since I thought we postponed the decision about type wrapping for later and I am also not sure about the best way to go about this... Ideally I would enforce everything at the type-level, especially since a lot of bugs I have seen so far in hotstuff-rs were due to type aliasing or confusion regarding the meaning of a type.

What I had in mind was to have structs NudgeJustify {qc: QuorumCertificate} and BlockJustify {qc: QuorumCertificate} (possibly also AdvanceViewJustify for consistency), and implement TryFrom/ TryInto. However, wrapping may not be enough, since this way a NudgeJustifycan still have a qc with a Generic phase, but it will fail QuorumCertificate::try_into (and it seems wrong to return Errhere, since it clearly contains a QC).

Perhaps more intuitive solution would be to have a QuorumCertificate enum or trait, which branches out into a NudgeJustify and BlockJustify, which are like the current QuorumCertificate struct but the former has NudgeJustifyPhase and the latter has BlockJustifyPhase.

Or more elegantly, QuorumCertificate<P: PhaseType> where the BlockJustifyPhase and NudgeJustifyPhase enums implement PhaseType (or just Quorumcertificate<P> since the trait doesn't really do anything useful other than enforcing which types can be substituted for P). Then nudges must have QuorumCertificate<NudgeJustifyPhase> and blocks QuorumCertificate<BlockJustifyPhase>.

Thoughts?

Review (11-12 March, 2024)

About the `SignedMessage`, `Certificate`, and `Collector` traits

Between SignedMessage, Certificate, and Collector, SignedMessage is the only trait so far that I'm convinced has a reason to exist. This is because at least it contains a blanket implementation of is_correct, so its implementors (Vote and TimeoutVote) do not need to implement it, improving succinctness. As for the other two traits:

Collector: there are no blanket implementations implemented on Collector, and no type or function is generic over Collector. So the only benefit to Collector that I can see so far is that it encourages a consistent interface on the "Collector" types. However, this is only an encouragement, and not an enforcement. Nothing forces us to implement Collector for any new "Collector" types we may wish to add in the future, and even currently, the NewViewCollector type does not implement Collector, which is confusing.
Certificate: again, there are no blanket implementations on Collector, and no type or function is generic over Certificate, so this makes me question whether this abstraction really provides any succinctness/generality benefits. However, I think there are no negatives to this trait remaining in the library, because at least there isn't a "Certificate" type that does not implement the Certificate trait, like there is a "Collector" type (NewViewCollector) that does not implement the Collector trait.

Overall then, I have no opposition towards SignedMessage and Certificate remaining in this PR, but I don't feel like there is enough of a reason for trait Collector to remain.

Creating a blanket implementation for `Collector::collect`

You mentioned in your reply that you'd like to Collector::collect to have a blanket implementation. I think this is a good idea, but I'm not sure if trait Collector is really the best way to create such a blanket implementation.

For me, a more intuitive way to do this would be to create a generic Collector struct that can work to collect both Vote and TimeoutVote. Its type signature could look like: struct VoteCollector<V: Vote>, where Vote is a trait that is implemented by both PhaseVote (renamed version of struct Vote) and TimeoutVote, and which has an associated type Certificate: Certificate, so overall, we'll have something like:

trait Vote {
    type Certificate: Certificate;

    // Omitted: methods common to both HotStuffVote and TimeoutVote.
}

struct VoteCollector<V: Vote>;

impl<V: Vote> VoteCollector<V> {
    fn collect(signer: &VerifyingKey, vote: V) -> V::Certificate;

    // Omitted: constructor method for VoteCollector.
}

@karolinagrzeszkiewicz tell me what you think.

Type-safety of Nudge

It's good that we are thinking of the same thing vis-a-vis Nudges and Quorum Certificates. I do strongly support us doing some kind of re-organization of the Quorum Certificate type to improve typesafety, but I'm not exactly sure what the best way would be exactly. To me, such a re-organization shouldn't need to involve generic type parameters (enums should be enough).

I think your suggestion to introduce NudgeJustify and BlockJustify types is most similar to the vague idea that I have in mind. The justify field in Nudge messages will have type NudgeJustify, and the justify field in Block types will have type BlockJustify. However, as you said, if NudgeJustify and BlockJustify were to be just wrappers around a more general QuorumCertificate type, there is still type-unsafety. Further, we could make QuorumCertificate an enum with variants like PrepareQuorumCertificate and PrecommitQuorumCertificate, but again these variants will contain a field with the enum Phase, so again, still type-unsafe.

In my mind, a solution could be to remove a the Phase enum entirely and encode the phase that a QuorumCertificate contains entirely at the type level (so, a PrecommitQuorumCertificate definitely represents a quorum decision in the Precommit phase). This also entails doing the same with PhaseVote; so PhaseVote will be an enum with variants PrepareVote, PrecommitVote, etc. On this note, maybe we could rename QuorumCertificate to PhaseCertificate so there is symmetry between:

PhaseVote (HotStuff) and TimeoutVote (Lewis-Pye), on the one hand.
and PhaseCertificate (HotStuff) and TimeoutCertificate (Lewis-Pye), on the other.

@karolinagrzeszkiewicz let's discuss this in a meeting later this week.

Where should types be defined?

I think we should try to establish a clear policy for what types should be defined in the top-level types module (crate::types), and what should be defined in protocol-specific types modules (e.g., crate::hotstuff::types).

I think it is quite odd that, for example, Vote is defined in hotstuff::types, while the aggregate of votes, i.e., QuorumCertificate, is defined in crate::types.

Let's also discuss this in the meeting later this week.

Let me respond here, just to keep a record, but we should also discuss this during a meeting.

About the `SignedMessage`, `Certificate` and `Collector` traits

Point taken, I don't think the Collector is useful in its current form. Indeed, NewViewCollector does not fit the definition because the NewView messages are not signed.

Blanket implementation for `Collector::collect`

As for introducing blanket implementations for Collector or introducing a Collector struct generic over Vote, I am realizing that it actually involves adding significant complexity, i.e., more required methods. The question is whether it is worth it. It might make more sense to remove the trait as you originally suggested, and have a VoteCollector/BlockVoteCollector, TimeoutVoteCollector and NewViewCollector.

The VoteCollectorstruct must have some "signature storage" as a field. However, this varies between the BlockVoteCollector and the TimeoutVoteCollector. The former stores possibly multiple signature sets for different blocks and phases in a HashMap<(CryptoHash, Phase), (SignatureSet, TotalPower)>, and the latter stores only one signature set in (SignatureSet, TotalPower). Perhaps SignatureCache can be an associated type of Vote, then we could have a different type depending on the collector, but it seems that this shouldn't really belong to the specification of a vote... There are also a few more differences between the implementation of collect for BlockVoteCollector and TimeoutVoteCollector. So I imagine the signature would have to be something like this:

trait Vote: SignedMessage {
    type Certificate: Certificate;
    type SignatureCache: SignatureCache;

    // Omitted: methods common to both HotStuffVote and TimeoutVote.
    fn insert_to_signature_cache(self,  &mut signature_cache: SignatureCache) -> (SignatureSet, TotalPower);
}

trait Certificate {

   // other methods, provided earlier in my implementation

    fn contains_certificate(signature_set: (SignatureSet, TotalPower)) -> Option<Self> {
            // 1. Check if there is a quorum
            // 2. If yes construct a Certificate
    };

struct VoteCollector<V: Vote> {
     view: ViewNumber,
     chain_id: ChainID,
     signature_cache: V: SignatureCache,
}

impl<V: Vote> VoteCollector<V> {

    fn collect(signer: &VerifyingKey, vote: V) -> V::Certificate {
    // 1. Check chain id and view

    // 2. Insert the signature into the signatures cache with insert_to_signature_cache, which should return the updated signature set

    // 3. Check if a certificate was collected with contains_certificate on the signature set returned in step 2, and if yes return it

    }

    // Omitted: constructor method for VoteCollector.
}

Type safety of Nudge

Agreed, I would rather avoid generics, but for the reason that this would require having multiple methods for each trait implementation, because effectively we get a different type for each trait implementation, e.g., one collect that returns QuorumCertificate<NudgeJustifyPhase> and one that returns QuorumCertificate<BlockJustifyPhase>, hence we'd need two BlockVoteCollectors per view.

Regarding your idea, I am not sure if I understand it correctly, but here are some points I'd make:

I would rather avoid renaming QuorumCertificate, because this is the name used in the HotStuff paper, and renaming it might confuse the library user. At least for as long as this library is called hotstuff-rs.
Do you mean:
```
enum QuorumCertificate {
 PrepareQuorumCertificate,
 PrecommitQuorumCertificate,
 CommitQuorumCertificate,
 DecideQuorumcertificate,
 GenericQuorumCertificate,
}
```
The enum is definitely required, again because we want to have a single collector for QCs per view. But agreed, the phase does not need to be a field of the structs. However, this doesn't really change anything signifcant on the type level. And what we'd probably want to have type safe nudges and proposals is to have:
```
enum QuorumCertificate {
 NudgeJustify(NudgeJustify),
 BlockJustify(BlockJustify),
};
```

enum BlockJustify { DecideQuorumcertificate, GenericQuorumCertificate, }

struct BlockJustify { chain_id:, view:, signatures: \ etc. }


but this is very verbose and too nested...
3. In general, I am worried that any solution other than the one both of us had in mind with wrapping around QuorumCertificate, introduces either too much nesting, or code duplication (or even quintiplication), and overall makes the seemingly simple QuorumCertificate struct very complex. Besides, intuitively BlockJustify or NudgeJustify exists at a higher level than a QC, i.e., a QC serves a "justify", not the other way around. Perhaps having a simple `is_nudge_justify()` vs. `is_block_justify()` would be more clean, though not type-safe.

## On `Vote` vs `SignedMessage`

`AdvertiseBlock` is a `SignedMessage`, but not a `Vote`. Hence the distinction.

@lyulka I implemented the newtype pattern for types in types/basic. The primary motivation behind this implementation is that I would want to have a fixed API that restricts what can be done to values of these types (as much as possible, only to the functionalities we actually need), and avoids the downside of having to access these values through a more verbose syntax. Hence, this API is defined primarily through methods, not traits. I also followed the suggestion from this post regarding implementing traits such as Deref for newtype pattern.

The API might be slightly extended as I continue implementing hotstuff-rs 0.4, but I don't expect significant changes.

Let me know what you think about the API? And also about the visibility of these methods?

There should be functions to access the inner values of every newtype in `types::basic`

@karolinagrzeszkiewicz a major problem I think with the PR is that for a lot of the newtypes defined in types::basic there isn't a function/functions available to access the inner value of an instance of the newtype.

For example, while I could use the bytes function to get the inner [u8; 32] of a CryptoHash, there doesn't seem to be an analogous function that I could use to get the inner u64 of a ChainID.

Not being able to access the inner value of a newtype may be acceptable internally within the library, but library users should definitely be able to access the inner value of newtypes. Especially when a lot of these newtypes represent fundamental concepts like Data and BlockHeight that users can easily get, for example, from a BlockTreeSnapshot.

Therefore, I think there should be functions to access the inner values of every newtype in types::basic.

Otherwise

Otherwise, I think the PR looks good. I would like to see more documentation in some places (e.g., maybe we should explain why we are re-exporting certain types?), but I recall you saying that documentation will be the focus of a later milestone. I also think the rationale for not implementing Deref on the newtypes is sound.

@lyulka point taken, I added methods for accessing the inner value and introduced consistent naming for them i.e., get_<type>.

I also removed the unnecessary imports in types. As for documentation, I intend to expand the documentation for a particular component as I implement it, so e.g., Pacemaker documentation will come with the next milestone, but for this milestone the documentation for types should be exhaustive. So let me know if you have any other feedback regarding documentation, otherwise I can also polish it as a final milestone.

I also added two methods for QuorumCertificate: is_nudge_justify and is_block_justify.

Let me know if this is ready to merge?

@karolinagrzeszkiewicz okay, looks good to me. I'll merge now so that you can start implementing Milestone 2, but I just want to note that the Rust API Guidelines state that "With a few exceptions, the get_ prefix is not used for getters in Rust code."

I don't think we should take this prescription by the API guidelines as a hard rule, and I can see there being benefits (consistency, auto-code completion) to using the get_ prefix in the case of this PR, but you may want to think about this some more and maybe make a decision about the naming of the accessors in the PR for the next milestone.

@lyulka I didn't know about this... I had a hard time deciding what naming convention to follow, it seemed strange to access the inner type with a method name after a type like .vec() or .int(), though .bytes() make sense. get_<type> introduced some uniformity, also making the names informative, unlike .get() (which captures the fact that we are just getting the inner type, not doing any computation). I'd be reluctant to go for as_or into_ since there is no conversion happening (though this is subtle, in some sense it is a conversion). I'll think about it, what do you think would work best?

parallelchain-io / hotstuff_rs