spacemeshos / pm

Project management. Meta-tasks related to research, dev, and specs for the Spacemesh protocol and infrastructure.
http://spacemesh.io/
Creative Commons Zero v1.0 Universal
2 stars 0 forks source link

Unified Content Blocks #100

Closed noamnelke closed 2 years ago

noamnelke commented 2 years ago

SMIP: https://github.com/spacemeshos/SMIPS/issues/71

Overview

Existing blocks will be replaced by three types of objects: ballots, proposals and blocks.

When a smesher is eligible to vote, they publish a Proposal of transactions to be included in the layer. The Proposal includes a Ballot, where the smesher votes for Blocks from previous layers. A single Block is constructed for each layer from transactions in the set of Proposals which the Hare agreed are valid.

Data Structures

Ballot

The message that includes the smesher's vote for blocks in previous layers.

type Ballot struct {
    InnerBallot
    Signature []byte

    ballotID   BallotID
    smesherID  *signing.PublicKey
}

type InnerBallot struct {
    ATXID            ATXID
    EligibilityProof VotingEligibilityProof
    layerID          LayerID // private cache, derived from proof

    BaseBallot BallotID

    AgainstDiff []BlockID
    ForDiff     []BlockID
    NeutralDiff []BlockID

    RefBallot      *BallotID
    ActiveSet      []ATXID
    TortoiseBeacon []byte
}

type VotingEligibilityProof struct {
    J   uint32
    Sig []byte
}

Proposal

The list of transactions nominated by the smesher for inclusion in the layer's Block. The Ballot is embedded in the Proposal, since they are always published together and the Ballot includes the VotingEligibilityProof required to assert the Proposal's validity.

type Proposal struct {
    InnerProposal
    Signature []byte
    id        types.ProposalID
}

type InnerProposal struct {
    Ballot
    TxIDs []TransactionID
}

type ProposalID Hash20

Block

The object containing the ordered list of rewards and transactions to be applied in a layer.

type Block struct {
    LayerID LayerID
    Rewards []Reward
    TxIDs   []TransactionID

    blockID BlockID
}

type Reward struct {
    Address Address
    Amount  Amount
}

Methods and Interfaces

All data structures should implement appropriate constructors and accessors, as well as the LoggableField interface.

Proposal and Ballot Publication

We'll keep most of the logic and only change the data structures for now.

Proposal

When a smesher is eligible to vote in a layer they publish a Proposal which has an embedded Ballot. For now, we can keep the existing mechanism for picking transactions from the mempool.

Ballot

The Ballot includes the smesher's vote for Blocks in previous layers. It's based on the existing mechanism - a base block from the previous layer is selected and a diff is added to the vote, mostly for the block in the new layer. As previously, the block an honest smesher votes for in the current layer is determined by the Hare. The difference is that we don't directly vote for the Hare results, but need to do a post-processing stage to compute and persist a unified block, based on the Proposals the Hare agreed on.

Proposal Processing and Block Construction

Unlike current blocks, Proposals are ephemeral. They must be temporarily stored for hdist layers. Once the Hare converges the node should construct and persist a Block and after the hdist window closes - discard the Proposals.

At this point we should use the existing mechanism for selecting and ordering transactions when executing a layer to construct the list of transactions in the Block. This logic is subject to change, though, so if we can isolate it, so it's later easily replaceable---that's a plus.

Ballot Processing

Ballots are the input to the Tortoise. They are processed in the Tortoise like today's blocks.

Implementation Breakdown

As always, we'll start with a series of small (mostly) refactor pull requests that should be worked on for a short time and merged quickly, to minimize conflicts with work in other areas. The goal is to prepare the ground for a later longer pull request on code that has already been isolated, again, minimizing conflicts.

countvonzero commented 2 years ago

Q.1: *BallotID vs BallotID

InnerBallot{
    RefBallot      *BallotID
}

i know this definition is inherited but do you know why RefBallot is a *BallotID but others fields use BallotID? if we want to save memory efficient, should we all use *BallotID?

Q.1.A: BallotID

is BallotID also Hash20?

Q.1.B: BallotID/ProposalID encoding

do you think there is a chance we can encode layerID in BallotID and ProposalID? the tortoise code had to maintain a two-way mapping btwn layerID and blockID. the motivation here is if we can somehow know the layerID (or just part of the layerID) then we can somehow sort them and select base block/prune them efficiently.

Q.2: VotingEligibilityProof

why is it voting eligibility and not proposal eligibility? what's the rationale and consideration behind this.

Q.2.A: multiple eligibility

if a miner is eligible for multiple ballots/proposals in the same layer, can we take this chance to combine the data?

Q.2.B: content expecations

is it an error for miner to produce proposals but not ballots? is it an error for miner to produce ballots but not proposals?

Q.3: proposal blocks vs hdist

Unlike current blocks, Proposals are ephemeral. They must be temporarily stored for hdist layers. Once the Hare converges the node should construct and persist a Block and after the hdist window closes - discard the Proposals.

currently tortoise votes according to hare within hdist. after hdist, it votes according to global options (ballots). under this scheme, i don't know why we even need to keep these proposals for hdist layers. once the hare output the unified content block, that block is what tortoise will cast vote on until after hdist.

Q.4: tortoise vote counting

Tal mentioned that there is no longer the notion of contextual validity with unified content block. why? what is the difference between verifying tortoise voting and self-healing voting with unified content blocks?

Q.5: pruning

back then when we discussed this, you mentioned

We’ll need to make the content suggestions prune-able with a separate signature.

is that achieved by separate signatures on InnerBallot and InnerProposal and we can prune InnerProposal afterwards?

countvonzero commented 2 years ago

@noamnelke

noamnelke commented 2 years ago

Q.1: *BallotID vs BallotID

[...] i know this definition is inherited but do you know why RefBallot is a *BallotID but others fields use BallotID? if we want to save memory efficient, should we all use *BallotID?

The reason it's a pointer is that it's optional. The alternative is to have some sentinel value (e.g. EmptyBallotID) that we check for, but I feel that nil is more expressive and easier to understand. Having some places be pointers and others not also signals to developers where a ballot is required vs. optional.

Having said that, I leave it up to you to decide how to implement this - the above is just my personal style (and we do have EmptyATXID today, for example, so it might be more consistent).

Q.1.A: BallotID

is BallotID also Hash20?

Yes. I can't think of anything in the protocol right now that should be Hash32.

Q.1.B: BallotID/ProposalID encoding

do you think there is a chance we can encode layerID in BallotID and ProposalID? the tortoise code had to maintain a two-way mapping btwn layerID and blockID. i wonder if we can take the chance here to do better.

Not sure I fully understand. The Ballot has an eligibility proof embedded, from which we derive the LayerID. This should be stored in the (private) layerID member of InnerBallot, so when passing the object around we don't need to re-calculate it (I'm actually not 100% sure this is needed).

The database should have an index from LayerID->BallotID so that we can cheaply iterate over a layer's ballots (in the tortoise and sync). What's the use case of needing the reverse index? We can always re-calculate the layer when fetching a ballot, if it's rare, but if it's common we may need to store an additional index. We should not create another index if there's no use for it. Let's discuss with specifics 🙂

Q.2: VotingEligibilityProof

why is it voting eligibility and not proposal eligibility? what's the rationale and consideration behind this.

It's both. Calling it just EligibilityProof felt too generic to me (eligibility to do what?) and calling it VotingAndProposalEligibilityProof felt too long and detailed to me. We could call it BlockEligibilityProof but since Block is the name of the structure that's the result of unification it could be confusing. I settled on Voting... because that's the persistent part, the proof is actually part of the ballot and then the eligibility to publish a proposal is derived from it. I'm open to changing my mind on this.

Q.2.A: multiple eligibility

if a miner is eligible for multiple ballots/proposals in the same layer, can we take this chance to combine the data?

Yes! I'm doing this now.

Q.2.B: content expecations

is it an error for miner to produce proposals but not ballots? is it an error for miner to produce ballots but not proposals?

Great question! Here's how I see it: there's no reason to explicitly allow anything but the "honest behavior" which is to publish everything you're eligible for. The current design doesn't allow publishing a proposal with no ballot (no other way to prove eligibility) - that's a good thing. Gossiping ballots with no proposal will also not be supported, but there's no way to prevent smeshers from syncing them to the network later - I think that's good enough.

This answer may change as I decide how to support one ballot with multiple proposals.

Q.3: proposal blocks vs hdist

currently tortoise votes according to hare within hdist. after hdist, it votes according to global options (ballots). under this scheme, i don't know why we even need to keep these proposals for hdist layers. once the hare output the unified content block, that block is what tortoise will cast vote on until after hdist.

Because self-healing could be triggered and then we'll need to know what the Hare said. I'm actually not sure about this - @lrettig am I understanding correctly?

Q.4: tortoise vote counting

Tal mentioned that there is no longer the notion of contextual validity with unified content block. why? what is the difference between verifying tortoise voting and self-healing voting with unified content blocks?

The ballots are valid regardless of timing - it was always the case with old blocks that votes were counted regardless of contextual validity so this doesn't change anything. Contextual validity for blocks today is really only used in order to decide which transactions to apply and in order to calculate the layer hash. The new unified Block that will be output by the Hare will already say which transactions to apply. Re: layer hash - it's not part of the protocol and only used for logging. I think it actually makes more sense to include all ballot IDs in the layer hash (not just "contextually valid" ones).

Q.5: pruning

back then when we discussed this, you mentioned

We’ll need to make the content suggestions prune-able with a separate signature.

is that achieved by separate signatures on InnerBallot and InnerProposal and we can prune InnerProposal afterwards?

Yes. We can prune the entire Proposal not just the InnerProposal.

countvonzero commented 2 years ago
Q.1.B: BallotID/ProposalID encoding

do you think there is a chance we can encode layerID in BallotID and ProposalID? the tortoise code had to maintain a two-way mapping btwn layerID and blockID. i wonder if we can take the chance here to do better.

Not sure I fully understand. The Ballot has an eligibility proof embedded, from which we derive the LayerID. This should be stored in the (private) layerID member of InnerBallot, so when passing the object around we don't need to re-calculate it (I'm actually not 100% sure this is needed).

The database should have an index from LayerID->BallotID so that we can cheaply iterate over a layer's ballots (in the tortoise and sync). What's the use case of needing the reverse index? We can always re-calculate the layer when fetching a ballot, if it's rare, but if it's common we may need to store an additional index. We should not create another index if there's no use for it. Let's discuss with specifics slightly_smiling_face

in tortoise code, to reduce memory usage, we keep in-memory mapping from BlockID->opinions. we don't put the actual Block object (where we would have layerID). but for the following use cases we need layer ordering

for these reasons we inject a layerID index to all the maps. and we do keep a reverse mapping from BlockID->LayerID to reduce disk reads. (see tortoise/state.go)

this makes the code overly complex. tortoise should mostly operate on BlockID and votes. if for some reason we can encode layer ordering in BlockID (aka BallotID), then we can significantly reduce the code complexity in tortoise. we really don't need the absolute layerID values, we just need the ordering.

my motivation here is to keep the tortoise code as simple as possible, as the protocol with verifying tortoise+self-healing+reruns is complex enough. simpler code will make future maintenance more error-proof.

Q.2: VotingEligibilityProof

why is it voting eligibility and not proposal eligibility? what's the rationale and consideration behind this.

It's both. Calling it just EligibilityProof felt too generic to me (eligibility to do what?) and calling it VotingAndProposalEligibilityProof felt too long and detailed to me. We could call it BlockEligibilityProof but since Block is the name of the structure that's the result of unification it could be confusing. I settled on Voting... because that's the persistent part, the proof is actually part of the ballot and then the eligibility to publish a proposal is derived from it. I'm open to changing my mind on this.

my thought on this is these all lead to rewards. so maybe RewardsEligibilityProof with a subtype (vote/proposal/validator)... but not important to me at all. just want to understand your thought process.

Q.2.B: content expecations

is it an error for miner to produce proposals but not ballots? is it an error for miner to produce ballots but not proposals?

Great question! Here's how I see it: there's no reason to explicitly allow anything but the "honest behavior" which is to publish everything you're eligible for. The current design doesn't allow publishing a proposal with no ballot (no other way to prove eligibility) - that's a good thing. Gossiping ballots with no proposal will also not be supported, but there's no way to prevent smeshers from syncing them to the network later - I think that's good enough.

i can imagine there will be miners who only want to vote and don't care about making proposals. but they can always pack a empty proposal?

This answer may change as I decide how to support one ballot with multiple proposals.

why should it be multiple proposals but not a single bigger one? are we going to restrict the number of TXs packed in a proposal?

Q.3: proposal blocks vs hdist

currently tortoise votes according to hare within hdist. after hdist, it votes according to global options (ballots). under this scheme, i don't know why we even need to keep these proposals for hdist layers. once the hare output the unified content block, that block is what tortoise will cast vote on until after hdist.

Because self-healing could be triggered and then we'll need to know what the Hare said. I'm actually not sure about this - @lrettig am I understanding correctly?

self-healing will only trigger for layers older than last - hdist. so if we only keep these proposal within hdist it will not be useful.

from the recent research/R&D calls, my understanding is that we put the latest aggregated layer hash in the proposal. and if we have hare majority on a given hash, we prune the TXs that cannot pay gas. if not, we keep all TXs in the unified content block. i thought that was for the ability to re-pack the TXs during self-healing?

with this scheme, i don't see the value keeping the proposals after hare terminate. without this scheme, we need to keep them for as long as we need to go back in history for self-healing.

Q.4: tortoise vote counting

Tal mentioned that there is no longer the notion of contextual validity with unified content block. why? what is the difference between verifying tortoise voting and self-healing voting with unified content blocks?

The ballots are valid regardless of timing - it was always the case with old blocks that votes were counted regardless of contextual validity so this doesn't change anything. Contextual validity for blocks today is really only used in order to decide which transactions to apply and in order to calculate the layer hash. The new unified Block that will be output by the Hare will already say which transactions to apply. Re: layer hash - it's not part of the protocol and only used for logging. I think it actually makes more sense to include all ballot IDs in the layer hash (not just "contextually valid" ones).

mentally separating votes and contents is still tricky for me :D

i thought the layer hash is more about state (TXs applied) instead of ballots. to clarify:

so let me contrast the tortoise voting behavior before and after UCB and see if i get it right. when miner is producing a ballot, tortoise will

does that sound correct to you? @noamnelke @lrettig

lrettig commented 2 years ago

Q.3: proposal blocks vs hdist

currently tortoise votes according to hare within hdist. after hdist, it votes according to global options (ballots). under this scheme, i don't know why we even need to keep these proposals for hdist layers. once the hare output the unified content block, that block is what tortoise will cast vote on until after hdist.

Because self-healing could be triggered and then we'll need to know what the Hare said. I'm actually not sure about this - @lrettig am I understanding correctly?

Self-healing only considers "global" opinion, which means counting the votes it sees in blocks (ballots). It totally disregards the "local" opinion (the input vector, i.e., Hare output).

If I'm understanding correctly, I think @countvonzero is right here and we can discard proposals once we have a unified content block for a given layer.

does that sound correct to you?

A few tweaks:

Otherwise I think I agree.

noamnelke commented 2 years ago

Sorry for the very late response, hope this is still relevant:

in tortoise code, to reduce memory usage, we keep in-memory mapping from BlockID->opinions. we don't put the actual Block object (where we would have layerID). but for the following use cases we need layer ordering

  • sliding window eviction is done by layers
  • base block selection prefer more recent layers.

for these reasons we inject a layerID index to all the maps. and we do keep a reverse mapping from BlockID->LayerID to reduce disk reads. (see tortoise/state.go)

this makes the code overly complex. tortoise should mostly operate on BlockID and votes. if for some reason we can encode layer ordering in BlockID (aka BallotID), then we can significantly reduce the code complexity in tortoise. we really don't need the absolute layerID values, we just need the ordering.

my motivation here is to keep the tortoise code as simple as possible, as the protocol with verifying tortoise+self-healing+reruns is complex enough. simpler code will make future maintenance more error-proof.

I don't think the layer ID can be encoded into the ballot ID at no cost and I don't think we want to pay the cost just to simplify the tortoise data structures. Given that the layer ID of the base ballot can't be extracted from the ballot alone (I think this is what you wanted to achieve, right?) - I think you're saying that then the currently implemented solution in the tortoise is good enough. Did I understand correctly, or do you want me to take a look and see if there's opportunity to improve/optimize it?

my thought on this is these all lead to rewards. so maybe RewardsEligibilityProof with a subtype (vote/proposal/validator)... but not important to me at all. just want to understand your thought process.

As I see it rewards are the motivation, but they're a side-effect. The proof itself doesn't make you eligible for reward - publishing a ballot and proposal is what makes you eligible for a reward. The proof makes you eligible to publish those.

I think having great naming is important, especially in a complex system such as ours. A name can make the difference between understanding immediately what something does and having to read documentation. In some cases an inaccurate name can lead to bugs due to mistaken understanding of a concept.

If you have a possibly better name - please suggest it. Otherwise, I don't think that VotingEligibilityProof is a perfect name, but I think it's good enough if we've exhausted other options.

i can imagine there will be miners who only want to vote and don't care about making proposals. but they can always pack a empty proposal?

Publishing an empty proposal would still make the smesher eligible for the proposal reward. There might be something we can do to punish smeshers who create bad proposals (e.g. vote against them in Hare) but my approach to such non-detrimental risks is to fix it when it happens (post-genesis). Especially since I think the solution can be local (no consensus change).

I can't think of a way (or a need) to punish smeshers that publish bad proposals (or no proposals) other than not rewarding them for those proposals.

This answer may change as I decide how to support one ballot with multiple proposals.

why should it be multiple proposals but not a single bigger one? are we going to restrict the number of TXs packed in a proposal?

We're going to do a single bigger one 😄 . The reason I thought we might want to do several smaller ones is propagation delay - extremely large messages will take longer to transmit and validate and will therefore propagate slower through the network. However, we don't believe we'll have smeshers so large that this will be a problem. If it is there will be relatively easy fixes that we can apply - in the worst case, huge smeshers will have to publish proposals that are not as full as they could be or split their identity. The latter solution doesn't even require code changes.

To be clear, we do plan to limit the number of transactions per proposal. This is part of the transaction selection spec I'm working on right now. The maximum number of transactions will depend on the exact weight of the ballot/proposal and will be validated upon receipt.

with this scheme, i don't see the value keeping the proposals after hare terminate. without this scheme, we need to keep them for as long as we need to go back in history for self-healing.

No need to keep proposals after Hare terminates.

Q.4: tortoise vote counting

Tal mentioned that there is no longer the notion of contextual validity with unified content block. why? what is the difference between verifying tortoise voting and self-healing voting with unified content blocks?

The ballots are valid regardless of timing - it was always the case with old blocks that votes were counted regardless of contextual validity so this doesn't change anything. Contextual validity for blocks today is really only used in order to decide which transactions to apply and in order to calculate the layer hash. The new unified Block that will be output by the Hare will already say which transactions to apply. Re: layer hash - it's not part of the protocol and only used for logging. I think it actually makes more sense to include all ballot IDs in the layer hash (not just "contextually valid" ones).

mentally separating votes and contents is still tricky for me :D

i thought the layer hash is more about state (TXs applied) instead of ballots. to clarify:

  • state hash: calculated after TXs in a UCB is applied. (some TXs in the UCB may not make it to the state)
  • layer hash: calculated after hare output the UCB. basically the chain of UCB from genesis
  • all ballots, late or not, will be kept the mesh and considered valid.

This is not well defined. When I say LayerHash I usually mean the hash of the concatenation of: all ballot IDs in this layer and the previous layer hash. The inclusion of the previous layer hash means this hash represents all past ballots. Whether or not late ballots should be included in this is a tricky question and the answer depends on what we want to do with this hash. If it's for knowing that you agree on all of history (off protocol) then it should probably include late ballots.

so let me contrast the tortoise voting behavior before and after UCB and see if i get it right. when miner is producing a ballot, tortoise will

  • before UCB

    • within hdist: vote for the BlockIDs in the hare output
    • after hdist: count votes in the processed blocks and break tie with weak coin. processed blocks are blocks output by hare (by definition has the correct beacon value)
    • self-healing: count votes from all blocks. all blocks include late blocks and blocks with incorrect beacon values.
  • after UCB

    • within hdist: vote for the UCB output by the hare
    • after hdist: count votes in all ballots and break tie with weak coin. all ballots means all ballots we have up to this point with the correct beacon value.
    • self-healing: count votes in all ballots plus ballots with different beacon values.

does that sound correct to you? @noamnelke @lrettig

I think Lane answered this.