spacemeshos / SMIPS

Spacemesh Improvement Proposals
https://spacemesh.io
Creative Commons Zero v1.0 Universal

[WIP] Tortoise Beacon sync #52

Closed nkryuchkov closed 1 year ago

nkryuchkov commented 3 years ago

Overview

To validate blocks when joining the network, nodes need to know the tortoise beacon value. It needs to be synced between nodes.

Goals and motivation

Handle such cases correctly:

High-level design

TBD

Proposed implementation

TBD

Implementation plan

TBD

Questions

Talk on Slack:

Hi @Tal,

Could you please review and confirm this short summary?

Tortoise beacon values are located in blocks. There are two cases:
- Node is not synced: It doesn’t know the tortoise beacon value for the current epoch, so it needs to decide on one. It goes through blocks and calculates each beacon’s weight according to the block’s ATX. Then it waits for K layers (specified in config) until the calculation is finished, that is, until enough beacons have arrived in incoming blocks and we’re sure the tortoise beacon value is correct. After K layers it takes the beacon with the most weight.
- Node is synced: It knows the tortoise beacon value for the current epoch from blocks, so it doesn’t have to wait.
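
The "not synced" case above can be sketched in Go as a weighted vote over beacon values. This is only an illustration, not the go-spacemesh implementation: the `Block` type, its fields, the `heaviestBeacon` helper, and the tie-break rule (lexicographically smaller beacon wins) are all assumptions made for the example. It also assumes the caller has already waited the K layers and passes in only blocks that carry a beacon.

```go
package main

import "fmt"

// Block is a hypothetical, simplified view of a block: only the fields
// needed for beacon selection are modelled here.
type Block struct {
	Beacon    string // beacon value reported by the block
	ATXWeight uint64 // weight of the ATX the block refers to (assumed field)
}

// heaviestBeacon sums the ATX weight behind each reported beacon and
// returns the beacon with the largest total. The second result is false
// when no blocks were supplied. Ties are broken by picking the
// lexicographically smaller beacon; that rule is an assumption of this
// sketch, chosen only to make the result deterministic.
func heaviestBeacon(blocks []Block) (string, bool) {
	weights := make(map[string]uint64)
	for _, b := range blocks {
		weights[b.Beacon] += b.ATXWeight
	}

	var (
		best       string
		bestWeight uint64
		found      bool
	)
	for beacon, w := range weights {
		if !found || w > bestWeight || (w == bestWeight && beacon < best) {
			best, bestWeight, found = beacon, w, true
		}
	}
	return best, found
}

func main() {
	// Blocks collected during the K-layer waiting period (made-up values).
	blocks := []Block{
		{Beacon: "aa", ATXWeight: 10},
		{Beacon: "bb", ATXWeight: 7},
		{Beacon: "bb", ATXWeight: 5},
		{Beacon: "cc", ATXWeight: 1},
	}
	if beacon, ok := heaviestBeacon(blocks); ok {
		fmt.Printf("selected beacon: %s\n", beacon)
	}
}
```

Summing weights in a map keeps the selection a single pass over the collected blocks, so it can be rerun cheaply as more blocks arrive during the waiting period.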

Also a few questions:
- How do we join and sync the consensus process for the next beacon if we finished syncing in the middle of the epoch?
- While we are syncing the epoch beacon, we will receive blocks with beacons that we are not sure are correct. Can we validate these blocks with their respective beacons, or should we buffer the blocks until we decide on the correct beacon value? Both options have trade-offs:
    - if we choose to validate without knowing the beacon, we will need to revert state if we later decide that blocks we accepted as valid are in fact invalid
    - if we choose to buffer the blocks, we risk very high memory consumption until we have found the correct beacon, and heavy CPU load afterwards when trying to validate many blocks at once

Hi @Nikita Kryuchkov,
Thanks for the summary! It looks pretty good. Some small comments:
- The tortoise beacon value appears only in the first block a party generates in an epoch (i.e., you can't change beacons mid-epoch).
- There are actually three cases: (1) and (2) as you described, and (3) a node that participated in the beacon generation (in which case it doesn't care about blocks). We expect (3) to be the most common case; (1) and (2) are only relevant for newly joined nodes.
Regarding the questions:
- Participating in the beacon execution doesn't require you to know the previous beacon value. If you finished syncing too late to join the beacon generation for the current epoch, you use the beacon that has the most weight according to the block values.
- I don't understand the question about buffering. We always validate each block using its local beacon, and always store all valid blocks (even if they don't agree with our beacon). For running the verifying tortoise, we can also get the beacon value as part of the input (in any case we don't count all blocks when we verify). If the verifying tortoise fails and we need to run self-healing, then there are special rules that depend on whether the blocks have the "right" beacon or not: blocks that have a different beacon are only counted when they are sufficiently old (e.g., two epochs in the past).
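
The self-healing rule described in the last point can be sketched as a small predicate. This is a rough illustration under stated assumptions rather than the actual tortoise code: the `Block` type, the `countsInSelfHealing` name, and the `minAgeEpochs` parameter are hypothetical, and the value 2 used in `main` comes from the "e.g., two epochs" remark above, not from a confirmed protocol constant.

```go
package main

import "fmt"

// Block is a hypothetical, simplified view of a stored block.
type Block struct {
	Epoch  uint64 // epoch in which the block was generated
	Beacon string // beacon value the block was built with
}

// countsInSelfHealing reports whether a block would be counted during
// self-healing. expectedBeacon is the beacon this node holds for the
// block's epoch. A block with the expected beacon always counts; a block
// with a different beacon counts only once it is at least minAgeEpochs
// epochs older than currentEpoch. Both the signature and the threshold
// parameter are assumptions of this sketch.
func countsInSelfHealing(b Block, expectedBeacon string, currentEpoch, minAgeEpochs uint64) bool {
	if b.Beacon == expectedBeacon {
		return true
	}
	// Guard against underflow for blocks from the current or a future epoch.
	if b.Epoch > currentEpoch {
		return false
	}
	return currentEpoch-b.Epoch >= minAgeEpochs
}

func main() {
	const currentEpoch, minAge = 10, 2 // made-up values for illustration

	recent := Block{Epoch: 9, Beacon: "bb"}
	old := Block{Epoch: 8, Beacon: "bb"}

	fmt.Println("recent block counted:", countsInSelfHealing(recent, "aa", currentEpoch, minAge))
	fmt.Println("old block counted:", countsInSelfHealing(old, "aa", currentEpoch, minAge))
}
```

Note that the predicate only affects counting during self-healing. As stated above, all valid blocks are still stored regardless of which beacon they carry.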

Dependencies and interactions

Stakeholders and reviewers

@nkryuchkov @antonlerner

Testing and performance