spacemeshos / SMIPS

Spacemesh Improvement Proposals
https://spacemesh.io
Creative Commons Zero v1.0 Universal
7 stars 1 forks source link

Self healing using weak coin #28

Closed antonlerner closed 1 year ago

antonlerner commented 4 years ago

Weak coin in tortoise consensus

Todays verifying tortoise algorithm simply verifies that honest blocks vote the same way to make it easier to tally votes for each honest block. Voting is done by tallying up past votes and voting accordingly - if there are enough prior votes to support a block vote the for the block as well. this is done by supporting blocks who view the same as our node.

This protocol is “fragile”, in the sense that its security depends on all of our underlying assumptions holding continuously since genesis. For example, if at layer i the Hare protocol fails (e.g.,due to some very low probability event), honest parties may not agree on the validity of the blocks in layer i. Suppose honest parties are evenly split about the validity of some block B in layer i - half accept it as valid and half don’t.In subsequent layers, the vote margin for B will be very close to 0 - and an adversary can succeed in a balancing attack: the adversary keeps the margin small indefinitely by always voting “with the minority”; since the margin is already small, the adversary does not have to expend many resources in order to “pad”the minority so that the total number of positive and negative votes remains equal. Thus, honest parties will never reach consensus onB’s validity.

In case there is no consensus on how to vote for a specific block, meaning the for and against votes are pretty balanced we need to make a decision on how to vote. For this we need a shared coin that will cause all blocks in a certain layer to vote in the same direction - to faster converge on a value for all blocks.

Weak coin requirements

The main idea is to randomise honest parties’ votes when the vote margin is small - but in a coordinated way. For intuition, suppose we had a sequence of coin flips that is: – In Consensus: at the start of layer i+ 1, all honest parties agree on the result of the i-th coin flip. – Independent and Unbiased: each coin comes up heads and tails with equal probability. – Unpredictable: the adversary cannot guess the result of the i-th coin flip until the end of layer i

A coin is considered weak because the adversary can guess the result of the coin flip early some of the time, or bias the result so that heads and tails are no longer equally likely. A weak coin is sufficient as the adversary can’t predict the result with more than constant probability, and can’t bias it “too far”.

implementation overview

To implement the weak coin value for each layer participants must agree on its value for each layer, but reach consensus over it's value only in the previous layer and not before.

weak coin generation

  1. have multiple parties generate a VRF(i) and publish it within a certain time frame
  2. after receiving messages up to the allowed time frame - each node will take the smallest vrf - the LSB of that VRF is the weak coin for the next layer (i+1)

actual implementation:

To implement this we can potentially piggyback the hare protocol: a vrf is already send as part of proposal rounds from multiple clients, we can potentially take these VRFs, sort them and take the LSB of the smallest VRF. Quick win ;) Q: can it be that hare fails for more than 1 layer and VRFs somehow don't reach honest miners?

bootstrapping weak coin generation:

For calculating the weak coin we must have at least one iteration of hare protocol - to do so we must have a valid beacon at the start of the first epoch so that hare eligibility can be calculated. For this we will need to have an external beacon provider- since actual tortoise beacon might take time to generate.

Additional questions

Self healing

Self healing is basically tallying up votes without hare results. usually all blocks will vote the same because they use hare consensus to pre determine their votes. This allows us to validate blocks fast since they converge to high positive or negative votes.

when self healing is run

Whenever we don't receive hare result in time, or whenever verifying tortoise fails - meaning that blocks don't vote the same for several layers - we must fall back to self healing.

Self healing will start counting votes for blocks since last verified layer to the current one It will use either hare results if exists and if not will assume weak coin provided for that layer determines the vote of a block on all blocks in it's view. After several layers block votes will converge and allow to determine consensus on block validity. Q: can votes be overruled? i.e a block that has been marked valid by hare and later marked invalid after following self healing consensus? Do we need to revert chain data at this stage?

Running tortoise vote counting algorithm when self healing

To implement the vote counting algorithm we must modify the verifying tortoise to do the following: @y0sher

antonlerner commented 4 years ago

Q: can votes be overruled? i.e a block that has been marked valid by hare and later marked invalid after following self healing consensus? Do we need to revert chain data at this stage?

blocks from the past can be modified, we will need to revert state.

antonlerner commented 4 years ago

we need to have hare protocol at least publishing VRF messages so that we can have a weak coin

antonlerner commented 4 years ago

when we don't have result for layer i, we vote neutral for k layers. if we are in layer i+k +1, I vote for an empty layer. If I don't have hare results I use my local hare input

there should be a threshold for the verifying tortoise layers

we will use the weak coin for blocks with marginal vote counts

nkryuchkov commented 3 years ago

Current weak coin implementation state

Done: (either merged into develop or waiting for merging in https://github.com/spacemeshos/go-spacemesh/pull/2599):

  1. Sending weak coin messages via gossip
  2. Receiving weak coin messages
  3. Calculating weak coin
  4. A few unit tests
  5. Weak coin part in tortoise beacon system test
  6. Check VRF of weak coin messages
  7. Testing basic functionality

To be done:

  1. More thorough testing
  2. Fix bugs if they exist
  3. Write more tests
  4. Fix weak coin part in tortoise beacon system test (sometimes nodes calculate a wrong weak coin, namespace with logs: xzusn)
  5. Check if calculated weak coin values need to be stored in a DB
  6. Check if calculated weak coin values need to be synced between nodes
  7. Votes margin should be calculated in two steps: before applying weak coin, after applying weak coin
dshulyak commented 3 years ago

Check if calculated weak coin values need to be stored in a DB

so far I don't see a reason to persist it. maybe if in future it can be used to punish bad behavior somehow we will have to keep it, but if it wasn't discussed we can skip it for now.

Check if calculated weak coin values need to be synced between nodes

not sure that I understood what is meant by this. you mean like: one node can try to fetch them from another node in certain scenarios? if so I think it is not necessary. eventually majority will receive correct smallest value and will make progress.

Votes margin should be calculated in two steps: before applying weak coin, after applying weak coin

@nkryuchkov and what should be done in both of those cases?

nkryuchkov commented 3 years ago

@dshulyak

so far I don't see a reason to persist it. maybe if in future it can be used to punish bad behavior somehow we will have to keep it, but if it wasn't discussed we can skip it for now.

We'll need to do that in tortoise beacon. I haven't discussed yet, whether it's needed in weak coin, probably it is. So I'd keep the note until research confirms we don't need it. We can skip it now, it's less important than other changes.

not sure that I understood what is meant by this. you mean like: one node can try to fetch them from another node in certain scenarios? if so I think it is not necessary. eventually majority will receive correct smallest value and will make progress.

There may be an issue when a node hasn't received the smallest value but has received all other values (e.g. due to network issue). In this case, the weak coin value may differ. Maybe we need to sync the value (as we're going to do in tortoise beacon) to have the same value on all nodes. But I'm not sure.

and what should be done in both of those cases?

I need to write a more detailed explanation of this. I will tag you when it's done.

dshulyak commented 3 years ago

We'll need to do that in tortoise beacon.

do you have some details about how it will be used?

There may be an issue when a node hasn't received the smallest value but has received all other values (e.g. due to network issue). In this case, the weak coin value may differ. Maybe we need to sync the value (as we're going to do in tortoise beacon) to have the same value on all nodes. But I'm not sure.

i think this is a part of the synchrony assumption. we just assume that eventually, a majority will receive the smallest value. if it is not the case then a different protocol is needed. we may clarify it with research obviously.

nkryuchkov commented 3 years ago

do you have some details about how it will be used?

We've discussed a case when a miner sends a voting message twice in a certain epoch and round: Then we need to store evidence, generate a malfeasance proof (union of two whole voting messages) and report the proof of this miner's bad behavior through gossip, so miner gets banned forever globally (across all modules, not only tortoise beacon). There will likely be other cases of bad behavior. It's also less important than other improvements.

i think this is a part of the synchrony assumption. we just assume that eventually, a majority will receive the smallest value. if it is not the case then a different protocol is needed. we may clarify it with research obviously.

Well, the tortoise beacon has a consensus protocol too but needs additional sync. Yes, this needs to be decided after talking with research.

lrettig commented 3 years ago

Votes margin should be calculated in two steps: before applying weak coin, after applying weak coin

@nkryuchkov which votes margin are you referring to?

nkryuchkov commented 3 years ago

@lrettig https://github.com/spacemeshos/go-spacemesh/blob/9cc3415cc8c4fddcef5a669c0de5c449cda5dcce/tortoisebeacon/votes_calc.go#L56

countvonzero commented 1 year ago

implemented.