paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/

[Meta] State of Disabling #4359

Open Overkillus opened 4 months ago

Overkillus commented 4 months ago

State of Disabling:

For an explanation of the design below, go here.

Disabling board for issue-level tracking here.

Stage 0: How the system used to work before the Disabling Overhaul:

  1. Disabling is caused when:
    • Backer is slashed (100%)
    • Equivocations (GRANDPA / BABE / BEEFY)
  2. Disabling lasts for a whole ERA
  3. Disabling limited to 17%
  4. Disabling limit reached causes a force new era
  5. Disabling only stops you from block authoring
  6. Chilling is caused by ALL slashes (even 0%)
  7. Chilling results in validator voluntarily stepping down from next election
  8. ImOnline offences are slashed but do not cause disabling
  9. Actual stash slash is deferred by 27 days

Misbehaviours:

| Misbehaviour | Slash % | Disabling | Chilling | Rep |
| --- | --- | --- | --- | --- |
| Backing Invalid | 100% | Yes | Yes | No |
| ForInvalid Vote | - | No | No | No |
| AgainstValid Vote | - | No | No | No |
| ImOnline Offence | 0-7% | No | Yes | No |
| GRANDPA / BABE / BEEFY Equivocations | 0.01-100% | Yes | Yes | No |
| Seconded + Valid Equivocation | - | No | No | No |
| Double Seconded Equivocation | - | No | No | Yes |

*Ignoring AURA offences. **There are some other misbehaviour types handled in rep only (DoS prevention etc) but they are not relevant to this strategy.




Stage 1: How the system worked with the first few Disabling Overhaul PRs merged:

  1. Onchain disabling is caused when:
    • Backer is slashed (100%)
    • Equivocations (GRANDPA / BABE / BEEFY)
  2. Onchain disabling lasts for a whole ERA
  3. Onchain disabling limited to 17%
  4. Onchain disabling limit reached causes a force new era
  5. Onchain disabling stops you from block authoring
  6. Chilling is caused by ALL slashes (even 0%)
  7. Chilling results in validator voluntarily stepping down from next election
  8. Actual stash slash is deferred by 27 days

-- Changes to Onchain Disabling --

  9. Onchain disabling stops you from backing through runtime filtering
  10. Onchain disabling stops you from initiating a dispute
  11. Onchain disabling stops you locally from backing (optimisation)
  12. Onchain disabling makes other nodes ignore your backing statements (optimisation)

-- New Offchain Disabling --

  13. Offchain disabling is caused by losing a dispute
  14. Offchain disabling lasts only for a session (TODO verify)
  15. Onchain and offchain disabling together limited to 33%
  16. Offchain disabling is always lower priority than onchain disabling
  17. Offchain disabling prioritises disabling backers, then ForInvalid, then AgainstValid.
  18. Offchain disabling only stops you from initiating a dispute

-- Other changes --

  19. ImOnline is removed

Notes:

Misbehaviours:

| Misbehaviour | Slash % | Onchain Disabling | Offchain Disabling | Chilling | Rep |
| --- | --- | --- | --- | --- | --- |
| Backing Invalid | 100% | Yes | Yes (High Prio) | Yes | No |
| ForInvalid Vote | - | No | Yes (Mid Prio) | No | No |
| AgainstValid Vote | - | No | Yes (Low Prio) | No | No |
| GRANDPA / BABE / BEEFY Equivocations | 0.01-100% | Yes | No | Yes | No |
| Seconded + Valid Equivocation | - | No | No | No | No |
| Double Seconded Equivocation | - | No | No | No | Yes |

Ignoring AURA offences. There are some other misbehaviour types handled in rep only (DoS prevention etc) but they are not relevant to this strategy. ImOnline no longer listed in the table.
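
A minimal sketch of how a node could combine the onchain and offchain disabled sets under the Stage 1 rules. It assumes the 33% limit is rounded down and that offenders within the same priority class keep their input order. `ValidatorIndex`, `LostDisputeAs` and `effective_disabled` are hypothetical names used for illustration only, not the polkadot-sdk API.

```rust
use std::collections::BTreeSet;

/// Hypothetical validator index type used only in this sketch.
type ValidatorIndex = u32;

/// Role in which a validator lost a dispute, declared from highest to
/// lowest offchain disabling priority (derived `Ord` follows this order).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum LostDisputeAs {
    Backer,
    ForInvalid,
    AgainstValid,
}

/// Returns the validators this node treats as disabled.
/// Onchain-disabled validators are always included; offchain offenders
/// fill the remaining room in priority order until the limit is reached.
fn effective_disabled(
    validator_count: usize,
    onchain_disabled: &[ValidatorIndex],
    offchain_offenders: &[(ValidatorIndex, LostDisputeAs)],
    limit_percent: usize,
) -> BTreeSet<ValidatorIndex> {
    // Assumption: the percentage limit is rounded down.
    let limit = validator_count * limit_percent / 100;

    let mut disabled: BTreeSet<ValidatorIndex> =
        onchain_disabled.iter().copied().collect();

    // Stable sort: offenders of equal priority keep their input order.
    let mut offenders = offchain_offenders.to_vec();
    offenders.sort_by_key(|&(_, kind)| kind);

    for (index, _) in offenders {
        if disabled.len() >= limit {
            break;
        }
        disabled.insert(index);
    }
    disabled
}
```

With 10 validators, one onchain-disabled validator and four offchain offenders, only the two backers fit under the limit of 3; the ForInvalid and AgainstValid voters stay enabled.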





Stage 2: How the system will work without validator re-enabling:

  1. Onchain disabling is caused when:
    • Backer is slashed for backing an invalid candidate (100%)
    • Equivocations (GRANDPA / BABE / BEEFY)
  2. Onchain disabling lasts for a whole ERA
  3. Onchain disabling stops you from block authoring
  4. Onchain disabling stops you from backing through runtime filtering
  5. Onchain disabling stops you from initiating a dispute
  6. Onchain disabling stops you locally from backing (optimisation)
  7. Onchain disabling makes other nodes ignore your backing statements (optimisation)
  8. Offchain disabling is caused by losing a dispute
  9. Offchain disabling lasts only for a session (TODO verify)
  10. Onchain and offchain disabling together limited to 33%
  11. Offchain disabling is always lower priority than onchain disabling
  12. Offchain disabling prioritises disabling backers, then ForInvalid, then AgainstValid.
  13. Offchain disabling only stops you from initiating a dispute
  14. Actual stash slash is deferred by 27 days

-- Changes --

  15. Chilling is removed
  16. Onchain disabling limit is raised from 17% to 33%
  17. No longer force new era when disabling limit reached

Notes: This stage will be in effect with the merge and release of #2226

Misbehaviours:

| Misbehaviour | Slash % | Onchain Disabling | Offchain Disabling | Chilling | Rep |
| --- | --- | --- | --- | --- | --- |
| Backing Invalid | 100% | Yes | Yes (High Prio) | No | No |
| ForInvalid Vote | - | No | Yes (Mid Prio) | No | No |
| AgainstValid Vote | - | No | Yes (Low Prio) | No | No |
| GRANDPA / BABE / BEEFY Equivocations | 0.01-100% | Yes | No | No | No |
| Seconded + Valid Equivocation | - | No | No | No | No |
| Double Seconded Equivocation | - | No | No | No | Yes |

*Ignoring AURA offences. **There are some other misbehaviour types handled in rep only (DoS prevention etc) but they are not relevant to this strategy.
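
The Stage 1 to Stage 2 changes to onchain disabling can be summarised as a small rule set. This is only a sketch: `OnchainDisablingRules`, `STAGE_1` and `STAGE_2` are hypothetical names, and rounding the percentage limit down is an assumption.

```rust
/// Hypothetical summary of the onchain disabling rules of one stage.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct OnchainDisablingRules {
    /// Maximum share of validators that may be disabled, in percent.
    limit_percent: usize,
    /// Whether reaching the limit forces a new era.
    force_new_era_at_limit: bool,
    /// Whether slashed validators are chilled.
    chilling: bool,
}

const STAGE_1: OnchainDisablingRules = OnchainDisablingRules {
    limit_percent: 17,
    force_new_era_at_limit: true,
    chilling: true,
};

const STAGE_2: OnchainDisablingRules = OnchainDisablingRules {
    limit_percent: 33,
    force_new_era_at_limit: false,
    chilling: false,
};

impl OnchainDisablingRules {
    /// Largest number of validators that may be disabled at once.
    /// Assumption: integer division, i.e. rounded down.
    fn limit(&self, validator_count: usize) -> usize {
        validator_count * self.limit_percent / 100
    }

    /// Whether one more validator can still be disabled.
    fn has_room(&self, currently_disabled: usize, validator_count: usize) -> bool {
        currently_disabled < self.limit(validator_count)
    }
}
```

With 1000 validators, `STAGE_1.limit(1000)` is 170 and `STAGE_2.limit(1000)` is 330.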





Stage 3: How the system is planned to work with re-enabling:

  1. Onchain disabling is caused when:
    • Backer is slashed for backing an invalid candidate (100%)
    • Equivocations (GRANDPA / BABE / BEEFY)
  2. Onchain disabling lasts for a whole ERA
  3. Onchain disabling limit is 33%
  4. Onchain disabling stops you from block authoring
  5. Onchain disabling stops you from backing through runtime filtering
  6. Onchain disabling stops you from initiating a dispute
  7. Onchain disabling stops you locally from backing (optimisation)
  8. Onchain disabling makes other nodes ignore your backing statements (optimisation)
  9. Offchain disabling is caused by losing a dispute
  10. Offchain disabling lasts only for a session
  11. Onchain and offchain disabling together limited to 33%
  12. Offchain disabling is always lower priority than onchain disabling
  13. Offchain disabling prioritises disabling backers, then ForInvalid, then AgainstValid.
  14. Offchain disabling only stops you from initiating a dispute
  15. Actual stash slash is deferred by 27 days

-- Changes --

  16. If there are more offenders than the limit, keep only the highest offenders disabled.

Misbehaviours:

| Misbehaviour | Slash % | Onchain Disabling | Offchain Disabling | Chilling | Rep |
| --- | --- | --- | --- | --- | --- |
| Backing Invalid | 100% | Yes | Yes (High Prio) | No | No |
| ForInvalid Vote | - | No | Yes (Mid Prio) | No | No |
| AgainstValid Vote | - | No | Yes (Low Prio) | No | No |
| GRANDPA / BABE / BEEFY Equivocations | 0.01-100% | Yes | No | No | No |
| Seconded + Valid Equivocation | - | No | No | No | No |
| Double Seconded Equivocation | - | No | No | No | Yes |

*Ignoring AURA offences. **There are some other misbehaviour types handled in rep only (DoS prevention etc) but they are not relevant to this strategy.
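
One possible reading of the re-enabling rule, as a sketch. It assumes offenders are ranked by slash fraction and that a new offender only displaces a disabled validator if its slash is strictly higher. `Offender` and `disable_with_reenabling` are hypothetical names, not the polkadot-sdk API.

```rust
/// Hypothetical offender record. `slash_ppm` is the slash fraction in
/// parts per million, so 0.01% is 100 and 100% is 1_000_000.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Offender {
    validator: u32,
    slash_ppm: u32,
}

/// Tries to disable `new`. While there is room it is simply added.
/// Once the set is full, `new` replaces the lowest offender only if its
/// slash is strictly higher; the re-enabled validator is then returned.
/// `None` means nobody was re-enabled.
fn disable_with_reenabling(
    disabled: &mut Vec<Offender>,
    new: Offender,
    limit: usize,
) -> Option<Offender> {
    if disabled.len() < limit {
        disabled.push(new);
        return None;
    }
    // Set is full: find the disabled validator with the smallest slash.
    let (pos, lowest) = disabled
        .iter()
        .copied()
        .enumerate()
        .min_by_key(|(_, o)| o.slash_ppm)?;
    if new.slash_ppm > lowest.slash_ppm {
        disabled[pos] = new;
        Some(lowest)
    } else {
        // Not severe enough to displace anyone; `new` stays enabled.
        None
    }
}
```

The caller can tell whether `new` ended up disabled by checking the set afterwards; a richer return type would make that explicit.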





Stage 4: How the system is planned to work with re-enabling and approval voting slashes enabled:

  1. If there are more offenders than the limit, keep only the highest offenders disabled.
  2. Onchain disabling lasts for a whole ERA
  3. Onchain disabling limit is 33%
  4. Onchain disabling stops you from block authoring
  5. Onchain disabling stops you from backing through runtime filtering
  6. Onchain disabling stops you from initiating a dispute
  7. Onchain disabling stops you locally from backing (optimisation)
  8. Onchain disabling makes other nodes ignore your backing statements (optimisation)
  9. Offchain disabling is caused by losing a dispute
  10. Offchain disabling lasts only for a session
  11. Onchain and offchain disabling together limited to 33%
  12. Offchain disabling is always lower priority than onchain disabling
  13. Offchain disabling prioritises disabling backers, then ForInvalid, then AgainstValid.
  14. Offchain disabling only stops you from initiating a dispute
  15. Actual stash slash is deferred by 27 days

-- Changes --

  16. Onchain disabling is caused when:
    • Backer is slashed for backing an invalid candidate (100%)
    • Equivocations (GRANDPA / BABE / BEEFY)
    • New: Approval voter is slashed for ForInvalid (2%)
    • New: Approval voter is slashed for AgainstValid (0%)

Notes:

Misbehaviours:

| Misbehaviour | Slash % | Onchain Disabling | Offchain Disabling | Chilling | Rep |
| --- | --- | --- | --- | --- | --- |
| Backing Invalid | 100% | Yes (High Prio) | Yes (High Prio) | No | No |
| ForInvalid Vote | 2% | Yes (Mid Prio) | Yes (Mid Prio) | No | No |
| AgainstValid Vote | 0% | Yes (Low Prio) | Yes (Low Prio) | No | No |
| GRANDPA / BABE / BEEFY Equivocations | 0.01-100% | Yes (Varying Prio) | No | No | No |
| Seconded + Valid Equivocation | - | No | No | No | No |
| Double Seconded Equivocation | - | No | No | No | Yes |

*Ignoring AURA offences. **There are some other misbehaviour types handled in rep only (DoS prevention etc) but they are not relevant to this strategy.





Current Stage:

#2226 was merged and awaits release. Once released we'll move from Stage 1 to Stage 2.

Stage 2: Live
Stage 3: WIP


Changelog:

Split Stage 3 into two distinct stages. Stage 3 introduces re-enabling and Stage 4 introduces approval slashes.

sandreim commented 3 months ago

What does "Being disabled stops nodes from initiating a dispute (hard escalation) but issuing a dispute statement causes a no-show (soft escalation) which adds an extra tranche to approval checking." exactly mean here? I don't understand how a dispute causes a no-show.

Overkillus commented 3 months ago

> Being disabled stops nodes from initiating a dispute (hard escalation)

Hope this part is clear. It means that an honest validator node, when receiving a dispute statement from a disabled validator, will not register the dispute as active if it is not already active. The honest validator will in that case not cast his own vote and will not gossip it further. This stops the dispute escalation (hard escalation, because everyone checks in a dispute).
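
As a rough sketch of that rule (hypothetical names, not the actual dispute-coordinator code, and assuming that statements for an already active dispute are handled as usual even when the sender is disabled):

```rust
use std::collections::HashSet;

/// What an honest node does with an incoming dispute statement.
#[derive(Debug, PartialEq, Eq)]
enum StatementAction {
    /// Process the statement as usual.
    HandleNormally,
    /// Drop it: do not mark the dispute active, do not vote, do not gossip.
    Ignore,
}

/// `sender` and `candidate` are simplified stand-ins for a validator
/// index and a candidate hash.
fn on_dispute_statement(
    sender: u32,
    candidate: u64,
    disabled: &HashSet<u32>,
    active_disputes: &HashSet<u64>,
) -> StatementAction {
    // A disabled validator cannot be the one who starts a dispute.
    if disabled.contains(&sender) && !active_disputes.contains(&candidate) {
        StatementAction::Ignore
    } else {
        StatementAction::HandleNormally
    }
}
```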

> but issuing a dispute statement causes a no-show (soft escalation) which adds an extra tranche to approval checking.

Imagine a scenario where you are an honest validator but somehow got disabled (some assumptions were broken or nondeterminism crept into the system). In that case, when you are assigned to approval checking you can still cast explicit approvals. On the other hand, if what you are checking is invalid you cannot start a dispute, as noted in the paragraph above. What you will do in that case is miss your approval checking assignment despite announcing it. This causes a soft escalation because it adds one extra tranche of checkers. This is the minimal level of power and trust we can give to disabled validators: they can still help protect the system, but the damage they can do is minimised. They have effectively lost the privilege of full escalation but still have soft escalation powers.

We need this because it helps us fight against a scenario where honest nodes are somehow slashed and disabled; without it they would be totally ineffective in approval checking, which would give attackers a much greater chance of success. This change helps us preserve security under the same assumptions as DoS protection.
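
The approval checker's side of this can be sketched as a simple decision, assuming the checker has already announced its assignment. `ApprovalAction` and `after_approval_check` are hypothetical names for illustration, not the approval-voting subsystem's API.

```rust
/// What an approval checker does after validating its assigned candidate.
#[derive(Debug, PartialEq, Eq)]
enum ApprovalAction {
    /// Cast an explicit approval (still allowed when disabled).
    Approve,
    /// Start a dispute: hard escalation, every validator checks.
    InitiateDispute,
    /// Stay silent after announcing the assignment. Other nodes see a
    /// no-show, which brings in one extra tranche (soft escalation).
    WithholdApproval,
}

fn after_approval_check(is_disabled: bool, candidate_is_valid: bool) -> ApprovalAction {
    match (candidate_is_valid, is_disabled) {
        (true, _) => ApprovalAction::Approve,
        (false, false) => ApprovalAction::InitiateDispute,
        (false, true) => ApprovalAction::WithholdApproval,
    }
}
```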

sandreim commented 3 months ago

Thanks for explaining, now it makes a lot of sense.

burdges commented 3 months ago

> The honest validator will in that case not cast his own vote and will not gossip it further

We de facto treat the dispute by the disabled validator as a no-show. In principle, we could make this explicit or even have some "super no-show", so we should not be afraid of gossiping the "bad dispute" in principle. We need not really do this right now, but worth keeping in mind.