paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/

Add Recovery Options for Validators #93

Open joepetrowski opened 5 years ago

joepetrowski commented 5 years ago

Let's say that a validator has a catastrophic failure in GRANDPA round n, such that they lose their DB (which keeps track of their votes). They have two options to restart:

  1. Conservative: Wait until the next session to come online. This guarantees that GRANDPA is in a round greater than n and they have no risk of equivocating. However, if anyone else misses the Session heartbeat, then there will be a slash and forced era change. The validator will miss one era of rewards, but presumably gets elected in the subsequent era.
  2. Aggressive: Come back online immediately. If GRANDPA is still stuck in round n for some reason, there is a risk of equivocating.

Can we add an option, say a --recovery flag, that would make the node ask other voters, while it syncs, what the last vote they received from this validator was?

If we operate on the assumptions that >50% of validators are honest and that our vote would be gossiped to most of the other validators, we should have some confidence that the majority of "your last vote was..." messages would help us avoid equivocation. It doesn't eliminate, but does reduce, the risk if you choose the aggressive path above.
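A minimal sketch of how a node might combine those "your last vote was..." replies, in Rust. Everything here is an assumption made for illustration: `LastVoteReport`, `first_safe_round`, and the strict-majority threshold are hypothetical names and choices, not part of GRANDPA or of any existing polkadot-sdk API.

```rust
/// Hypothetical reply from one peer: the highest GRANDPA round in which that
/// peer saw a vote signed by our validator key, or `None` if it saw none.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LastVoteReport {
    pub round: Option<u64>,
}

/// Returns the first round in which we would allow ourselves to vote again,
/// or `None` if too few peers answered to draw any conclusion.
///
/// Assumption: we want replies from a strict majority of the voter set
/// before trusting the result.
pub fn first_safe_round(reports: &[LastVoteReport], total_voters: usize) -> Option<u64> {
    // Refuse to decide without replies from more than half of the voters.
    if reports.len() * 2 <= total_voters {
        return None;
    }

    // Be conservative: take the highest round reported by any peer, so a
    // single honest peer that saw our latest vote is enough to protect us.
    let highest_seen = reports.iter().filter_map(|r| r.round).max();

    // Vote only in rounds strictly after the highest one reported.
    // If nobody saw a vote from us, no round is excluded.
    Some(highest_seen.map_or(0, |round| round + 1))
}
```

Taking the maximum rather than the most common answer is a deliberate choice in this sketch: a peer that under-reports cannot trick us into voting again in an old round as long as one honest peer reports the true value, while a peer that over-reports can only keep us offline longer. As noted above, this reduces the risk of equivocation but does not eliminate it, since our last vote may not have reached any of the peers that answered.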

cc @kirushik

burdges commented 5 years ago

We cannot expect validator operators to know that such a --recovery option exists, which makes it kinda useless. I'd prefer that any validator wait after launch until it knows it can safely participate in GRANDPA.

What does this mean? In principle, your grandpa has only collective voting constraints, so you could participate immediately, except you might equivocate like you say. I've suggested before that validators create their transport keys afresh when launched and register them on chain, so you simply wait until that registration gets finalized before participating in grandpa.

We should do better than waiting for finality when we can believe no other conflicting blocks exist. And fallback requires this but under its usual synchrony assumptions. cc @AlistairStewart

Thanks to the new keys APIs, we should discourage validator operators from copying their session keys between machines. Yet we cannot prevent them from copying session keys, and doing so is profitable, so this registration approach avoids footguns.

There are no real problems if they have sentry nodes and/or pass in a manual transport key because you can still wait for the specific registration transaction. In fact, we need not even touch transport keys at all since we could strengthen ImOnline for this:

On launch, you always repost your ImOnline transaction and locally store the hash of your own ImOnline signature, so the node can tell if some block makes all later blocks clearly safe. We are safe when such a block becomes final. We are also safe under appropriate fallback conditions.
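The finality half of that rule could be sketched roughly as follows, in Rust. The types `MarkerHash`, `FinalizedBlock`, and `StartupGate` are hypothetical stand-ins invented for this example and do not correspond to the actual ImOnline pallet or client APIs. The fallback conditions are not modelled.

```rust
/// Hypothetical 32-byte hash of the heartbeat signature we posted on launch.
pub type MarkerHash = [u8; 32];

/// Minimal stand-in for a finalized block: its number and the hashes of the
/// heartbeat signatures it contains.
pub struct FinalizedBlock {
    pub number: u64,
    pub heartbeat_hashes: Vec<MarkerHash>,
}

/// Keeps a freshly launched node out of GRANDPA until its own launch
/// heartbeat has been seen in a finalized block.
pub struct StartupGate {
    marker: MarkerHash,
    safe_from: Option<u64>,
}

impl StartupGate {
    /// `marker` is the locally stored hash of our own launch heartbeat.
    pub fn new(marker: MarkerHash) -> Self {
        StartupGate { marker, safe_from: None }
    }

    /// Feed every newly finalized block to the gate, in order.
    pub fn on_finalized(&mut self, block: &FinalizedBlock) {
        // Record only the first finalized block that includes our marker.
        if self.safe_from.is_none() && block.heartbeat_hashes.contains(&self.marker) {
            self.safe_from = Some(block.number);
        }
    }

    /// True once a finalized block containing our marker has been seen.
    pub fn may_vote(&self) -> bool {
        self.safe_from.is_some()
    }

    /// Number of the finalized block that made us safe, if any.
    pub fn safe_from(&self) -> Option<u64> {
        self.safe_from
    }
}
```

The node would create the gate right after posting its heartbeat, pass each finalized block to `on_finalized`, and hold back all GRANDPA votes until `may_vote` returns true. Because the gate runs on every launch, operators do not need to know about or pass any special flag.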