paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.com/

Staking Elections: Consider removing validators with no points #5674

Open kianenigma opened 2 months ago

kianenigma commented 2 months ago

As recently reported by @eskimor, both Polkadot and Kusama have some bad validators that produce no blocks. This could be because of slow hardware.

In principle, these validators are sub-optimal to nominate, because they produce no staking rewards. So we hope nominators will filter them out. A mechanism like https://github.com/polkadot-fellows/RFCs/pull/104 could help further.

Nonetheless, pallet-staking can become slightly more proactive, and itself chill validators who were elected for a number of eras, but didn't gain any points.

This helps Polkadot be super sure that the average block time does not go above 6s.

On bad validators: you've got 14,400 blocks a day, around 48 blocks per validator, so with 1 bad validator you get 48 missed blocks. The spikes seem to show 3 bad validators, and the usual level seems to be around 48 missed blocks.
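The arithmetic behind those figures can be sketched as follows. This is only a back-of-the-envelope model: it assumes a 6s target block time, that slots are spread evenly over the active set, and that a missed slot is simply skipped. The active set size of 300 used in the usage note is an assumption inferred from 14,400 / 48, not a figure stated in this issue, and all function names are illustrative.

```rust
/// Seconds in a day.
const SECONDS_PER_DAY: u32 = 24 * 60 * 60;

/// Expected relay chain blocks per day for a target block time.
/// `block_time_secs` must be non-zero.
fn blocks_per_day(block_time_secs: u32) -> u32 {
    SECONDS_PER_DAY / block_time_secs
}

/// Expected blocks authored per validator per day, assuming slots are
/// spread evenly over the active set. `active_validators` must be non-zero.
fn blocks_per_validator_per_day(block_time_secs: u32, active_validators: u32) -> u32 {
    blocks_per_day(block_time_secs) / active_validators
}

/// Blocks missed per day when `offline` validators author nothing.
fn missed_blocks_per_day(block_time_secs: u32, active_validators: u32, offline: u32) -> u32 {
    offline * blocks_per_validator_per_day(block_time_secs, active_validators)
}

/// Average block time over a day once the missed slots are accounted for.
/// Assumes at least one validator is still producing blocks.
fn effective_block_time_secs(block_time_secs: u32, active_validators: u32, offline: u32) -> f64 {
    let missed = missed_blocks_per_day(block_time_secs, active_validators, offline);
    let produced = blocks_per_day(block_time_secs).saturating_sub(missed);
    SECONDS_PER_DAY as f64 / produced as f64
}
```

With the assumed 300 validators, `blocks_per_day(6)` is 14,400 and `blocks_per_validator_per_day(6, 300)` is 48. Under this model, one offline validator pushes the daily average block time slightly above 6s, and three offline validators push it a little further.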

This shows me there is one bad validator on Polkadot now: https://apps.turboflakes.io/?chain=polkadot#/insights. It is not getting any points, so it is not producing any blocks.

This validator has been getting 0 points for the past 32 eras (https://apps.turboflakes.io/?chain=polkadot#/validator/5DoG4qkLsAQBj69i2Uo2k2LBfRrWp7BhqjJuGcPQrSr6yE6P?mode=history). Most likely it is unmaintained rather than malicious, so kicking it out would probably help here. Interestingly, this validator got nominated by a single account with 3M DOT (https://polkadot.subscan.io/validator/12jZDB1QiwffAdADz7r2tBALX3rAWQjqvE3PRuNmQXsd9pnw). That account nominated around 16 validators (https://polkadot.subscan.io/nominator/14Ns6kKbCoka3MS4Hn6b7oRw9fFejG8RH5rq5j63cWUfpPDJ?tab=vote), so it really won't notice its rewards dropping a bit, because the rest of its nominations still earn it rewards. That is an argument for having some automated logic.

Potential solution: see https://github.com/paritytech/polkadot-sdk/issues/5674#issuecomment-2455347584

sandreim commented 2 months ago

@kianenigma Is this being worked on, or planned to start soon? I think it should be prioritised given how often this bites us.

burdges commented 2 months ago

Yes, this sounds useful. Can they unchill themselves easily? Or do we do it when they change their session keys?

kianenigma commented 1 month ago

I don't see the bandwidth for it in the Runtime function atm, but I can offer two options:

  1. @maciejhirsz this is closely related to work you plan to do around https://github.com/paritytech-secops/srlabs_findings/issues/417. It is in the same code path, and tackling this issue can be a great warm-up task. WDYT?
  2. I can assign a new joiner to work on this, if permitted. This is less reliable.

kianenigma commented 1 month ago

In my original issue, I said "chill validators that produce no staking reward".

This is the harsher, radical approach. It is easier weight-wise to implement: we can have a #[pallet::task] or on_idle that regularly looks at validators who were active (a key in Exposures storage), but have no reward points in the past eras, and force chill them.
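A minimal sketch of that force-chill scan, using in-memory maps as a toy stand-in for the pallet's storage. The `StakingState` struct, its field names, and the rule "exposed and pointless in each of the last `lookback` eras" are assumptions made for illustration. This is not pallet-staking's actual storage layout, and a real `on_idle` or task implementation would also have to bound the work by weight.

```rust
use std::collections::{BTreeMap, BTreeSet};

type EraIndex = u32;
type ValidatorId = u32;

/// Toy stand-in for the staking pallet's storage (hypothetical layout).
struct StakingState {
    /// Validators that were in the active set (had an exposure) in each era.
    exposed: BTreeMap<EraIndex, BTreeSet<ValidatorId>>,
    /// Reward points per validator in each era; a missing entry means zero.
    points: BTreeMap<EraIndex, BTreeMap<ValidatorId, u32>>,
    /// Accounts that currently declare an intention to validate.
    intentions: BTreeSet<ValidatorId>,
}

impl StakingState {
    /// Reward points `who` earned in `era`, defaulting to zero.
    fn points_in(&self, era: EraIndex, who: ValidatorId) -> u32 {
        self.points
            .get(&era)
            .and_then(|per_validator| per_validator.get(&who))
            .copied()
            .unwrap_or(0)
    }

    /// Validators that were exposed in each of the `lookback` eras before
    /// `current_era` and earned zero points in all of them.
    fn idle_validators(&self, current_era: EraIndex, lookback: u32) -> BTreeSet<ValidatorId> {
        if lookback == 0 || lookback > current_era {
            return BTreeSet::new();
        }
        let eras = (current_era - lookback)..current_era;
        self.intentions
            .iter()
            .copied()
            .filter(|who| {
                eras.clone().all(|era| {
                    let was_exposed = self
                        .exposed
                        .get(&era)
                        .map_or(false, |set| set.contains(who));
                    was_exposed && self.points_in(era, *who) == 0
                })
            })
            .collect()
    }

    /// Force-chill: remove the validation intention of every idle validator
    /// and return the set that was chilled.
    fn chill_idle(&mut self, current_era: EraIndex, lookback: u32) -> BTreeSet<ValidatorId> {
        let idle = self.idle_validators(current_era, lookback);
        for who in &idle {
            self.intentions.remove(who);
        }
        idle
    }
}
```

Requiring an exposure in every era of the window means that a validator who was simply not elected is never chilled for having no points.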

A milder one would be to let them remain validators, so that people can still nominate them and such, but remove them in the validator snapshot (a.k.a. pre-sorting) step. This is more difficult to implement weight-wise.
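The snapshot-side variant could look roughly like the sketch below. The `RecentPoints` bookkeeping (points for the most recent eras in which a validator was active) is hypothetical and not an existing storage item, and `threshold` is an illustrative parameter.

```rust
use std::collections::BTreeMap;

type ValidatorId = u32;

/// Points earned in the most recent eras in which a validator was active,
/// oldest first. Hypothetical bookkeeping, not an existing storage item.
type RecentPoints = BTreeMap<ValidatorId, Vec<u32>>;

/// True if the validator was active for at least `threshold` recent eras
/// and earned nothing in the latest `threshold` of them.
fn is_idle(history: &[u32], threshold: usize) -> bool {
    threshold > 0
        && history.len() >= threshold
        && history[history.len() - threshold..].iter().all(|p| *p == 0)
}

/// Build the validator snapshot for the election, skipping idle candidates.
/// Skipped candidates keep their validation intention and nominations.
fn validator_snapshot(
    candidates: &[ValidatorId],
    recent: &RecentPoints,
    threshold: usize,
) -> Vec<ValidatorId> {
    candidates
        .iter()
        .copied()
        .filter(|who| {
            // Candidates with no recorded history (never elected) stay in.
            !recent.get(who).map_or(false, |h| is_idle(h, threshold))
        })
        .collect()
}
```

One consequence of this design is that a filtered validator can no longer earn points, so its recorded history would need to be reset somehow (for example when it re-declares its intention), otherwise it stays excluded indefinitely.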

@gpestana can advise you if the weight will be an issue.

burdges commented 1 month ago

I do like the idea intuitively, but..

I'd suggest "too few points from X, Y, and Z" instead of "no points" per se, although disputes slashing could be replaced by negative points, which maybe confuses this. Anyways..

As a rule, relay chain block production matters somewhat, but you can miss your few slots easily. I'd ignore backing rewards because approvals matter far more and backing must never be more beneficial than approvals. See https://github.com/polkadot-fellows/RFCs/pull/119

Approvals require the median computation given in https://github.com/polkadot-fellows/RFCs/pull/119, but then removal would become a simple majority vote. Yikes! Instead, we could remove when the same computation finds too poor an approval score at the 2/3rd percentile. I'll caution that this computation runs about one full session later, so removal could only happen after that delay.
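To make the difference between the median and the 2/3rd percentile concrete, here is a small sketch. It is one possible reading of the suggestion, not the computation from RFC 119: it assumes every reporting validator submits one approval score for the target validator, and that removal requires the score at the chosen percentile to fall below a `min_score` threshold. All names are illustrative.

```rust
/// Value at the `numerator / denominator` percentile of `reports`, taken as
/// the smallest reported score such that at least that fraction of the
/// reports are less than or equal to it. Returns `None` for empty input or
/// an invalid fraction.
fn score_at_percentile(reports: &[u32], numerator: usize, denominator: usize) -> Option<u32> {
    if reports.is_empty() || denominator == 0 || numerator > denominator {
        return None;
    }
    let mut sorted = reports.to_vec();
    sorted.sort_unstable();
    // rank = ceil(len * numerator / denominator), clamped to at least 1.
    let rank = (sorted.len() * numerator + denominator - 1) / denominator;
    Some(sorted[rank.max(1) - 1])
}

/// Median rule: a simple majority of reporters saw a score below `min_score`.
fn poor_by_median(reports: &[u32], min_score: u32) -> bool {
    score_at_percentile(reports, 1, 2).map_or(false, |s| s < min_score)
}

/// Stricter rule: at least two thirds of reporters saw a score below
/// `min_score`.
fn poor_by_two_thirds(reports: &[u32], min_score: u32) -> bool {
    score_at_percentile(reports, 2, 3).map_or(false, |s| s < min_score)
}
```

With nine reporters, five low reports are enough to trip the median rule, while the two-thirds rule needs six, so a bare majority cannot remove a validator on its own.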

We've no rewards for grandpa or beefy, but maybe in the future, and they must work like approvals via median computations.

As for RFCs, I'd suggest merging this into https://github.com/polkadot-fellows/RFCs/pull/119 or making a followup RFC, because likely this should use the approval rewards, at least initially.

Ank4n commented 3 weeks ago

> In my original issue, I said "chill validators that produce no staking reward".
>
> This is the harsher, radical approach. It is easier weight-wise to implement: we can have a #[pallet::task] or on_idle that regularly looks at validators who were active (a key in Exposures storage), but have no reward points in the past eras, and force chill them.
>
> A milder one would be to let them remain validators, so that people can still nominate them and such, but remove them in the validator snapshot (a.k.a. pre-sorting) step. This is more difficult to implement weight-wise.
>
> @gpestana can advise you if the weight will be an issue.

Another possible alternative:

A new extrinsic, chill_inactive, that accepts a validator and a proof of zero era points. The proof can simply be a vec of x eras within the last 84 eras where they have zero points, where x is the threshold to chill an inactive validator.

This would be straightforward to implement, and since such cases should be rare, there’s no need for on-chain logic to actively detect them. A side effect of this approach is that if we set, say, x = 2, any validator with 0 points for 2 eras can be chilled by anyone until those 0-point eras are cleared from the state. This could serve as a nice punishment; however, if this is a problem, it should be trivial to store the era in which a validator set their validation intention and only check for zero-point eras occurring after that.
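A rough sketch of the checks such a `chill_inactive` call might perform, again over toy in-memory storage. The `Staking` struct, the `ChillError` variants, and the requirement that the validator was actually in the active set in each claimed era are assumptions added for illustration. The sketch also includes the optional refinement of only counting eras at or after the era in which the intention was set.

```rust
use std::collections::{BTreeMap, BTreeSet};

type EraIndex = u32;
type ValidatorId = u32;

/// Number of past eras kept in state (84, as mentioned above).
const HISTORY_DEPTH: u32 = 84;

#[derive(Debug, PartialEq)]
enum ChillError {
    NotValidator,
    WrongProofLength,
    DuplicateEra,
    EraOutOfRange,
    NotActiveInEra,
    HasPoints,
}

/// Toy stand-in for the pallet's storage (hypothetical layout).
struct Staking {
    current_era: EraIndex,
    /// Era in which each validator (last) set their validation intention.
    intentions: BTreeMap<ValidatorId, EraIndex>,
    /// Validators in the active set per era.
    exposed: BTreeMap<EraIndex, BTreeSet<ValidatorId>>,
    /// Reward points per era; a missing entry means zero.
    points: BTreeMap<EraIndex, BTreeMap<ValidatorId, u32>>,
}

impl Staking {
    /// Chill `who` if `eras` lists exactly `threshold` distinct, recent eras
    /// in which `who` was active and earned zero points.
    fn chill_inactive(
        &mut self,
        who: ValidatorId,
        eras: &[EraIndex],
        threshold: usize,
    ) -> Result<(), ChillError> {
        let since = *self.intentions.get(&who).ok_or(ChillError::NotValidator)?;
        if threshold == 0 || eras.len() != threshold {
            return Err(ChillError::WrongProofLength);
        }
        let unique: BTreeSet<EraIndex> = eras.iter().copied().collect();
        if unique.len() != eras.len() {
            return Err(ChillError::DuplicateEra);
        }
        let oldest = self.current_era.saturating_sub(HISTORY_DEPTH);
        for era in unique {
            // Only completed eras still in state, and not before the intention.
            if era >= self.current_era || era < oldest || era < since {
                return Err(ChillError::EraOutOfRange);
            }
            let active = self.exposed.get(&era).map_or(false, |s| s.contains(&who));
            if !active {
                return Err(ChillError::NotActiveInEra);
            }
            let earned = self.points.get(&era).and_then(|p| p.get(&who)).copied().unwrap_or(0);
            if earned != 0 {
                return Err(ChillError::HasPoints);
            }
        }
        self.intentions.remove(&who);
        Ok(())
    }
}
```

Because the caller supplies the eras, the on-chain work is bounded by `threshold` lookups rather than a scan over all validators.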

Ank4n commented 3 weeks ago

@michalisFr reported another validator that might have gone permanently offline but never removed their intention to validate.

This validator issued their validation intention 4 years ago and since then there's been no activity in the account. It doesn't seem they participated much in the active set after that, until era 1608 last week, when they generated 0 era points.