paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/

[Staking] `check_payees` try-state check failing in Westend #3245

Open gpestana opened 5 months ago

gpestana commented 5 months ago

The check_payees try-state check in Staking is failing in Westend. Find the cause and fix it.

[2024-02-07T16:13:20Z ERROR runtime::frame-support] ❌ "Staking" try_state checks failed: Other("number of entries in payee storage items does not match the number of bonded ledgers")

Example CI job error: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/5142158#L2515
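For readers unfamiliar with the check, here is a minimal sketch of the invariant the error message describes. It is not the pallet's code: `StakingModel` and its plain `HashMap` fields are hypothetical stand-ins for the `Bonded`, `Payee`, and `Ledger` storage items, and account IDs are simplified to strings.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for the staking storage items.
#[derive(Default)]
struct StakingModel {
    bonded: HashMap<String, String>, // stash -> controller
    payee: HashMap<String, String>,  // stash -> reward destination
    ledger: HashMap<String, u128>,   // controller -> total bonded
}

// Assumed shape of the invariant: the three maps must have the same number of entries.
fn check_payees(s: &StakingModel) -> Result<(), &'static str> {
    let ledgers = s.ledger.len();
    if s.payee.len() != ledgers || s.bonded.len() != ledgers {
        return Err(
            "number of entries in payee storage items does not match the number of bonded ledgers",
        );
    }
    Ok(())
}
```

In this model, removing a ledger without removing its bonded and payee entries is enough to make the check fail.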

Todo before closing:

gpestana commented 5 months ago

It seems that a staking ledger was removed without clearing the corresponding bonded and payee entries.

[2024-02-07T22:18:11Z INFO  runtime::staking] [19451704] 💸  count Ledger 72560, count Payee: 72561, count Bonded: 72561

The faulty (old) controller account is 5CqVcAhUzKbMMwrZiJDSAwXkmoYLpaYBXkobUAp3biVAQoXc

[2024-02-07T22:36:17Z INFO  runtime::staking] [19451704] 💸  controller that does not have a bonded entry: 2228ce54942b2da458b212dc8cb348f59752c28443538ea92324f9b890352611 (5CqVcAhU...)
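A rough sketch of how such an orphaned entry can be located, assuming the bonded map and the set of ledger keys have already been read out of storage. The function name and the string-based account IDs are hypothetical. It lists every (stash, controller) pair whose controller has no ledger.

```rust
use std::collections::{HashMap, HashSet};

// Returns the (stash, controller) pairs whose controller has no ledger entry.
// `bonded` maps stash -> controller; `ledger_keys` holds the controllers that own a ledger.
fn orphaned_bonded<'a>(
    bonded: &'a HashMap<String, String>,
    ledger_keys: &HashSet<String>,
) -> Vec<(&'a str, &'a str)> {
    let mut out: Vec<(&str, &str)> = bonded
        .iter()
        .filter(|(_, controller)| !ledger_keys.contains(*controller))
        .map(|(stash, controller)| (stash.as_str(), controller.as_str()))
        .collect();
    // HashMap iteration order is unspecified, so sort for stable output.
    out.sort();
    out
}
```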

Timeline:


Notes

// for `5HHaaUvCwAb16KkKy7cpMnuzUgLWH94gEvMVXugc69ZEDfkj` stash

staking.slashingSpans: Option<PalletStakingSlashingSlashingSpans>
{
  spanIndex: 2
  lastStart: 5,639
  lastNonzeroSlash: 5,638
  prior: [
    5
  ]
}

staking.bonded: Option<AccountId32>
5CqVcAhUzKbMMwrZiJDSAwXkmoYLpaYBXkobUAp3biVAQoXc

gpestana commented 5 months ago

Solution

Reaping the stash does not work: it fails with staking.NotController, since the staking ledger no longer exists in storage. This will require a small migration, which I can work on.


Done. The solution was to use set_storage to write a ledger with total = 0 at staking.ledger(5CqVcAhUzKbMMwrZiJDSAwXkmoYLpaYBXkobUAp3biVAQoXc), and then call reap_stash to clean up all the storage items of this ledger.
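The two-step recovery can be illustrated with a toy model. Everything here is a simplified assumption rather than the pallet's implementation: storage is modelled as `HashMap`s, `recover` stands in for the manual set_storage call, and this `reap_stash` only accepts a ledger whose total is exactly zero.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for the staking storage items.
#[derive(Default)]
struct StakingModel {
    bonded: HashMap<String, String>, // stash -> controller
    payee: HashMap<String, String>,  // stash -> reward destination
    ledger: HashMap<String, u128>,   // controller -> total bonded
}

#[derive(Debug, PartialEq)]
enum Error {
    NotStash,
    NotController,
    FundedLedger,
}

// Toy reap: needs a ledger to exist, which is why the corrupted stash could not be reaped directly.
fn reap_stash(s: &mut StakingModel, stash: &str) -> Result<(), Error> {
    let controller = s.bonded.get(stash).cloned().ok_or(Error::NotStash)?;
    let total = *s.ledger.get(&controller).ok_or(Error::NotController)?;
    if total != 0 {
        return Err(Error::FundedLedger);
    }
    s.ledger.remove(&controller);
    s.bonded.remove(stash);
    s.payee.remove(stash);
    Ok(())
}

// Recovery: write an empty ledger for the bonded controller, then reap as usual.
fn recover(s: &mut StakingModel, stash: &str) -> Result<(), Error> {
    let controller = s.bonded.get(stash).cloned().ok_or(Error::NotStash)?;
    s.ledger.entry(controller).or_insert(0); // stand-in for set_storage with total = 0
    reap_stash(s, stash)
}
```

In the model, calling `reap_stash` on a stash with bonded and payee entries but no ledger returns `NotController`, while `recover` leaves all three maps empty.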

gpestana commented 4 months ago

The try-state check is failing again, now with 16 faulty accounts. A recent deprecate_controller_batch call seems to be the culprit. The affected stashes also seem to have bonded in the same timespan as the stash that was fixed previously.

I'm looking into this again and will check whether it can happen to other bonded stashes (also in Kusama and Polkadot).

The solution should be the same as described above, for all the affected stashes.

gpestana commented 3 months ago

For the record and for future reference, the root issue is that the current staking logic does not prevent controllers from becoming stashes of different ledgers. As a result, an account can be the stash of one ledger and the controller of another. The second-order issue is that set_controller does not expect controllers to be stashes of other ledgers, so calling set_controller on a ledger in this state may corrupt the ledger data and metadata. For more details on this issue and the backstop patch, refer to https://github.com/paritytech/polkadot-sdk/pull/3639.
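To make the problematic state concrete, here is a small sketch that scans a stash -> controller map for accounts that are both a stash and the controller of a different stash's ledger. It is an illustration only: the function name and string account IDs are hypothetical, and it is not the check added in the linked PR.

```rust
use std::collections::HashMap;

// `bonded` maps stash -> controller.
// Returns every account that is itself a stash while also controlling another stash's ledger.
fn double_role_accounts(bonded: &HashMap<String, String>) -> Vec<&str> {
    let mut out: Vec<&str> = bonded
        .iter()
        // A stash acting as its own controller is fine; only flag cross-ledger pairings.
        .filter(|(stash, controller)| stash != controller && bonded.contains_key(*controller))
        .map(|(_, controller)| controller.as_str())
        .collect();
    out.sort();
    out.dedup();
    out
}
```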

A patch release (v1.1.3) has been proposed for Kusama and Polkadot to prevent stashes from becoming controllers of other ledgers and backstop the corruption issue. The plan now is to recover the corrupted ledgers across all chains. Once that's done, we can close this issue.