polkadot-fellows / runtimes

The various runtimes which make up the core subsystems of networks for which the Fellowship is represented.
GNU General Public License v3.0

State migration #74

Open bkchr opened 8 months ago

bkchr commented 8 months ago

We need to migrate the relay chains and all parachains to state version 1. There is a pallet for doing this. With 1.0.0, Kusama will enable the state migration. After that we also need to migrate its parachains, as well as Polkadot and its parachains. This issue serves as the tracking issue.

Three check marks: migration deployed, RPC reports done, migration removed from the runtime.

bkchr commented 8 months ago

@cheme could you please post a status update on what needs to be done for Kusama? That is, does the migration work on its own, or is it driven by some offchain thingy?

After Kusama was successful, we should also directly start doing it for Polkadot.

cheme commented 8 months ago

Looks ready to go with the next Kusama runtime.

This line is set to 1 (switches to hybrid state when the new runtime is released): https://github.com/polkadot-fellows/runtimes/blob/94b2798b69ba6779764e20a50f056e48db78ebef/relay/kusama/src/lib.rs#L146

The start of the migration is added to Unreleased: https://github.com/polkadot-fellows/runtimes/blob/94b2798b69ba6779764e20a50f056e48db78ebef/relay/kusama/src/lib.rs#L1731

Unreleased is set as the runtime migration: https://github.com/polkadot-fellows/runtimes/blob/94b2798b69ba6779764e20a50f056e48db78ebef/relay/kusama/src/lib.rs#L1654C35-L1654C35

Note that if there are many migrations running together, it might be an idea to lower the limit per block: https://github.com/polkadot-fellows/runtimes/blob/94b2798b69ba6779764e20a50f056e48db78ebef/relay/kusama/src/lib.rs#L2714

Here it is up to 4800 items or 408000 octets, which in the case of maxed-out blocks can add, with db_weight_reads_writes(1, 1) = (25_000 + 100_000) * 1000 (using the RocksDB constants): `125_000_000 * 4_800 + 408_000 * 1_139 + some_fix_weight = 600 * 10^9 + 464_712_000 + some_fix_weight` out of 2_000_000_000_000, so if I checked correctly, about one third of a block's weight. Note that in the case of a relay chain, consuming weight is not a must-have from my point of view.
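The arithmetic in the comment above can be re-checked with a few lines of Python. This is only a sketch: the constant names are made up for illustration, the numbers are the ones quoted in the thread, the fixed part of the weight is unknown here and defaults to 0, and the 6 * 10^12 budget is the alternative block weight mentioned in a follow-up comment.

```python
# Figures quoted in the comment; the names are illustrative, not runtime identifiers.
READ_WEIGHT = 25_000 * 1_000     # one database read (RocksDB constants)
WRITE_WEIGHT = 100_000 * 1_000   # one database write (RocksDB constants)
PER_BYTE_WEIGHT = 1_139          # per-byte factor used in the comment
MAX_ITEMS = 4_800                # item limit per block
MAX_BYTES = 408_000              # size limit per block, in octets


def migration_weight(items=MAX_ITEMS, size=MAX_BYTES, fixed_weight=0):
    """Worst-case weight of one migration step; the fixed part is assumed to be 0."""
    per_item = READ_WEIGHT + WRITE_WEIGHT
    return per_item * items + PER_BYTE_WEIGHT * size + fixed_weight


def block_share(block_weight):
    """Fraction of the given block weight budget used by a maxed-out step."""
    return migration_weight() / block_weight


print(migration_weight())                 # 600464712000
print(f"{block_share(2 * 10**12):.1%}")   # 30.0%
print(f"{block_share(6 * 10**12):.1%}")   # 10.0%
```

With a 2 * 10^12 budget this gives roughly the "one third" estimate; with 6 * 10^12 the share drops to about a tenth.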

https://github.com/polkadot-fellows/runtimes/blob/94b2798b69ba6779764e20a50f056e48db78ebef/relay/kusama/src/lib.rs#L1468

Warning: if the start line gets removed from Unreleased, line 146 must be set to 0 (to avoid a hybrid state).

cheme commented 8 months ago

@bkchr the weight used in each block is an important point that should be in the release notes, I think.

As a relay chain we could also ignore this weight to prevent any issue.

cheme commented 8 months ago

Actually, I used 2 * 10^12 for the block weight, but on the relay chain it may be 6 * 10^12, so it would not be worrying then.

bkchr commented 8 months ago

The migration will run entirely on chain and doesn't require any external interactions?

cheme commented 8 months ago

chain and doesn't require any exte

On chain; no external interactions are possible.

ggwpez commented 7 months ago

Kusama is done? I queried StateTrieMigration.MigrationProcess and got:

 {
  "progress_top": {
    "name": "Complete",
    "values": []
  },
  "progress_child": {
    "name": "ToStart",
    "values": []
  },
  "size": 239581990,
  "top_items": 881284,
  "child_items": 1373
}

The Events also look fine. Just ~480 blocks to migrate everything? Looks like it did about 4800 keys in some blocks, nice 😳
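A small sketch of how the queried value could be read. It assumes that a `progress_top` of `Complete` is what marks the migration as finished (my understanding of the pallet, not confirmed in this thread) and uses the 4800 items-per-block limit mentioned earlier to get a lower bound on the number of blocks needed. The function and field names in the summary are hypothetical.

```python
import json
import math

# The value returned for StateTrieMigration.MigrationProcess, as posted above.
RAW = """
{
  "progress_top": {"name": "Complete", "values": []},
  "progress_child": {"name": "ToStart", "values": []},
  "size": 239581990,
  "top_items": 881284,
  "child_items": 1373
}
"""

MAX_ITEMS_PER_BLOCK = 4_800  # per-block item limit quoted in the thread


def summarize(raw):
    """Summarize a MigrationProcess value (assumption: top == Complete means done)."""
    task = json.loads(raw)
    total_items = task["top_items"] + task["child_items"]
    return {
        "top_complete": task["progress_top"]["name"] == "Complete",
        "total_items": total_items,
        "size_mb": task["size"] / 10**6,     # decimal megabytes
        "size_mib": task["size"] / 2**20,    # binary mebibytes
        "min_blocks_by_items": math.ceil(total_items / MAX_ITEMS_PER_BLOCK),
    }


summary = summarize(RAW)
print(summary["top_complete"])              # True
print(summary["total_items"])               # 882657
print(f"{summary['size_mb']:.1f} MB")       # 239.6 MB
print(f"{summary['size_mib']:.1f} MiB")     # 228.5 MiB
print(summary["min_blocks_by_items"])       # 184
```

The item limit alone implies at least 184 blocks, which is consistent with the roughly 480 blocks observed, since many blocks hit the size limit before the item limit.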

cheme commented 7 months ago

4800 was the limit indeed. This RPC (https://github.com/paritytech/cumulus/pull/1424) can be called to double-check that the state did migrate correctly (it requires running it locally as it is unsafe). Ultimately another good check is to run a warp sync (if the state is not migrated warp sync should not be working). (I cannot run these right now, I am ooo.)

ggwpez commented 7 months ago

Ultimately another good check is to run a warp sync (if the state is not migrated warp sync should not be working).

I just tried and it still works.

bkchr commented 6 months ago

@cheme so did it finish successfully?

cheme commented 6 months ago

Yes; if warp sync did pass, all is fine. (Warp sync is guaranteed to fail during migration, and the on-chain counters do state that the migration is finished.)

bkchr commented 5 months ago

@cheme can you prepare the changes for Polkadot?

cheme commented 5 months ago

Will do (tomorrow most likely). I think since block time is longer on Polkadot, keeping the same config as Kusama should be fine.

bkchr commented 4 months ago

Ping @cheme

bkchr commented 4 months ago

BTW, we also need to migrate the system chains.

cheme commented 4 months ago

https://github.com/polkadot-fellows/runtimes/pull/170

System chains will be another beast. I was never really happy with the manual migration process driven by RPC calls. Last time I thought about it, I was thinking of just running the automatic process, maybe with an exclusion list: first scan offchain for big values and process them one by one. There is still the issue of big values being added to the chain between the scan and the start of the migration, but realistically speaking, and with a bit of knowledge of the system chain logic, we can probably be confident this would not happen (big problematic values should be rather rare, and no sane chain would create them randomly).

But the manual RPC approach can probably do the job; it is just the amount of energy needed to manage it that worries me (and also the fact that we rely on a dedicated external trusted entity). With the automatic approach, we need to contact someone competent who knows where big problematic values can be, scan the state to find the existing ones, and add a skip list for them (actually: process them first, then skip-list them). This automatic approach would need new code in the migration pallet, though (the skip-list-related code). cc @kianenigma
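The "scan, process first, then skip" idea can be illustrated with a toy model. This is not pallet code and no such skip-list feature exists in the migration pallet according to the comment above; the state is modelled as a plain dictionary of key bytes to value bytes, and `build_skip_list`, `plan_migration` and the size threshold are hypothetical names chosen for the sketch.

```python
def build_skip_list(state, max_value_size):
    """Return the keys whose values are larger than max_value_size bytes, sorted."""
    return sorted(key for key, value in state.items() if len(value) > max_value_size)


def plan_migration(state, max_value_size):
    """Split keys into those handled one by one first and those left to the automatic pass."""
    skip = build_skip_list(state, max_value_size)
    skip_set = set(skip)
    automatic = sorted(key for key in state if key not in skip_set)
    return skip, automatic


# Toy state: one oversized value among small ones.
state = {
    b"small_a": b"x" * 10,
    b"big": b"x" * 1_000,
    b"small_b": b"",
}

manual_first, automatic = plan_migration(state, max_value_size=512)
print(manual_first)  # [b'big']
print(automatic)     # [b'small_a', b'small_b']
```

The gap cheme points out still applies to this model: a big value written after the scan would not be in the skip list.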

bkchr commented 4 months ago

#170

Ohh fuck, I overlooked this! Sorry!

I was never really happy with the manual migration process driven by RPC calls

I thought this was also just some bot doing this? Or what do you mean by manual in this case?

cheme commented 4 months ago

I thought this was also just some bot doing this? Or what do you mean by manual in this case?

Yes, manual by a bot; we still need to run the bot (it needs a slashable fee deposit). Also, I am not sure anymore whether it should target a specific account (that seems like a liability: it should be open to everyone, but I don't remember how we ensure a single call is done per block). Maybe it is fine.

bkchr commented 4 months ago

I mean just opening this for one account sounds fine to me. I mean we are speaking here about a one time migration.

but I don't remember how we ensure a single call is done per block

This could be done with a storage value that is set to true when the call was done and removed at on_finalize.
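As a rough illustration of that suggestion, here is a plain Python model of a once-per-block guard. It is not FRAME code: the class, the `continue_migrate` method name and the error type are stand-ins, and the boolean attribute plays the role of the storage value that is set by the call and cleared at `on_finalize`.

```python
class OncePerBlockGuard:
    """Toy model: allow at most one migration call per block."""

    def __init__(self):
        # Stands in for the storage value; False means "not called in this block".
        self.called_this_block = False

    def continue_migrate(self):
        """Model of the signed migration call; rejects a second call in the same block."""
        if self.called_this_block:
            raise RuntimeError("migration call already made in this block")
        self.called_this_block = True
        return "migrated"

    def on_finalize(self):
        """Model of the end-of-block hook; clears the flag for the next block."""
        self.called_this_block = False


guard = OncePerBlockGuard()
print(guard.continue_migrate())      # first call in the block succeeds
try:
    guard.continue_migrate()         # second call in the same block is rejected
except RuntimeError as err:
    print(f"rejected: {err}")
guard.on_finalize()                  # block ends, flag is removed
print(guard.continue_migrate())      # next block: the call succeeds again
```

With a guard like this the call can stay open to any account while still limiting the work done per block to a single call.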

kianenigma commented 4 months ago

Kusama is done? I queried StateTrieMigration.MigrationProcess and got:

 {
  "progress_top": {
    "name": "Complete",
    "values": []
  },
  "progress_child": {
    "name": "ToStart",
    "values": []
  },
  "size": 239581990,
  "top_items": 881284,
  "child_items": 1373
}

The Events also look fine. Just ~480 blocks to migrate everything? Looks like it did about 4800 keys in some blocks, nice 😳

So the Kusama state is about 240 MB? Interesting. I am still missing a tool like https://github.com/paritytech/polkadot-sdk/issues/449. I wonder if there is an ecosystem tool for this that I am not aware of?

kianenigma commented 4 months ago

@cheme I would not use the automatic migration on a system chain, as any error will likely cause the parachain to stop.

Maybe on a kusama system chain, but 100% not for Polkadot. I wrote a TS bot that still should work fine to trigger the migrations one by one, and it should all be free. Have we ever used the signed migration? ref: https://github.com/paritytech/polkadot-scripts/blob/master/src/services/state_trie_migration.ts

cheme commented 4 months ago

@cheme I would not use the automatic migration on a system chain, as any error will likely cause the parachain to stop.

Maybe on a kusama system chain, but 100% not for Polkadot. I wrote a TS bot that still should work fine to trigger the migrations one by one, and it should all be free. Have we ever used the signed migration? ref: https://github.com/paritytech/polkadot-scripts/blob/master/src/services/state_trie_migration.ts

A long time ago we did some with @PierreBesson when doing Rococo and Westend (I think Statemine). But for me it is a bit too long ago.

bkchr commented 4 months ago

After Polkadot is done, we need to work on the parachains. As said above, I don't see any real problem in using the offchain bot.

ggwpez commented 2 months ago

I added a list to the issue description. Please tick off the ones that are done.

bkchr commented 2 months ago

Kusama is already done. Polkadot should be finished after the next runtime upgrade.

ggwpez commented 1 month ago

@cheme can you please also add it for the missing ones?

cheme commented 1 month ago

@cheme can you please also add it for the missing ones?

Are there new ones? (I mean, asset-hub and collectives were done.)

ggwpez commented 1 month ago

[Image: Screenshot 2024-05-13 at 12 59 27]

Yeah, it looks like there are some remaining. Asset-Hub Kusama has it behind a feature gate that, it looks like, was never enabled?