bkchr opened 8 months ago
@cheme could you please post some status on what needs to be done for Kusama? Aka does the migration work on its own, or is it driven by some offchain thingy?
After Kusama was successful, we should also directly start doing it for Polkadot.
Looks ready to go with the next Kusama runtime.
This line is set to 1 (switch to hybrid state when the new runtime is released): https://github.com/polkadot-fellows/runtimes/blob/94b2798b69ba6779764e20a50f056e48db78ebef/relay/kusama/src/lib.rs#L146
Start of migration added to Unreleased: https://github.com/polkadot-fellows/runtimes/blob/94b2798b69ba6779764e20a50f056e48db78ebef/relay/kusama/src/lib.rs#L1731
Unreleased set as runtime migration: https://github.com/polkadot-fellows/runtimes/blob/94b2798b69ba6779764e20a50f056e48db78ebef/relay/kusama/src/lib.rs#L1654C35-L1654C35
Note that if there are many migrations running together, it might be an idea to lower the limit per block: https://github.com/polkadot-fellows/runtimes/blob/94b2798b69ba6779764e20a50f056e48db78ebef/relay/kusama/src/lib.rs#L2714
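For orientation, here is a hedged sketch of what the linked spots roughly correspond to. The names follow `pallet_state_trie_migration` and the fellows runtime as I understand them at the time (`InitMigrate` and the exact wiring are illustrative, not copied from the linked code); the 4_800 / 408_000 limits are the ones discussed below.

```rust
// 1. Line 146: state_version switched from 0 to 1 in the runtime version,
//    so newly written keys use the new trie layout (hybrid state).
pub const VERSION: RuntimeVersion = RuntimeVersion {
    // ...
    state_version: 1,
};

// 2. Illustrative "start of migration" hook: on runtime upgrade, enable the
//    automatic migration by writing the per-block limits into the pallet's
//    AutoLimits storage value.
pub struct InitMigrate;
impl OnRuntimeUpgrade for InitMigrate {
    fn on_runtime_upgrade() -> Weight {
        pallet_state_trie_migration::AutoLimits::<Runtime>::put(Some(
            MigrationLimits { item: 4_800, size: 408_000 },
        ));
        <Runtime as frame_system::Config>::DbWeight::get().writes(1)
    }
}
```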
Here up to 4_800 items or 408_000 octets, which in the case of maxed-out blocks can add:
`db_weight reads_writes(1, 1) = (25_000 + 100_000) * 1_000`
(using the RocksDB constants).
`125_000_000 * 4_800 + 408_000 * 1_139 + some_fix_weight = 600 * 10^9 + 464_712_000` out of 2_000_000_000_000, so if I checked correctly, about one third of a block weight.
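Spelling the back-of-envelope math out (same constants as above: 25_000 ns per read, 100_000 ns per write, converted to picoseconds, and 1_139 per octet):

```rust
// Back-of-envelope check of the per-block migration weight quoted above.
fn main() {
    let read_write_ps: u64 = (25_000 + 100_000) * 1_000; // 125_000_000 per item
    let items: u64 = 4_800;
    let bytes: u64 = 408_000;
    let per_byte_ps: u64 = 1_139;

    let item_cost = read_write_ps * items; // 600_000_000_000
    let byte_cost = per_byte_ps * bytes;   // 464_712_000
    let total = item_cost + byte_cost;

    let block_weight: u64 = 2_000_000_000_000; // 2 * 10^12
    println!("total = {total}, fraction = {:.2}", total as f64 / block_weight as f64);
}
```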
Note that in the case of the relay chain, consuming weight is not a must-have from my point of view.
Warning: if the start line gets removed from Unreleased, line 146 must be set to 0 (to avoid hybrid state).
@bkchr the weight used in each block is an important point that should be in the release notes, I think.
As a relay chain we could also ignore this weight to prevent any issue.
Actually I did use 2 * 10^12 for the block weight, but on the relay it may be 6 * 10^12, so it would not be worrying then.
The migration will run entirely on chain and doesn't require any external interactions?
> The migration will run entirely on chain and doesn't require any external interactions?
on chain, no possible external interactions.
Kusama is done? I queried `StateTrieMigration.MigrationProcess` and got:
```json
{
  "progress_top": { "name": "Complete", "values": [] },
  "progress_child": { "name": "ToStart", "values": [] },
  "size": 239581990,
  "top_items": 881284,
  "child_items": 1373
}
```
The Events also look fine. Just ~480 blocks to migrate everything? Looks like it did about 4800 keys in some blocks, nice 😳
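As a quick sanity check of the block count (my own arithmetic, not from the thread): the item limit and the size limit bound the block count from both sides, and the observed ~480 lands in between, since each block stops at whichever limit it hits first.

```rust
// Bounds on the number of migration blocks given the per-block limits
// (4_800 items or 408_000 bytes) and the totals reported above.
fn main() {
    let top_items: u64 = 881_284;
    let child_items: u64 = 1_373;
    let size: u64 = 239_581_990;

    let by_items = (top_items + child_items).div_ceil(4_800); // 184 blocks minimum
    let by_size = size.div_ceil(408_000);                     // 588 blocks maximum
    println!("between {by_items} and {by_size} blocks");
}
```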
4800 was the limit indeed. This RPC can be called (https://github.com/paritytech/cumulus/pull/1424) to double-check that the state migrated correctly (requires running it locally as it is unsafe). Ultimately another good check is to run a warp sync (if the state is not migrated, warp sync should not work). (Cannot run these right now, I am OOO.)
> Ultimately another good check is to run a warp sync (if the state is not migrated warp sync should not be working).
I just tried and it still works.
@cheme so did it finish successfully?
Yes; if warp sync passed, all is fine (warp sync is guaranteed to fail during migration, and the on-chain counter does state that the migration is finished).
@cheme can you prepare the changes for Polkadot?
Will do (tomorrow most likely). I think since block time is longer on Polkadot, keeping the same config as Kusama should be fine.
Ping @cheme
BTW, we also need to migrate the system chains.
https://github.com/polkadot-fellows/runtimes/pull/170. System chains will be another beast. I was never really happy with the RPC-call-driven manual migration. Last time I thought about it, I was thinking of just running the automatic process, maybe with an exclusion list: first scan offchain for big values and process them one by one. There is still the issue of big values being added to the chain between the scan and the start of the migration, but realistically speaking, and with a bit of knowledge of the system chain logic, we can probably be confident this would not happen (big problematic values should be rather rare, and no sane chain would create them randomly).
But the manual RPC approach can probably do the job; it is just the amount of energy to manage that worries me (and also the fact that we rely on a dedicated external trusted entity). With the automatic one, we need to contact someone competent to know where the big problematic values can be, do a scan of the state to find the existing ones, and add a skip list for them (actually: process first, then skip list). For this automatic approach, new code would be needed in the migration pallet (the skip-list-related code), though. cc @kianenigma
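The offchain scan part could look roughly like this (a minimal hypothetical sketch: the function name, the 408_000-byte threshold, and the `(key, value)` dump format are my assumptions, and the skip-list support in the migration pallet does not exist yet, as noted above):

```rust
use std::collections::HashSet;

// Hypothetical offchain scan: walk a dump of (key, value) pairs and collect
// the keys whose values exceed a size threshold, so they can be processed
// one by one and then fed to a skip list for the automatic migration.
fn build_skip_list(state: &[(Vec<u8>, Vec<u8>)], max_value_len: usize) -> HashSet<Vec<u8>> {
    state
        .iter()
        .filter(|(_, value)| value.len() > max_value_len)
        .map(|(key, _)| key.clone())
        .collect()
}

fn main() {
    // Toy state dump for illustration.
    let state = vec![
        (b"small".to_vec(), vec![0u8; 100]),
        (b"big".to_vec(), vec![0u8; 500_000]),
    ];
    let skip = build_skip_list(&state, 408_000);
    assert!(skip.contains(&b"big".to_vec()));
    assert!(!skip.contains(&b"small".to_vec()));
    println!("{} key(s) to migrate manually", skip.len());
}
```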
> #170
Ohh fuck, I had overlooked this! Sorry!
> I was never really happy with the rpc calls process manual migration
I thought this was also just some bot doing this? Or what you mean by manual in this case?
> I thought this was also just some bot doing this? Or what you mean by manual in this case?
Yes, manual by a bot; you still need to run the bot (it needs a slashable fee deposit; also I am not sure anymore if it should target a specific account, which seems like a liability: it should be open to everyone, but I don't remember how we ensure a single call is done per block). Maybe it is fine.
I mean, just opening this for one account sounds fine to me; we are speaking here about a one-time migration.
> but I don't remember how we ensure a single call is done per blocks

This could be done with a storage value that is set to true when the call was done and removed in `on_finalize`.
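That guard could be sketched like this (a plain-Rust simulation, not pallet code; all names, and the boolean standing in for the storage value, are made up for illustration):

```rust
// Simulation of the suggested guard: a flag set on the first migration call
// in a block and cleared in on_finalize, so only one call per block succeeds.
struct Block {
    already_called: bool, // stands in for an on-chain storage value
}

impl Block {
    fn new() -> Self {
        Block { already_called: false }
    }

    // The permissionless migration call: rejected if already used this block.
    fn continue_migrate(&mut self) -> Result<(), &'static str> {
        if self.already_called {
            return Err("one migration call per block");
        }
        self.already_called = true;
        Ok(())
    }

    // on_finalize: remove the flag so the next block accepts a call again.
    fn on_finalize(&mut self) {
        self.already_called = false;
    }
}

fn main() {
    let mut block = Block::new();
    assert!(block.continue_migrate().is_ok());
    assert!(block.continue_migrate().is_err()); // second call in same block rejected
    block.on_finalize();
    assert!(block.continue_migrate().is_ok()); // next block accepts a call again
    println!("guard ok");
}
```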
> Kusama is done? I queried `StateTrieMigration.MigrationProcess` and got: `{ "progress_top": { "name": "Complete", "values": [] }, "progress_child": { "name": "ToStart", "values": [] }, "size": 239581990, "top_items": 881284, "child_items": 1373 }`
> The Events also look fine. Just ~480 blocks to migrate everything? Looks like it did about 4800 keys in some blocks, nice 😳
So the Kusama state is ~240 MB? Interesting. I am still missing a tool like https://github.com/paritytech/polkadot-sdk/issues/449; I wonder if there is an ecosystem tool for this that I am not aware of?
@cheme I would not use the automatic migration on a system chain, as any error will likely cause the parachain to stop.
Maybe on a Kusama system chain, but 100% not for Polkadot. I wrote a TS bot that should still work fine to trigger the migrations one by one, and it should all be free. Have we ever used the signed migration? ref: https://github.com/paritytech/polkadot-scripts/blob/master/src/services/state_trie_migration.ts
> @cheme I would not use the automatic migration on a system chain, as any error will likely cause the parachain to stop. Maybe on a kusama system chain, but 100% not for Polkadot. I wrote a TS bot that still should work fine to trigger the migrations one by one, and it should all be free. Have we ever used the signed migration?
A long time ago we did some with @PierreBesson when doing Rococo and Westend (I think Statemine). But for me it is a bit too long ago.
After Polkadot is done, we need to work on the parachains. As said above, I don't see any real problem in using the offchain bot.
I added a list to the issue description. Please tick off the ones that are done.
Kusama is already done. Polkadot should be finished after the next runtime upgrade.
@cheme can you please also add it for the missing ones?
> @cheme can you please also add it for the missing ones?
Are there new ones? (I mean asset-hub and collectives were done.)
Yea, it looks like there are some remaining. Asset-Hub Kusama has it behind a feature gate that was never enabled, it looks like?
We need to migrate all parachains and the relay chain to state version 1. There is a pallet for doing this. With 1.0.0, Kusama will enable the state migration. After that we also need to migrate its parachains, and then Polkadot and its parachains. This issue works as a tracking issue.
- Bridge-Hub [ ] [ ] [ ] [ ]
- Coretime [ ] [ ] [ ] [ ]
- Glutton [ ] [ ] [ ] [ ]
- People [ ] [ ] [ ] [ ]

Three check marks: migration deployed, RPC reports done, migration removed from the runtime.