Closed: liamaharon closed this 11 months ago
https://github.com/paritytech/substrate/issues/12236#issuecomment-1246997128 could also help here, by increasing the page size so that more than 1k keys are downloaded per page.
@liamaharon I believe that parallel scraping would be a great improvement to the try-runtime-cli user experience. As someone who has been using try-runtime-cli lately, I can confirm that it takes quite some time to scrape all the keys from Kusama or Polkadot.
I would like to implement parallel key scraping if no one is working on this currently.
Edit: I just noticed that there is an existing PR that implements this :sweat_smile:
@Szegoo have you tried downloading once to a snapshot and then loading from it each time thereafter? Loaded from a snapshot, the entire Polkadot state takes <1s.
@liamaharon When testing I had a node running in the background which was synced with Polkadot/Kusama, so no, I did not download a snapshot once; I was always running try-runtime on the latest state. In another terminal I was executing try-runtime commands, but as long as I was not scraping all the keys it didn't take a lot of time (I was only scraping keys from a single pallet when doing try-state checks). The only problem is when scraping all the keys, so that is when parallel key scraping would come in really handy.
Is there a reason you need the latest state every time you run the command, rather than periodically refreshing your snapshot?
Not really, you are right that having a snapshot is more efficient.
WDYT about adding a log when people use live state for something other than creating a snapshot, letting them know they probably want to create a snapshot if they'll be running it repeatedly? I could imagine some devs just being unaware that the snapshot feature exists, or of how to use it.
> WDYT about adding a log when people use live state for something other than creating a snapshot, letting them know they probably want to create a snapshot if they'll be running it repeatedly?

Might be a good idea. ~~Did you imagine having this log whenever the `--sync warp` option is used when running a node?~~
I was thinking when someone executes a command like `on-runtime-upgrade live ws://...`, they probably want to take a snapshot and from then on use `snap` instead of `live`. Not sure why the sync mode would matter, maybe I misunderstand?
I somewhat misunderstood you. Yes, it definitely makes sense to warn a user in that case.
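For illustration, such a check might look something like this (a minimal sketch; `State`, its fields, `maybe_warn_live_state`, and the message wording are all stand-ins, not the actual try-runtime-cli types):

```rust
/// Stand-in for try-runtime-cli's state-source type; illustrative only.
enum State {
	Live { uri: String },
	Snap { path: String },
}

// Warn when live state is used for anything other than creating a snapshot.
fn maybe_warn_live_state(state: &State, creating_snapshot: bool) {
	if let State::Live { uri } = state {
		if !creating_snapshot {
			log::warn!(
				"loading live state from {}; if you plan to run this command \
				 repeatedly, consider creating a snapshot once and loading \
				 from it instead",
				uri,
			);
		}
	}
}
```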
Does this (for a rough idea) look like what you're aiming for? @liamaharon
```rust
async fn rpc_get_keys_paged_batch(
	&self,
	prefix: StorageKey,
	block: B::Hash,
	parallel: u16,
) -> Result<Vec<StorageKey>, &'static str> {
	// Round `n` to a power of 16, returning the power and its exponent
	// (i.e. how many extension bytes are needed to distinguish that many
	// sub-prefixes).
	fn round(n: u16) -> (u16, usize) {
		let mut pow: u16 = 16;
		let mut exp: usize = 1;
		if n <= 1 {
			return (1, 0)
		} else if n <= 16 {
			return (pow, exp)
		}
		while pow < n {
			// 16^4 does not fit in a u16, so cap at 4096.
			match pow.checked_mul(16) {
				Some(next) => pow = next,
				None => break,
			}
			exp += 1;
		}
		// Round down when `n` is much closer to the next power down.
		if (n as u32) * 4 < pow as u32 {
			(pow / 16, exp - 1)
		} else {
			(pow, exp)
		}
	}

	const POW_OF_SIXTEEN: [u16; 4] = [1, 16, 256, 4096];

	// Decompose `n` into `len` base-16 digits, little-endian, one digit
	// per extension byte.
	fn extension(n: u16, len: usize) -> Vec<u8> {
		let mut ext = vec![0; len];
		for i in 0..len {
			ext[i] = (n / POW_OF_SIXTEEN[i] % 16) as u8;
		}
		ext
	}

	let (parallel, len) = round(parallel);
	// One sub-request per extended prefix, all awaited concurrently.
	let batch = (0..parallel).map(|i| {
		let mut pref = prefix.0.clone();
		pref.extend(extension(i, len));
		self.rpc_get_keys_paged(StorageKey(pref), block)
	});
	let keys = futures::future::join_all(batch)
		.await
		.into_iter()
		.collect::<Result<Vec<_>, _>>()?
		.into_iter()
		.flatten()
		.collect::<Vec<_>>();
	Ok(keys)
}
```
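For intuition, here is a standalone demo of the digit scheme above (logic copied out of the draft purely for illustration, not part of any real API):

```rust
fn main() {
	const POW_OF_SIXTEEN: [u16; 4] = [1, 16, 256, 4096];
	// Same digit extraction as `extension` in the draft above.
	let extension = |n: u16, len: usize| -> Vec<u8> {
		(0..len).map(|i| (n / POW_OF_SIXTEEN[i] % 16) as u8).collect()
	};
	// parallel = 16 -> one extension byte per task: 0x00 ..= 0x0f.
	assert_eq!(extension(5, 1), vec![5]);
	// parallel = 256 -> two bytes, little-endian digits: 37 = 2 * 16 + 5.
	assert_eq!(extension(37, 2), vec![5, 2]);
}
```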
Looks on the right track @eagr
Ain't clear I understand what's being fetched here, but...
In the relay chain, we typically assume nodes are either honest or malicious, so you might be annoyed, spammed, etc. if you fetch too much from only a few nodes, but if you fetch in parallel from more nodes then the chances that you make progress despite some spam increase. Aka more parallelism and less waiting may be exactly what you want, even if you do not want higher top speeds.
Motivation
The current key scraping implementation is quite inefficient, fetching 1k keys at a time and waiting for each batch of 1k keys to be returned before asking for the next one:
https://github.com/paritytech/substrate/blob/master/utils/frame/remote-externalities/src/lib.rs#L348-L392
This is not so bad when fetching keys from a local node, where latency is not a big factor, but it can slow things down very significantly when using a remote node, where most of the time spent fetching keys becomes latency rather than the node doing work.
Unfortunately, most nodes seem to have a hard limit imposed preventing more than 1k keys being fetched in a single request.
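For reference, the sequential pattern looks roughly like this (a simplified sketch, not the actual remote-externalities code; `get_keys_page` is a stand-in for the underlying `state_getKeysPaged` RPC call):

```rust
const PAGE: u32 = 1000;

// Stand-in for the `state_getKeysPaged` RPC; a real client call goes here.
async fn get_keys_page(
	_prefix: &[u8],
	_count: u32,
	_start_key: Option<&[u8]>,
) -> Result<Vec<Vec<u8>>, &'static str> {
	Ok(vec![])
}

// Sequential paging: every page waits for the previous one to come back,
// so on a remote node each 1k keys costs a full network round trip.
async fn get_all_keys(prefix: &[u8]) -> Result<Vec<Vec<u8>>, &'static str> {
	let mut all_keys: Vec<Vec<u8>> = Vec::new();
	loop {
		let start_key = all_keys.last().map(|k| k.as_slice());
		let page = get_keys_page(prefix, PAGE, start_key).await?;
		let page_len = page.len();
		all_keys.extend(page);
		if page_len < PAGE as usize {
			break // a short page means we've reached the end
		}
	}
	Ok(all_keys)
}
```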
Suggested solution
Rather than fetching all keys for the `prefix` in a single `rpc_get_keys_paged` request here https://github.com/paritytech/substrate/blob/master/utils/frame/remote-externalities/src/lib.rs#L528-L533, split the `prefix` up into N parts and run the `rpc_get_keys_paged` tasks in parallel using async/await. E.g. two parallel tasks:

start_keys: 0x000000 end_keys: 0x777777
start_keys: 0x777777 end_keys: 0xffffff
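For a rough idea of computing those split points, assuming the key space under the prefix is partitioned by the byte that follows it (`split_prefix` is a hypothetical helper, and the range-based fetch API is left abstract):

```rust
// Split the key space after `prefix` into `n` contiguous ranges by the
// first byte, as in the two-task example above. `n` must divide 256.
// Returns (start_key, end_key) pairs; a `None` end means "to the end of
// the prefix's key space". Illustrative only.
fn split_prefix(prefix: &[u8], n: usize) -> Vec<(Vec<u8>, Option<Vec<u8>>)> {
	assert!(n > 0 && 256 % n == 0, "n must divide 256");
	let step = 256 / n;
	(0..n)
		.map(|i| {
			let mut start = prefix.to_vec();
			start.push((i * step) as u8);
			let end = if i == n - 1 {
				None // last range is unbounded on the right
			} else {
				let mut end = prefix.to_vec();
				end.push(((i + 1) * step) as u8);
				Some(end)
			};
			(start, end)
		})
		.collect()
}

fn main() {
	// Two parallel tasks over an empty prefix: [0x00, 0x80) and [0x80, ..].
	let ranges = split_prefix(&[], 2);
	assert_eq!(ranges[0], (vec![0x00], Some(vec![0x80])));
	assert_eq!(ranges[1], (vec![0x80], None));
}
```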
Bonus (probably for a separate issue/PR once this one is closed)
Fetch storage values for keys as they're pulled from the node, instead of waiting for all keys to be pulled before fetching values.
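A rough sketch of that pipelining using `futures::stream` (the `fetch_values` helper and the concurrency limit of 4 are assumptions, not the real remote-externalities API):

```rust
use futures::stream::{self, StreamExt};

// Hypothetical helper that fetches the values for one page of keys.
async fn fetch_values(keys: Vec<Vec<u8>>) -> Vec<(Vec<u8>, Vec<u8>)> {
	keys.into_iter().map(|k| (k, vec![])).collect()
}

// Start fetching values as soon as each page of keys arrives, keeping up
// to 4 value requests in flight, instead of waiting for every key to be
// known before fetching any values.
async fn fetch_pipelined(key_pages: Vec<Vec<Vec<u8>>>) -> Vec<(Vec<u8>, Vec<u8>)> {
	stream::iter(key_pages)
		.map(fetch_values)
		.buffer_unordered(4)
		.concat()
		.await
}
```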
Notes
This is not high priority, but worth doing once try-runtime-cli is in a more complete, stable state, to improve the experience of pulling state from a remote node.