remove-externalities: parallel key scraping

liamaharon commented 1 year ago

Motivation

The current key scraping implementation is is quite inefficient, fetching 1k keys at a time, waiting for each batch of 1k keys to be returned before asking for the next one

https://github.com/paritytech/substrate/blob/master/utils/frame/remote-externalities/src/lib.rs#L348-L392

This is not so bad when fetching keys from a local node where latency is not a big factor, but can slow things down very significantly when using a remote node when most of the time spend fetching keys becomes latency rather than the node doing work.

Unfortunately, most nodes seem to have a hard limit imposed preventing more than 1k keys being fetched in a single request.

Bonus (probably for a seperate issue/PR once this one is closed)

Fetch storage values for keys as they're pulled from the node, instead of waiting for all keys to be pulled before fetching values.

Notes

This is not high priority, but worth doing once try-runtime-cli is in a more complete stable state to improve the experience pulling state from a remote node.

ggwpez commented 1 year ago

https://github.com/paritytech/substrate/issues/12236#issuecomment-1246997128 could also help here to increase the page size so that it downloads more than 1k per page.

Szegoo commented 1 year ago

@liamaharon I believe that parallel scraping would be a great improvement in the try-runtime-cli user experience. As someone who was lately using try-runtime-cli, I can definitely tell that it took quite some time to scrape all the keys from Kusama or Polkadot.

I would like to implement parallel key scraping if no one is working on this currently.

Edit: I just noticed that there is an existing PR that implements this :sweat_smile:

liamaharon commented 1 year ago

@Szegoo have you tried downloading once to a snapshot and then loading from it each time thereafter? Loaded from a snapshot, the entire Polkadot state takes <1s

Szegoo commented 1 year ago

@liamaharon When testing I had a node running in the background which synced with Polkadot/Kusama, so no I did not download a snapshot once, I was always running try-runtime on the latest state. In another terminal, I was executing try-runtime commands, but as long as not scraping all the keys it won't take a lot of time(I was only scraping keys from a single pallet when doing try-state checks). The only problem is when scraping all the keys so that is when parallel key scraping would come really handy.

liamaharon commented 1 year ago

Is there a reason you need the latest state every time you run the command, rather than periodically refreshing your snapshot?

Szegoo commented 1 year ago

Not really, you are right having a snapshot is more efficient.

liamaharon commented 1 year ago

WDYT about adding a log when people use live state for something other than creating a snapshot, letting them know they probably want to create a snapshot if they'll be running it repeatedly? I could imagine some devs just being unaware the snapshot feature exists, or how to use it.

Szegoo commented 1 year ago

WDYT about adding a log when people use live state for something other than creating a snapshot, letting them know they probably want to create a snapshot if they'll be running it repeatedly?

Might be a good idea. ~Did you imagine having this log whenever the --sync warp option is used when running a node?~

liamaharon commented 1 year ago

I was thinking when someone executes a command like on-runtime-upgrade live ws://..., they probably want to take a snapshot and from then on use snap instead of live. Not sure why the sync mode would matter, maybe I misunderstand?

Szegoo commented 1 year ago

I somewhat misunderstood you. Yes, it definitely makes sense to warn a user in that case.

eagr commented 1 year ago

Does this (for a rough idea) look like what you're aiming for? @liamaharon

async fn rpc_get_keys_paged_batch(&self, prefix: StorageKey, block: B::Hash, parallel: u16) -> Result<Vec<StorageKey>, &'static str> {
    // round to power of 16
    fn round(n: u16) -> (u16, usize) {
        let mut pow: u16 = 16;
        let mut exp: usize = 1;

        if n <= 1 {
            return (1, 0)
        } else if n <= 16 {
            return (pow, exp)
        }

        while pow < n {
            if pow == u16::MAX {
                break
            }

            pow *= 16;
            exp += 1;
        }

        if n * 4 < pow { (pow / 16, exp - 1) } else { (pow, exp) }
    }

    const POW_OF_SIXTEEN: [u16; 4] = [1, 16, 256, 4096];

    fn extension(n: u16, len: usize) -> Vec<u8> {
        let mut ext = vec![0; len];
        for i in 0..len {
            ext[i] = (n / POW_OF_SIXTEEN[i] % 16) as u8;
        }
        ext
    }

    let (parallel, len) = round(parallel);

    let batch = (0..parallel).into_iter().map(|i| {
        let ext = extension(i, len);

        let mut pref = prefix.0.clone();
        pref.extend(ext.iter());
        let pref = StorageKey(pref);

        self.rpc_get_keys_paged(pref, block)
    });

    let keys = futures::future::join_all(batch).await.into_iter()
        .collect::<Result<Vec<_>, _>>()?
        .into_iter()
        .flatten()
        .collect::<Vec<_>>();

    Ok(keys)
}

liamaharon commented 1 year ago

Looks on the right track @eagr

burdges commented 12 months ago

Ain't clear I understand what's being fetched here, but..

In the relay chain, we typically assume nodes are either honest or malicious, so you might be annoyed, spammed, etc. if you fetch too much from only a few nodes, but if you fetch in parallel from more nodes then the chances that you make progress despite some spam increases. Aka more parallelism and less waiting maybe exactly what you want, even if you do not want higher top speeds.

paritytech / polkadot-sdk