paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/

PoV Benchmarking Tracking Issue #398

Open shawntabrizi opened 3 years ago

shawntabrizi commented 3 years ago

This is a meta issue to track the things needed to add benchmarking support for PoV size, which is critical for launching parachains on Polkadot.

gui1117 commented 3 years ago

IIUC, instead of generating a "full storage" before running benchmarks, we now prefer the runtime to provide, for each storage item, the depth of the node holding the values and the maximum encoded size of the value (related issue tracker: https://github.com/paritytech/substrate/issues/8719).

I think the runtime can provide a storage description, and the benchmarks can use it to make a proper estimation of the PoV size of calls. The storage description could look like this:

/// Raw storage prefix / key bytes.
type Prefix = Vec<u8>;
type Key = Vec<u8>;

struct NodeDescription {
    /// The maximum size of the value of the node.
    max_value_size: usize,
    /// The depth of the node in the trie.
    max_node_depth: usize,
}

struct StorageDescription {
    /// Associates a node description with every key starting with a specific prefix.
    // E.g. vec![
    //   (
    //     twox128(System) ++ twox128(Account),
    //     NodeDescription {
    //       max_value_size: BoundedEncodedLen::of(AccountId),
    //       max_node_depth: log16(number_of_pallet_in_runtime)
    //         + log16(number_of_storage_in_pallet)
    //         + log16(number_of_key_in_account_storage),
    //     },
    //   )
    // ]
    prefix_description: Vec<(Prefix, NodeDescription)>,

    /// Associates a node description with a specific key.
    // E.g. for the ":code" key.
    key_description: Vec<(Key, NodeDescription)>,
}
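For instance, a benchmark could derive an upper bound for a single read from such a description along these lines (building on the NodeDescription above; the per-node overhead constant and the helper are made up for illustration, not an existing API):

/// Rough per-node overhead of a proof node (a branch with 16 x 32-byte child
/// hashes plus some encoding); an illustrative constant, not a measured one.
const MAX_NODE_OVERHEAD: usize = 16 * 32 + 32;

/// Upper-bound PoV bytes for a single read of the value described by `desc`:
/// every node on the path from the trie root to the value ends up in the proof.
fn estimate_read_pov(desc: &NodeDescription) -> usize {
    desc.max_node_depth * MAX_NODE_OVERHEAD + desc.max_value_size
}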

So we need a way to give the number of keys in a storage item, probably with help from the pallet macro via a new attribute like #[pallet::max_size(N)].

gui1117 commented 3 years ago

EDIT: probably not needed, as we can overestimate a bit and adjust once the transaction is processed. And we can improve this in the future.

Or maybe we want something more precise than max_node_depth: log16(number_of_pallet_in_runtime) + log16(number_of_storage_in_pallet) + log16(number_of_key_in_account_storage), like:

depth_after_prefix: log16(number_of_key_in_account_storage)
depth_before_prefix: log16(number_of_pallet_in_runtime) + log16(number_of_storage_in_pallet)

So that if the storage is queried multiple times, the size related to depth_before_prefix is not added again, and the size related to depth_after_prefix is amortised.
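A rough sketch of what that accounting could look like for several reads under the same prefix (the struct, field, and function names are assumptions, not an existing API):

/// Split of the trie depth around a common prefix, as proposed above.
struct SplitDepth {
    /// Nodes shared by every key under the prefix; proven only once.
    depth_before_prefix: usize,
    /// Additional nodes below the prefix, specific to each key.
    depth_after_prefix: usize,
}

/// Upper bound for `reads` values of at most `max_value_size` bytes under the
/// same prefix: the shared path is counted once, the per-key path per read
/// (in practice the per-key part would amortise further, so this stays a bound).
fn estimate_prefix_reads(
    depth: &SplitDepth,
    reads: usize,
    max_value_size: usize,
    node_overhead: usize,
) -> usize {
    depth.depth_before_prefix * node_overhead
        + reads * (depth.depth_after_prefix * node_overhead + max_value_size)
}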

Maybe, to allow an even richer description, we should have something nested: the twox128(System) prefix has a depth and contains multiple possible suffixes (one for each storage item), one of them being twox128(Account). That way, if you query two values from the System pallet but from two different storage items, you don't need to count the nodes leading to the twox128(System) prefix twice.
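A nested description could then look roughly like this (reusing the Prefix and NodeDescription types from above; purely a sketch with made-up names):

/// One level of a nested description: the path from the parent prefix down to
/// this prefix, plus whatever hangs below it.
struct NestedDescription {
    /// Trie nodes between the parent prefix and this one; counted once no
    /// matter how many children are queried.
    depth_from_parent: usize,
    /// Nested prefixes, e.g. one entry per storage item of a pallet.
    children: Vec<(Prefix, NestedDescription)>,
    /// Description of values stored directly under this prefix, if any.
    value: Option<NodeDescription>,
}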

coriolinus commented 3 years ago

we need a way to give the number of keys in a storage item, probably with help from the pallet macro via a new attribute like #[pallet::max_size(N)].

I believe we only need that for StorageMap, StorageDoubleMap, StorageNMap, etc. Spitballing earlier, we had the idea that, for backwards compatibility, if a max_size attribute is not included we could use a default value of 300_000. This should ease the migration.
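For illustration, the fallback and the depth it implies could be computed roughly like this (a hypothetical helper; 300_000 is the default suggested above):

/// Fallback maximum number of keys for maps that declare no
/// `#[pallet::max_size(N)]`, as suggested above (illustrative only).
const DEFAULT_MAX_KEYS: u64 = 300_000;

/// Trie depth contributed by a map with up to `max_keys` entries in a 16-ary
/// (hex) trie, i.e. ceil(log16(max_keys)).
fn map_depth(declared_max_keys: Option<u64>) -> u32 {
    let n = declared_max_keys.unwrap_or(DEFAULT_MAX_KEYS).max(1);
    let bits = 64 - (n - 1).leading_zeros(); // ceil(log2(n)); 0 when n == 1
    (bits + 3) / 4                           // ceil(bits / 4) == ceil(log16(n))
}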

maybe we want something more precise than max_node_depth: log16(number_of_pallet_in_runtime) + log16(number_of_storage_in_pallet) + log16(number_of_key_in_account_storage)

I think it's probably not worth getting too elaborate with the estimate. Given that it's cheap and easy to know the actual PoV size and refund the weight once a tx has been processed, we can probably stick with a simple upper bound for PoV size estimates.
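That is, charge the benchmarked upper bound pre-dispatch and refund the difference post-dispatch once the real proof size is known; schematically (illustrative only, not the actual FRAME API):

/// Overcharge-then-refund flow for PoV weight: the benchmarked upper bound is
/// charged up front, then settled to the measured proof size, never exceeding
/// what was charged.
fn settle_pov_weight(estimated_pov: u64, actual_pov: u64) -> u64 {
    actual_pov.min(estimated_pov)
}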