polkadot-fellows / RFCs

Proposals for change to standards administered by the Fellowship.
https://polkadot-fellows.github.io/RFCs/

Metered Weights in the Polkadot-SDK #49

Open shawntabrizi opened 10 months ago

shawntabrizi commented 10 months ago

Before creating a full RFC, I want to start a discussion on a potential direction around improving Weights and Benchmarks in the Polkadot SDK.

Problems to solve

High Level Ideas

The Polkadot-SDK runtime should take a "step backwards", and introduce weight metering as the base level of execution limiting, rather than the pre-measured weight system that exists today.

Weight Metering is compatible with pre-measured weights, but not vice-versa.

One of the goals of the Polkadot-SDK is to be as general as possible, and to allow for customization at each level of the stack, especially the runtime. As I understand it, we have chosen a system where executing a block in the runtime requires knowing the weight of that block ahead of time. This appears to be less flexible than using execution metering.

For example, in a system with execution metering, the runtime could bypass the metering system and directly inject the weights that it knows are correct for a given execution. However, with pre-measured weights, there is no flexibility to implement a metering system within a custom runtime framework.
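To illustrate the asymmetry: under a metering base layer, a pre-measured weight is just a single up-front charge against the meter, while a pre-measured base layer has no hook for charging costs as execution proceeds. A minimal, self-contained sketch (all names hypothetical):

```rust
/// Minimal sketch (all names hypothetical). A metering base layer can host
/// pre-measured weights as a single up-front charge; the reverse is not
/// possible, because a pre-measured base layer never observes execution.
#[derive(Clone, Copy, Debug)]
struct Weight(u64);

struct Meter {
    limit: Weight,
    used: Weight,
}

impl Meter {
    fn new(limit: Weight) -> Self {
        Self { limit, used: Weight(0) }
    }

    /// Charge `amount` against the limit, failing if it does not fit.
    fn charge(&mut self, amount: Weight) -> Result<(), ()> {
        let next = self.used.0.saturating_add(amount.0);
        if next > self.limit.0 {
            return Err(());
        }
        self.used = Weight(next);
        Ok(())
    }
}

enum CallCost {
    /// The runtime already benchmarked this call: bypass metering and
    /// directly inject the weight it knows is correct.
    PreMeasured(Weight),
    /// No benchmark exists: meter the execution as it happens.
    Metered,
}

fn execute_call(meter: &mut Meter, cost: CallCost) -> Result<(), ()> {
    match cost {
        CallCost::PreMeasured(w) => meter.charge(w),
        CallCost::Metered => {
            // Placeholder for the per-operation charges that would be made
            // while the call actually executes.
            meter.charge(Weight(10))
        }
    }
}

fn main() {
    let mut meter = Meter::new(Weight(100));
    assert!(execute_call(&mut meter, CallCost::PreMeasured(Weight(40))).is_ok());
    assert!(execute_call(&mut meter, CallCost::Metered).is_ok());
    // The next pre-measured call no longer fits within the block limit.
    assert!(execute_call(&mut meter, CallCost::PreMeasured(Weight(60))).is_err());
}
```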

Benchmarking pushes overhead to developers.

Benchmarking is quite a laborious process, especially for more complex pallets. It is a large blocker between building an idea and deploying a product that is relatively safe to use.

If we want to keep the Polkadot SDK competitive for innovators and builders, we cannot impose this large overhead when other existing platforms do not have it.

Benchmarking can be extremely pessimistic.

Because we need to assume the worst-case situation for every extrinsic, the final calculated weight of a block can be much higher than the time it actually takes to execute that block. It was previously calculated that a block full of only transactions uses just 60% of the total pre-calculated weight.

Even if extrinsics use weight refunds, it is likely that we won't optimally fill blocks, because we only include an extrinsic in a block if its worst-case weight would allow it to fit, not its final weight.

High Level Solutions

At the base Runtime API, support Weight Metering

The runtime APIs for executing blocks and extrinsics should accept a max_weight parameter, and execution should fail if the metered weight exceeds max_weight.

Perhaps this should be an Option, where None can be passed for backwards compatibility, in which case the runtime is forced to provide a pre-calculated weight.
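A rough sketch of what such an API could look like. All names here are illustrative, not the actual Polkadot-SDK definitions, and the real apply_extrinsic has a richer result type:

```rust
/// Hypothetical shape of a metering-aware runtime API. `Weight`, the trait
/// name, and the error type are illustrative, not actual Polkadot-SDK types.
#[derive(Clone, Copy, Debug)]
pub struct Weight(pub u64);

pub struct WeightLimitExceeded;

pub trait MeteredBlockBuilder {
    /// Execute one extrinsic. `Some(limit)` turns metering on, and execution
    /// fails once the metered weight exceeds `limit`. `None` keeps today's
    /// behaviour: no metering, and the runtime must supply a pre-calculated
    /// weight for the extrinsic (backwards compatible).
    fn apply_extrinsic(
        &mut self,
        extrinsic: Vec<u8>,
        max_weight: Option<Weight>,
    ) -> Result<(), WeightLimitExceeded>;
}
```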

Runtime Should Support Panics

It seems that, in order for metering to work at all, we would need to be able to abruptly halt extrinsic execution when the metered weight exceeds the expected max_weight. A panic is the right tool for this, correct?

In any case, allowing panics in the runtime would also improve developer experience, since this is a major area where a runtime developer can make a mistake and leave their chain vulnerable to attack.
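As a sketch of how this could look inside the runtime (hypothetical types; the host side is elided), the meter simply panics once the limit is crossed, and the host catches the resulting trap:

```rust
/// Sketch of panic-based halting (hypothetical types). In Wasm/PVM the
/// panic becomes a trap that aborts the extrinsic; the host then discards
/// the overlay of uncommitted storage changes, so no partial state is left
/// behind.
pub struct Meter {
    max_weight: u64,
    used: u64,
}

impl Meter {
    pub fn new(max_weight: u64) -> Self {
        Self { max_weight, used: 0 }
    }

    /// Charge weight as execution proceeds, halting the extrinsic once the
    /// limit is crossed.
    pub fn charge(&mut self, amount: u64) {
        self.used = self.used.saturating_add(amount);
        if self.used > self.max_weight {
            panic!(
                "weight limit exceeded: used {} > max {}",
                self.used, self.max_weight
            );
        }
    }
}
```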

Weight Metered Database

It is not my suggestion that we provide full weight metering for all execution in the runtime. That would just bring us back down to the performance of smart contracts.

Instead, I suggest we create a special DB layer which provides very specific weight information about database accesses as they happen during runtime execution.

We know that DB operations account for the majority of weight costs in the runtime, and that the number of DB operations per extrinsic is usually quite low. (We should do a basic analysis of existing pre-measured weights to back this up tangibly.)

If we only meter the database, and assume that other execution is nominal, then we can get a very high performance environment with high accuracy.

The DB Layer could provide very specific details like exactly where the item exists in the merkle trie (depth, size, neighboring children, whether it or other neighboring children have already been cached, etc.). Then, with really comprehensive database benchmarks, we can dynamically meter how much weight each data operation would be.

Perhaps it is possible to forgo this minimal overhead when pre-calculated weights already exist; or, this could be used to automatically provide weight refunds when we know the db weights are overestimated.
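A rough sketch of what such a metered DB layer could look like, with all types hypothetical and the cost model standing in for the comprehensive database benchmarks mentioned above:

```rust
use std::collections::HashSet;

/// Sketch of a metering DB layer (all types hypothetical). Each read
/// consults a cost model fed with trie-level details, charges the meter,
/// and then forwards to the real backend.
pub struct TrieAccessInfo {
    pub depth: u32,           // how deep the key sits in the merkle trie
    pub value_size: u32,      // encoded size of the value being read
    pub already_cached: bool, // whether this node was touched before
}

/// Stands in for the "really comprehensive database benchmarks": a function
/// from access details to weight, derived offline.
pub trait CostModel {
    fn read_cost(&self, info: &TrieAccessInfo) -> u64;
}

pub trait Backend {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn access_info(&self, key: &[u8]) -> TrieAccessInfo;
}

pub struct MeteredBackend<B, C> {
    inner: B,
    cost_model: C,
    used_weight: u64,
    touched: HashSet<Vec<u8>>,
}

impl<B: Backend, C: CostModel> MeteredBackend<B, C> {
    pub fn get(&mut self, key: &[u8]) -> Option<Vec<u8>> {
        let mut info = self.inner.access_info(key);
        // A repeated read of the same key is served from cache and is
        // charged at a (benchmarked) cheaper rate.
        info.already_cached = !self.touched.insert(key.to_vec());
        self.used_weight = self
            .used_weight
            .saturating_add(self.cost_model.read_cost(&info));
        self.inner.get(key)
    }

    pub fn used_weight(&self) -> u64 {
        self.used_weight
    }
}
```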

Handling Execution Weight

With a metered database, I suspect we will capture the majority of the weight used in a block / extrinsic.

However, to get full safety, we can provide a few different tools:

Custom Additional Weight

We already provide APIs for runtime developers to manually add more weight during extrinsic execution. These can be used to increase the weight where we know that the metered database is not sufficient.

In fact, the benchmarking system already splits benchmarking between Wasm execution and database operations, so we already provide a method for users to discover this "missing" weight.
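As a self-contained sketch (hypothetical names, not the actual FRAME API), the meter would accumulate the database component automatically while exposing a hook for the pallet author to top up the compute component:

```rust
/// Self-contained sketch (hypothetical names, not the actual FRAME API).
/// The database component accumulates automatically via the metered DB
/// layer, while the pallet author tops up the compute component by hand,
/// mirroring how benchmarks already split Wasm execution from DB operations.
#[derive(Clone, Copy, Debug, Default)]
pub struct Weight(pub u64);

#[derive(Default)]
pub struct ExtrinsicMeter {
    db_weight: u64,    // accumulated by the metered database layer
    extra_weight: u64, // declared manually for pure computation
}

impl ExtrinsicMeter {
    /// Called by the metered DB layer on every storage access.
    pub fn charge_db(&mut self, w: Weight) {
        self.db_weight = self.db_weight.saturating_add(w.0);
    }

    /// Analogue of today's "add more weight" APIs: used where the database
    /// metering alone is known to undercount.
    pub fn register_extra_weight(&mut self, w: Weight) {
        self.extra_weight = self.extra_weight.saturating_add(w.0);
    }

    pub fn total(&self) -> Weight {
        Weight(self.db_weight.saturating_add(self.extra_weight))
    }
}
```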

Custom Weight Buffering

We could also allow runtime developers to add their own custom "weight buffer" to keep their extrinsics safer. For example, we could add an additional 20% overhead to the weight returned by the metered database.
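Such a buffer is a one-line computation over the metered weight (sketch; plain u64 arithmetic stands in for the real Weight type):

```rust
/// Scale a metered weight up by a safety margin, e.g. 20%. Saturating
/// arithmetic avoids overflow for pathological inputs.
pub fn buffered_weight(metered: u64, buffer_percent: u64) -> u64 {
    metered.saturating_mul(100u64.saturating_add(buffer_percent)) / 100
}

fn main() {
    // A 20% buffer on a metered weight of 1_000_000:
    assert_eq!(buffered_weight(1_000_000, 20), 1_200_000);
}
```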

gui1117 commented 1 month ago

> The DB Layer could provide very specific details like exactly where the item exists in the merkle trie (depth, size, neighboring children, whether it or other neighboring children have already been cached, etc.). Then, with really comprehensive database benchmarks, we can dynamically meter how much weight each data operation would be.

This would require specifying an abstract database architecture with a specific cache size, and enforcing that all clients have a database compatible with this abstraction. Maybe just enforcing one cache (without all the neighboring children, depth, and size information) could already be a huge improvement.

Also, if we have more memory in the runtime with PVM, we could implement some caching inside the runtime itself, I guess.
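As a sketch of what such a runtime-side cache could look like (hypothetical types; host_read stands in for whatever storage host function the runtime calls):

```rust
use std::collections::BTreeMap;

/// Sketch of a read cache kept inside the runtime itself (hypothetical
/// types). The first read of a key pays the full database cost; repeats
/// are served from runtime memory, which a larger PVM memory budget would
/// make affordable.
pub struct RuntimeReadCache {
    entries: BTreeMap<Vec<u8>, Option<Vec<u8>>>,
}

impl RuntimeReadCache {
    pub fn new() -> Self {
        Self { entries: BTreeMap::new() }
    }

    /// `host_read` stands in for the storage host function; it is only
    /// invoked on a cache miss.
    pub fn get(
        &mut self,
        key: &[u8],
        host_read: impl FnOnce(&[u8]) -> Option<Vec<u8>>,
    ) -> Option<Vec<u8>> {
        self.entries
            .entry(key.to_vec())
            .or_insert_with(|| host_read(key))
            .clone()
    }
}
```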

I see we can do this RFC in multiple steps:

ggwpez commented 1 month ago

The PVM will allow for deterministic metering. I think the longer-term goal is to recompile the runtimes to PVM and then use that.
I am not sure if it's worth a lot of effort earlier; Wasm is just fundamentally flawed in this regard (being a stack machine). The PVM story is probably a year out, though.

gui1117 commented 1 month ago

This keeps open the question of the "Weight Metered Database", or point (2) in my comment.

Or, if PVM allows more memory in the runtime, maybe we can implement a cache inside the runtime.