Substrate with Gas Metering and Gas Limit

shawntabrizi commented 3 years ago

Now that we have transactional storage in Substrate, we should investigate what a world might look like where we use gas metering instead of the current Weight system in Substrate.

With transactional storage, we are now able to stop runtime execution at any point and safely roll back any state changes, which means that given some gas limit, gas metering could be used to execute transactions, hit the weight limits, and charge fees.

In this world, all transactional extrinsics would have gas metering enabled by default. Overall, this would make developing production-safe code on Substrate much easier for the average developer, and would reduce the complexity nearly to that of contract development.
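To make the idea concrete, here is a minimal sketch of "meter, stop, roll back, still charge". It is a toy model and not Substrate code: `State`, `GasMeter`, `metered_put`, `execute`, and the flat `WRITE_COST` are all hypothetical names and numbers invented for illustration, and the choice to charge the full limit on failure is just one possible policy.

```rust
use std::collections::BTreeMap;

/// Hypothetical flat cost per write; a real system would benchmark this.
const WRITE_COST: u64 = 10;

/// Hypothetical error returned when the gas budget is exhausted.
#[derive(Debug, PartialEq)]
pub struct OutOfGas;

/// Toy key-value state with one level of transactional overlay.
#[derive(Default)]
pub struct State {
    committed: BTreeMap<String, u64>,
    pending: BTreeMap<String, u64>,
}

impl State {
    /// Reads see pending writes first, then committed state.
    pub fn get(&self, key: &str) -> Option<u64> {
        self.pending.get(key).or_else(|| self.committed.get(key)).copied()
    }
    fn put(&mut self, key: &str, value: u64) {
        self.pending.insert(key.to_string(), value);
    }
    fn commit(&mut self) {
        let pending = std::mem::take(&mut self.pending);
        self.committed.extend(pending);
    }
    fn rollback(&mut self) {
        self.pending.clear();
    }
}

/// Tracks gas used against a user-supplied limit.
pub struct GasMeter {
    limit: u64,
    used: u64,
}

impl GasMeter {
    pub fn new(limit: u64) -> Self {
        Self { limit, used: 0 }
    }
    pub fn charge(&mut self, amount: u64) -> Result<(), OutOfGas> {
        let next = self.used.saturating_add(amount);
        if next > self.limit {
            // Assumed policy: a failed call consumes the whole limit.
            self.used = self.limit;
            return Err(OutOfGas);
        }
        self.used = next;
        Ok(())
    }
    pub fn used(&self) -> u64 {
        self.used
    }
}

/// A metered write: charge first, then write into the overlay.
pub fn metered_put(
    state: &mut State,
    gas: &mut GasMeter,
    key: &str,
    value: u64,
) -> Result<(), OutOfGas> {
    gas.charge(WRITE_COST)?;
    state.put(key, value);
    Ok(())
}

/// Runs `call` transactionally: commit on success, roll back on failure.
/// Returns the outcome and the gas to charge fees for in either case.
pub fn execute<F>(state: &mut State, gas_limit: u64, call: F) -> (Result<(), OutOfGas>, u64)
where
    F: FnOnce(&mut State, &mut GasMeter) -> Result<(), OutOfGas>,
{
    let mut gas = GasMeter::new(gas_limit);
    let result = call(state, &mut gas);
    if result.is_ok() {
        state.commit();
    } else {
        state.rollback();
    }
    (result, gas.used())
}
```

Note that this sketch stops cooperatively, by propagating `Result` with `?`. It says nothing about how to stop arbitrary runtime code that does not check the meter itself.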

pepyakin commented 3 years ago

First of all, when I read the "gas metering" I hear:

a) instruction-level metering
b) automatic tracking of the resources consumed without any specific interaction from the developer. That also implies automatic termination when you reach the limits.

since that's what gas metering usually refers to.
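As a rough illustration of both properties, here is a toy stack machine that charges gas before every instruction and terminates on its own once the budget runs out. The instruction set, the per-instruction costs, and the names `Op`, `Trap`, and `run` are all made up for this sketch; real engines typically instrument wasm code rather than interpret a custom bytecode.

```rust
/// Hypothetical instruction set for a toy stack machine.
#[derive(Debug, Clone, Copy)]
pub enum Op {
    Push(i64),
    Add,
    Mul,
}

/// Reasons execution can stop early.
#[derive(Debug, PartialEq)]
pub enum Trap {
    OutOfGas,
    StackUnderflow,
}

/// Pops two operands or traps if the stack is too short.
fn pop2(stack: &mut Vec<i64>) -> Result<(i64, i64), Trap> {
    let b = stack.pop().ok_or(Trap::StackUnderflow)?;
    let a = stack.pop().ok_or(Trap::StackUnderflow)?;
    Ok((a, b))
}

/// Executes `program`, charging gas before each instruction.
/// Returns the top of the stack and the gas consumed.
pub fn run(program: &[Op], gas_limit: u64) -> Result<(Option<i64>, u64), Trap> {
    let mut stack: Vec<i64> = Vec::new();
    let mut gas_left = gas_limit;
    for op in program {
        // Made-up costs: the program author never mentions gas.
        let cost: u64 = match op {
            Op::Push(_) => 1,
            Op::Add => 1,
            Op::Mul => 3,
        };
        // Automatic termination once the budget is exhausted.
        gas_left = gas_left.checked_sub(cost).ok_or(Trap::OutOfGas)?;
        match *op {
            Op::Push(v) => stack.push(v),
            Op::Add => {
                let (a, b) = pop2(&mut stack)?;
                stack.push(a.wrapping_add(b));
            }
            Op::Mul => {
                let (a, b) = pop2(&mut stack)?;
                stack.push(a.wrapping_mul(b));
            }
        }
    }
    Ok((stack.last().copied(), gas_limit - gas_left))
}
```

The extra subtraction and branch on every instruction is where the metering overhead comes from in a design like this.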

A few notes:

Instruction-level gas metering is typically associated with a non-negligible performance penalty. As a ballpark I'd throw out 30%, but it really depends on the workload, and some might suffer up to Nx.

I am actually curious how such gas metering would be implemented. When the code exceeds the allotted gas budget, what happens next? You'd need to somehow unwind the execution to the point where the transaction started. Now, if this process is automatic, this raises the question of what happens with the heap and, perhaps, other resources. I see two options here:

1. Cooperative termination: the execution thread cooperates to finish all the functions, specifically because we need to run all the destructors so as not to leak heap memory. Note that this cannot be performed by some kind of instrumentation at the wasm level, since all the information is lost at that point. So that implies some sort of cooperation from rustc, which is a tar pit. And no, you can't sprinkle some proc macros there.
2. Sandboxed execution: you launch this gas-metered process inside of a sandbox; when the sandboxed execution exceeds the gas budget, it just panics, which destroys the sandboxed instance and reports the failure to the runtime. That would perhaps require us to extend the sandbox functionality quite a bit (for one, add a feature to use the optimizing compiler). But even with all optimizations added, it would perhaps still suffer some extra overhead (on top of that of gas metering). Another option is to add a special primitive dubbed ext_try, discussed in paritytech/polkadot-sdk#370.

I really don't think that 1 is feasible... Which leaves us with number 2 which most likely will require some serious design & engineering work.

shawntabrizi commented 3 years ago

Thanks for the comment @pepyakin.

30% overhead is quite a bit, but in terms of making Substrate a more accessible platform for end developers, it may be something that people would rather have than needing to do end-to-end benchmarking. It would also be interesting to see exactly how much overhead there is.

"add a feature to use the optimizing compiler"

What does this mean?

Why would we need ext_try when we already have transactional storage? Obviously I am missing some details.

pepyakin commented 3 years ago

The sandbox, as of this very moment, is powered only by wasmi. There is an ongoing effort to add wasmtime/lightbeam, which is a good improvement. However, being a linear-time compiler, lightbeam is somewhat more constrained in terms of what kind of optimizations it has in its toolbelt: only linear-time algorithms.

An optimizing compiler, on the other hand, typically has a lot more freedom to optimize the input code because it is not constrained in its choice of algorithms. wasmtime/cranelift, which powers our --wasm-execution=compiled mode, is an optimizing compiler. It can have pretty bad execution time or memory consumption in the worst case, but only if somebody tries to exploit it deliberately. In the vast majority of cases, when you throw rustc output at it, such a compiler will happily chug along.

So the idea here is to add an optimizing compiler to the sandbox for exactly these kinds of cases: where we have trusted code but still want to isolate it nonetheless.

"Why would we need ext_try when we already have transactional storage? Obviously I am missing some details."

ext_try, or specifically a host function that would create a new runtime instance (or fork the existing one) that you can dispose of without problems, is one possible solution to the second problem I outlined in my previous message.

Specifically, how would you automatically terminate execution of a control flow thread which exceeded the allotted gas budget? If we had ext_try, the metered instance could just panic, destroying the sandboxed instance. This condition would be caught by the parent instance (the original runtime instance, if we assume that no inception-like situation is taking place :p ) and handled gracefully.
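The control flow described here can be imitated in native Rust, purely as an analogy. In the sketch below a child thread stands in for the disposable instance, and `join` stands in for the parent observing the failure. `try_in_disposable_instance`, `TryError`, and `metered_work` are hypothetical names; this is not the ext_try host function, and it relies on native panic unwinding, which the wasm runtime does not have.

```rust
use std::thread;

/// Hypothetical error the parent sees when the child instance dies.
#[derive(Debug, PartialEq)]
pub enum TryError {
    ChildPanicked,
}

/// Stand-in for the proposed primitive: run `f` in an isolated unit of
/// execution (here, a thread) and report a panic as an ordinary error.
pub fn try_in_disposable_instance<T, F>(f: F) -> Result<T, TryError>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // `join` returns `Err` if the child panicked; the parent keeps running.
    thread::spawn(f).join().map_err(|_| TryError::ChildPanicked)
}

/// Metered work that simply panics when the budget runs out, with no
/// attempt to return gracefully through its callers.
pub fn metered_work(steps: u64, gas_limit: u64) -> u64 {
    let mut gas_left = gas_limit;
    let mut acc = 0u64;
    for i in 0..steps {
        if gas_left == 0 {
            panic!("out of gas");
        }
        gas_left -= 1;
        acc += i;
    }
    acc
}
```

The parent would then pair the `Err` case with a storage rollback. The part this analogy glosses over is exactly the hard one: creating and throwing away a whole runtime instance cheaply.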

That's basically me rewording my answer from my previous message. I can try to address something specifically if you give me a hint about which parts are not clear.

shawntabrizi commented 1 year ago

Bump on this:

I recently completed the Cosmos SDK Academy and learned that they use a "metered database".

Specifically, since reads / writes to the database are small in number, it is very little overhead to properly meter these operations.

That means we could actually see how deep in a trie we go to access data, how large the storage item is, how many sibling nodes exist, etc.

With proper benchmarking, we should be able to very accurately measure the database at various sizes and shapes.

Then, we can do direct metering of the database, and exactly measure how much weight (both time and size) is being consumed.

I suspect that this will be much more accurate than the current worst-case-scenario weights, and would not require that we migrate to full execution gas metering.

The only catch is that this would require that we introduce a "weight limit" as a parameter that users submit with their transaction, but this issue suggests that we should do that anyway, and it should prepare us for execution gas metering if we can get there.
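A minimal sketch of what direct database metering against a user-supplied weight limit could look like. This is neither the Cosmos SDK nor the Substrate implementation: `MeteredDb`, the local `Weight` struct, and all cost constants are assumptions for illustration. A real version would derive costs from benchmarks and account for trie depth and sibling nodes.

```rust
use std::collections::BTreeMap;

/// Toy two-dimensional weight: time and proof size.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct Weight {
    pub ref_time: u64,
    pub proof_size: u64,
}

#[derive(Debug, PartialEq)]
pub struct WeightLimitExceeded;

// Made-up costs; real values would come from benchmarking the database.
const READ_BASE_TIME: u64 = 25;
const WRITE_BASE_TIME: u64 = 100;
const TIME_PER_BYTE: u64 = 1;

/// Key-value store that charges weight for every access.
pub struct MeteredDb {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
    limit: Weight,
    used: Weight,
}

impl MeteredDb {
    /// `limit` plays the role of the weight limit submitted by the user.
    pub fn new(limit: Weight) -> Self {
        Self { data: BTreeMap::new(), limit, used: Weight::default() }
    }

    fn charge(&mut self, ref_time: u64, proof_size: u64) -> Result<(), WeightLimitExceeded> {
        let ref_time = self.used.ref_time.saturating_add(ref_time);
        let proof_size = self.used.proof_size.saturating_add(proof_size);
        if ref_time > self.limit.ref_time || proof_size > self.limit.proof_size {
            return Err(WeightLimitExceeded);
        }
        self.used = Weight { ref_time, proof_size };
        Ok(())
    }

    /// Reads are charged by the actual size of the item found.
    pub fn get(&mut self, key: &[u8]) -> Result<Option<Vec<u8>>, WeightLimitExceeded> {
        let len = self.data.get(key).map_or(0, |v| v.len() as u64);
        self.charge(READ_BASE_TIME + TIME_PER_BYTE * len, key.len() as u64 + len)?;
        Ok(self.data.get(key).cloned())
    }

    /// Writes are charged before they happen. Simplification: writes add
    /// no proof size in this sketch.
    pub fn put(&mut self, key: &[u8], value: &[u8]) -> Result<(), WeightLimitExceeded> {
        self.charge(WRITE_BASE_TIME + TIME_PER_BYTE * value.len() as u64, 0)?;
        self.data.insert(key.to_vec(), value.to_vec());
        Ok(())
    }

    pub fn used(&self) -> Weight {
        self.used
    }
}
```

Because the charge depends on what was actually read or written, the weight reflects the real access rather than a benchmarked worst case.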

cc @ggwpez

Related thread:

https://github.com/paritytech/substrate/issues/11677

ggwpez commented 1 year ago

For PoV, I think we can use the clawback (https://github.com/paritytech/polkadot-sdk/issues/209) to get precise results.

As for time weight: it is not deterministic enough to be measured AFAIK. There are still ongoing discussions... maybe the proposed RISC-V improves the situation.

paritytech / polkadot-sdk

Substrate with Gas Metering and Gas Limit #342