spacemeshos / SMIPS

Spacemesh Improvement Proposals
https://spacemesh.io
Creative Commons Zero v1.0 Universal

SVM Code Reuse between Templates #70

Closed YaronWittenstein closed 1 year ago

YaronWittenstein commented 2 years ago

SVM Code Reuse between Templates

Overview

This SMIP specifies the required changes in SVM to support code reuse between Templates. It describes an experimental idea that might be put to the test.

That being said, it is advised to implement the Storage Layout Section Index (detailed later in this document) as soon as possible. Even if this SMIP isn't implemented at the Wasm level, most of its ideas can be implemented in higher-level layers such as the SVM SDK.

The Storage Layout Section Index is low-hanging fruit and will make the current SVM codebase forward-compatible with this SMIP (or with something similar implemented within the SVM SDK).

The underlying assumption is that the Wasm code we deal with has no polymorphism (dynamic dispatch). In other words, the call_indirect opcode is forbidden here, so every call target is known statically.

Having a Fixed-Gas Wasm satisfies this restriction, but the solution covered in this document could also apply to non-Fixed-Gas Wasm programs, as long as the above rule is enforced.

Goals and Motivation

The SVM Templates were introduced so that the same code doesn't have to be deployed each time one wants to launch a new Account of an application. The main reasons for introducing Templates were code reuse and saving Storage (which is an expensive resource).

This SMIP attempts to take it one step further and proposes a mechanism for reusing specific functions between different Templates. The motivation is to reuse popular pieces of code that would otherwise have to be reimplemented each time a different Template requires them.

Another way to look at it is composability: given "black boxes" we know work well, we'd like to reuse them (see the "Open-Closed" Principle).

This SMIP stays on the Wasm level of a Template. Other ideas revolving around specific Template Compilers (such as the SVM SDK in Rust) could probably spring to mind - but these are out of scope for this document.

For example, say we have a Template denoted as Template A in its Wasm form. The Template contains Wasm Functions f1_wasm, f2_wasm, and f3_wasm.

While working on a new Template implementation named Template B (let's say we code it in Rust, though the language doesn't matter), we have programmed functions f4_rust and f5_rust, and now we're about to start coding f6_rust.

The thing is that we already have the f6_rust logic implemented somewhere else, but not in Rust (and even if it had been written in Rust, we don't have its source code).

The f3_wasm function seems to have the same logic we'd like for our f6_rust. Today, there's nothing we can do about it. That's where this SMIP comes into play: we'd like to be able to reuse f3_wasm as our f6_rust (at a high level).

The idea is very similar to using a Linker, so the proposed solution resembles Static Linking.

In other words, we'll reuse the f3_wasm code, but it'll exist twice (once in each Template) rather than once (which would make it closer to Dynamic Linking).

Good real-life examples of functions to be reused are anything related to Signature Schemes. Say we have a Template with a verify method implementing a 2-3 MultiSig. This Signature Scheme could probably be reused in many other Templates.

If we could take the verify code in its Wasm format and inject it into other Templates, we could avoid reimplementing the 2-3 MultiSig scheme.

High-level design

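Naming: Template Origin denotes the Template whose function we'd like to reuse, and Template New denotes the Template that reuses it.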

The implementation consists of three phases:

  1. Create a Temporary Stub
  2. Storage Layouts Relocation
  3. Functions Indexes Relocation

Create a Temporary Stub

First, we need to decide which functions of Template Origin we'd like to reuse within Template New. For simplicity, let's assume there is only one such function, denoted f1_origin, and that f1_origin doesn't call any other Wasm functions (it may call imported functions, though). We'll create a corresponding Wasm function under Template New; let's name it f1_new.

The function signature of f1_new will be the same as that of f1_origin, and its body will only contain a dummy return value so that the Wasm will be valid (for example, if the function returns an i32, the body could consist of an opcode pushing zero).
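A minimal sketch of how a compiler such as the SVM SDK might emit such a stub in Wasm text format (WAT). The `ValType` enum and the `emit_stub` function are hypothetical names chosen for illustration, and only `i32`/`i64` are modeled; the actual SDK code generation may look quite different.

```rust
/// The Wasm value types this sketch supports (a hypothetical subset).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ValType {
    I32,
    I64,
}

impl ValType {
    /// The WAT keyword for this type.
    fn wat(self) -> &'static str {
        match self {
            ValType::I32 => "i32",
            ValType::I64 => "i64",
        }
    }
}

/// Emits the WAT text of a stub with the given signature. The body only
/// pushes a zero of each result type, which is enough for the module to validate.
pub fn emit_stub(name: &str, params: &[ValType], results: &[ValType]) -> String {
    let mut out = format!("(func ${}", name);
    for p in params {
        out.push_str(&format!(" (param {})", p.wat()));
    }
    for r in results {
        out.push_str(&format!(" (result {})", r.wat()));
    }
    // Dummy body: one zero constant per declared result.
    for r in results {
        out.push_str(&format!(" {}.const 0", r.wat()));
    }
    out.push(')');
    out
}
```

For example, `emit_stub("f1_new", &[ValType::I32], &[ValType::I32])` yields `(func $f1_new (param i32) (result i32) i32.const 0)`: a body that validates but does nothing useful until the relocated code of f1_origin replaces it.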

The compiler (SVM SDK or similar) will be in charge of emitting these Stubs.

Storage Layouts Relocation

So now we have a function named f1_new with the same signature as f1_origin and an empty body (besides returning something to make the Wasm code valid). If each Wasm function were completely stateless, then we'd be done at this point.

Unfortunately, that's not enough! A Wasm function of a Template might interact with the running Account's Storage, for example by reading from or writing to its Storage Variables. If f1_origin contains a read operation such as svm_get32(5), it doesn't follow that running it unchanged in f1_new will work as expected.

A reminder: The svm_get32(5) asks to read the storage variable indexed 5.

To have f1_new working properly, we'll have to extend the Storage specification of Template New.

Say that Template Origin has a single Storage Layout Section containing 10 variables, and that Template New also has a single Storage Layout Section, with 20 variables. The adjusted Layout of Template New will now contain two Storage Layout Sections: the original one with the 20 variables, and a new one, copied from Template Origin, with the 10 variables.

For simplicity, the new Storage Layout Section will be positioned second. The existing code of Template New will keep working as before, and we'll need to relocate each Storage call in the code reused from Template Origin (the code we clone out of f1_origin).

Right now each Storage-related host function is of the form svm_getXXX(var_id) or svm_setXXX(var_id, val). The notion of different Sections isn't reflected in the current design.

Things get more complicated when a Template has multiple Storage Sections rather than a single one. The most straightforward way to perform the relocation seems to be introducing another dimension: the Section Index.

The SMIP proposes to attach a Section Index alongside each variable. So svm_getXXX(var_id) becomes svm_getXXX(var_id, section_idx) and svm_setXXX(var_id, val) becomes svm_setXXX(var_id, val, section_idx).

This new layer of indirection adds flexibility, since multiple variables can now share the same index as long as each is associated with a different Section Index. The variable id turns from a globally unique identifier into a local one within its Storage Section.
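To illustrate the two-dimensional addressing, here is a minimal in-memory model where a Storage variable is keyed by the pair (section_idx, var_id). The `Storage` type and its `get32`/`set32` methods are hypothetical stand-ins for the svm_get32/svm_set32 host functions, and reading an unset variable as zero is an assumption of this sketch, not something the SMIP specifies.

```rust
use std::collections::HashMap;

/// Hypothetical model of an Account's Storage in which each variable is
/// addressed by (section_idx, var_id) rather than by var_id alone.
#[derive(Default)]
pub struct Storage {
    vars: HashMap<(u32, u32), u32>,
}

impl Storage {
    /// Models `svm_get32(var_id, section_idx)`; unset variables read as zero
    /// (an assumption of this sketch).
    pub fn get32(&self, var_id: u32, section_idx: u32) -> u32 {
        self.vars.get(&(section_idx, var_id)).copied().unwrap_or(0)
    }

    /// Models `svm_set32(var_id, val, section_idx)`.
    pub fn set32(&mut self, var_id: u32, val: u32, section_idx: u32) {
        self.vars.insert((section_idx, var_id), val);
    }
}
```

Writing to variable 5 of Section 0 leaves variable 5 of Section 1 untouched, which is what lets the code reused from Template Origin keep its own variable ids.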

Let's now return to our example. Template New will contain two sections: the original one with 20 variables and the new one taken from Template Origin with 10 variables.

Each svm_getXXX call in f1_origin (the function we'd like to reuse as f1_new) will follow the pattern svm_getXXX(var_id, 0), since there is only a single section. Similarly, each svm_setXXX call in f1_origin will be of the form svm_setXXX(var_id, val, 0).

Under f1_new, each such call should become svm_getXXX(var_id, 1) or svm_setXXX(var_id, val, 1). If Template New had 3 Storage Layout Sections, the f1_new calls would have to be adjusted to svm_getXXX(var_id, 3) and svm_setXXX(var_id, val, 3).

In general, if Template New had N Storage Layout Sections, each call would have to be translated as follows:

The Storage Layout Section #0 under Template Origin will have index N under Template New. The Storage Layout Section #1 under Template Origin will have index N + 1 under Template New and so on...
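The mapping above can be sketched as follows, assuming (hypothetically) that a Storage Layout Section is represented only by its variable count. `Section`, `merge_layouts`, and `relocate_section` are illustrative names: `merge_layouts` appends Template Origin's sections after Template New's and returns the offset N, and `relocate_section` maps Origin's Section #i to N + i.

```rust
/// Hypothetical Storage Layout Section: only the number of variables it
/// defines (a real section would also describe each variable's size).
#[derive(Clone, Debug, PartialEq)]
pub struct Section {
    pub var_count: u32,
}

/// Appends Template Origin's sections after Template New's and returns the
/// merged layout plus N, the number of sections Template New had before.
pub fn merge_layouts(new: &[Section], origin: &[Section]) -> (Vec<Section>, u32) {
    let offset = new.len() as u32;
    let mut merged = new.to_vec();
    merged.extend_from_slice(origin);
    (merged, offset)
}

/// Origin's Section #i becomes Section #(N + i) under Template New.
pub fn relocate_section(origin_idx: u32, offset: u32) -> u32 {
    origin_idx + offset
}
```

With the example from this section (Template New with one 20-variable section, Template Origin with one 10-variable section), N is 1 and Origin's Section #0 becomes Section #1.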

The remaining question is how to implement the relocation in code - at the Wasm level. It can be a bit tricky; for example, the Wasm code could, in theory, have: svm_getXXX(V, S) where V or S (or both) are not known at compile-time.

We need to be able to apply the relocation to any Wasm code. Wasm is a Stack Machine: each parameter is pushed onto the Stack before a function is called. We need to detect calls to functions that interact with the Storage and then add N to the last call parameter (the one standing for the Section Index). Right before the call opcode executes, the top of the Stack holds the Section Index, so the relocation works even when that index isn't known at compile-time.

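The transformation we want is to replace the Section Index section_idx with section_idx + N right before each call to a Storage host function.

In Wasm opcodes, it should look like this: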

```wat
;; Before
call $svm_get32

;; After
i32.const N     ;; pushes N (a constant: the number of Sections Template New had before the merge)
i32.add         ;; pops the two top values of the Stack, adds them, and pushes the result
call $svm_get32
```
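The rewrite above can be sketched as a pass over a function body. To stay self-contained, the sketch models a tiny, hypothetical subset of Wasm instructions (`Instr`) instead of using a real Wasm parser, and `storage_fns` is assumed to hold the function indexes of the svm_getXXX/svm_setXXX imports. Because the Section Index is the last parameter, it sits on top of the Stack right before the call, so injecting `i32.const N; i32.add` there works even when the index is computed at runtime.

```rust
use std::collections::HashSet;

/// A tiny, hypothetical subset of Wasm instructions, enough to show the pass.
#[derive(Clone, Debug, PartialEq)]
pub enum Instr {
    I32Const(i32),
    I32Add,
    LocalGet(u32),
    Call(u32),
}

/// Before every call to a Storage host function, inserts `i32.const N` and
/// `i32.add`, shifting the Section Index on top of the Stack by `offset` (N).
pub fn relocate_storage_calls(body: &[Instr], storage_fns: &HashSet<u32>, offset: i32) -> Vec<Instr> {
    let mut out = Vec::with_capacity(body.len());
    for instr in body {
        if let Instr::Call(idx) = instr {
            if storage_fns.contains(idx) {
                out.push(Instr::I32Const(offset));
                out.push(Instr::I32Add);
            }
        }
        out.push(instr.clone());
    }
    out
}
```

The pass relies on the no-call_indirect assumption from the Overview: with indirect calls, a Storage function could be reached without a visible `call` to patch.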

Functions Indexes Relocation

Relocating the Storage Layouts isn't the whole story. A call to svm_get32 could appear as call 0 in one Template and as call 1 in another.

The code taken from Template Origin needs to use Template New's function indexes to play nicely there. This can be done by scanning the Function Indexes of each Template and then rewriting each call in the reused code to use the corresponding index in Template New.

The assumption here is that both Templates have the same function imports, or that the imports used by Template Origin are a subset of those of Template New.
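A minimal sketch of building the function-index mapping for imports, assuming imports are matched by their (module, name) pair and that each list contains only function imports in declaration order, so a position equals the function index (imported functions occupy the first indexes of the Wasm function index space). `Import` and `map_imports` are hypothetical names, and the `"svm"` module name used below is illustrative only. Returning an error when an import is missing enforces the subset assumption stated above.

```rust
use std::collections::HashMap;

/// Hypothetical description of an imported function: (module, name).
pub type Import = (String, String);

/// Maps each of Template Origin's import indexes to Template New's index for
/// the same (module, name). Fails if Template New lacks one of the imports.
pub fn map_imports(origin: &[Import], new: &[Import]) -> Result<HashMap<u32, u32>, String> {
    let by_name: HashMap<&Import, u32> =
        new.iter().enumerate().map(|(i, imp)| (imp, i as u32)).collect();
    origin
        .iter()
        .enumerate()
        .map(|(i, imp)| {
            by_name
                .get(imp)
                .map(|&j| (i as u32, j))
                .ok_or_else(|| format!("missing import {}::{}", imp.0, imp.1))
        })
        .collect()
}
```

Mirroring the example above, if svm_get32 is import #0 in Template Origin but import #1 in Template New, the map sends 0 to 1, and every `call 0` in the reused code becomes `call 1`.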

If f1_new calls other inner functions, each one will have to be added to the Function Indexes of Template New (see Reusing multiple functions below).

Other

Global Variables

On top of the above, the Wasm code of a Template will probably have a couple of Global variables. These are likely associated with Memory Management (pointers to the Stack and Heap), and in general they should stay intact. For example, if both Template Origin and Template New have been compiled via LLVM, things will likely work as expected. If this isn't the case, the whole reuse attempt would not work.

Reusing multiple functions

We said that f1_origin doesn't call other Wasm functions (only imported host functions). If f1_origin does call other Wasm functions, we'll have to relocate these as well.

Of course, the Storage Layout Sections will have to be relocated only once. However, we'll also have to relocate the function indexes of these functions (and add these indexes to the Function Indexes of Template New).
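A sketch of the bookkeeping for reusing multiple functions, under stated assumptions: the hypothetical `call_graph` lists, for each of Template Origin's defined functions, the defined functions it calls (calls to imports are handled by the import mapping instead). `reachable` collects f1_origin together with everything it transitively calls; `assign_new_indexes` maps f1_origin onto the existing stub f1_new and appends every other reused function after the `new_fn_count` functions Template New already has. Both names are illustrative; using a `BTreeSet` simply makes the assigned indexes deterministic.

```rust
use std::collections::{BTreeSet, HashMap};

/// Collects `root` plus every defined function it transitively calls.
pub fn reachable(root: u32, call_graph: &HashMap<u32, Vec<u32>>) -> BTreeSet<u32> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![root];
    while let Some(f) = stack.pop() {
        // `insert` returns false for already-visited functions (handles cycles).
        if seen.insert(f) {
            if let Some(callees) = call_graph.get(&f) {
                stack.extend(callees.iter().copied());
            }
        }
    }
    seen
}

/// Maps Origin function indexes to Template New indexes: the root goes to the
/// stub's index, the rest are appended after Template New's existing functions.
pub fn assign_new_indexes(reused: &BTreeSet<u32>, root: u32, stub_idx: u32, new_fn_count: u32) -> HashMap<u32, u32> {
    let mut map = HashMap::new();
    map.insert(root, stub_idx);
    let mut next = new_fn_count;
    for &f in reused.iter().filter(|&&f| f != root) {
        map.insert(f, next);
        next += 1;
    }
    map
}
```

The same offset N from the Storage Layouts Relocation is then applied to the body of every reused function, since they all come from Template Origin and share its sections.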

Questions/concerns

As said under the Overview, this SMIP outlines an experimental idea, and it might be incomplete. The primary motivation is to have the capability to reuse verify implementations across different Templates.

Dependencies and interactions

Stakeholders and reviewers

@noamnelke @lrettig @neysofu @avive @moshababo