tetratelabs / wazero

wazero: the zero dependency WebAssembly runtime for Go developers
https://wazero.io
Apache License 2.0

Reintroduce support for exporting memory #1575

Closed timwu20 closed 9 months ago

timwu20 commented 1 year ago

Is your feature request related to a problem? Please describe.

Hello! :wave:

I’m the technical lead for the Gossamer project, a Go implementation of the Polkadot Host specification.

The team is looking to use wazero as our default runtime in our Polkadot host implementation. We currently use wasmer through the wasmer-go library. Our primary reasons for switching are to avoid CGO dependencies and to get away from some significant memory issues we've been having with wasmer-go + wasmer.

In the Polkadot ecosystem, the parachains (think app-specific blockchains) are built using a blockchain framework called substrate. A component of substrate called FRAME is used to compose a chain's state transition function (app logic), which is compiled to Wasm and stored on chain. When executing and validating blocks, a Polkadot host will instantiate a Wasm runtime, load the code on chain, and call certain exported Wasm functions.

Implementing the Polkadot host specification requires a Wasm executor that allows the host to manage the executor’s memory heap.

With FRAME and substrate, Wasm runtimes are authored in Rust and they use something called the substrate_wasm_builder to compile to Wasm.

use substrate_wasm_builder::WasmBuilder;

fn main() {
    WasmBuilder::new()
        .with_current_project()
        .import_memory()
        .export_heap_base()
        .build()
}

All FRAME runtimes are essentially built this way and they all explicitly import memory. There are exports that need to be instantiated that allocate (ext_allocator_malloc_version_1) and free (ext_allocator_free_version_1) the memory from the host. There has been some discussion on the substrate repo with regards to no longer importing memory and letting the runtime allocate memory, but Polkadot Host implementations are required to support legacy runtimes already stored on chain, and the suggested change hasn’t gained much traction as of yet.

Describe alternatives you've considered

We've looked over a number of issues from the Wazero repo discussing this:

One of the workarounds mentioned was to define the export functions and memory using wat2wasm, compile it, then import the host functions and instantiate, and finally embed and instantiate the Wasm code.

This solution is problematic for us, since we (the Polkadot host) are not the authors of the Wasm blob. We can do this workaround for the genesis runtime, but with runtime upgrades (new Wasm blobs stored onchain), the export function list could change between upgrades. This would make the first step of re-exporting the functions and memory using wat2wasm at runtime very cumbersome and frankly an unneeded layer of complexity when processing runtime upgrades.

Describe the solution you'd like

We noticed the feature to export memory was previously supported and removed in v1.0.0-pre.2. We would like to reintroduce the ability to export memory when building host modules. We understand that the use case of exporting memory is no longer that frequent, but we feel we require this feature given the requirements of the Polkadot Host specification.

We have forked the wazero repository and made the changes to reintroduce ExportMemory and ExportMemoryWithMax to HostModuleBuilder in #1556. We hope to work with the Wazero team to merge this PR into the upstream repository.

Additional context

codefromthecrypt commented 1 year ago

There are significant problems with doing this generally which are the same as when we removed it, and don't go away as the information and internal design hasn't changed since this was removed.

I would love to help you find another way. The wat2wasm suggestion was made for those who have a static list of functions that don't change. So, it isn't strange to me that this wouldn't work if you don't control the binary. However, the main thing this changes is the workaround approach.

The guest can be compiled first, into a CompiledModule, then what it imports can be generated and materialized without an OS dependency like wat2wasm. For example, https://github.com/tetratelabs/wabin is used today for reasons like this, to manually construct a proxy guest.

It would look something like this https://github.com/tetratelabs/wazero/blob/326c267726c871f441e939be366ec2ac2b565d17/internal/testing/proxy/proxy.go#L41-L99 except use wabin instead of internal code.
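
To give a feel for how small the generated side of such a proxy can be, here is a minimal sketch that hand-encodes a Wasm binary defining one memory and exporting it. It uses only the Go standard library rather than wabin or wazero internals, and the helper names (encodeULEB128, section, memoryExportModule) are made up for this illustration. A real proxy would also have to re-export the host functions the guest imports, which is left out here.

```go
package main

import "fmt"

// encodeULEB128 encodes v as an unsigned LEB128 integer, the
// variable-length integer format used by the Wasm binary format.
func encodeULEB128(v uint32) []byte {
	var out []byte
	for {
		b := byte(v & 0x7f)
		v >>= 7
		if v != 0 {
			out = append(out, b|0x80) // more bytes follow
			continue
		}
		return append(out, b)
	}
}

// section prefixes a payload with its section ID and LEB128 size.
func section(id byte, payload []byte) []byte {
	out := []byte{id}
	out = append(out, encodeULEB128(uint32(len(payload)))...)
	return append(out, payload...)
}

// memoryExportModule (hypothetical helper) returns a Wasm binary that
// defines one memory of minPages pages with no maximum and exports it
// under exportName.
func memoryExportModule(exportName string, minPages uint32) []byte {
	bin := []byte{0x00, 'a', 's', 'm', 0x01, 0x00, 0x00, 0x00} // magic, version

	// Memory section (ID 5): one memory, limits flag 0x00 = minimum only.
	mem := []byte{0x01, 0x00}
	mem = append(mem, encodeULEB128(minPages)...)
	bin = append(bin, section(5, mem)...)

	// Export section (ID 7): one export of kind 0x02 (memory), index 0.
	exp := []byte{0x01}
	exp = append(exp, encodeULEB128(uint32(len(exportName)))...)
	exp = append(exp, exportName...)
	exp = append(exp, 0x02, 0x00)
	return append(bin, section(7, exp)...)
}

func main() {
	// 17 pages matches the (memory (;0;) 17) seen later in this thread.
	fmt.Printf("% x\n", memoryExportModule("memory", 17))
}
```

The resulting bytes can be handed to any runtime that accepts a Wasm binary; whether the host functions then see writes to that memory depends on which module instance they are called with.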

I'd be happy to help contribute this to your project, if the approach isn't clear. Can we try this first?

codefromthecrypt commented 1 year ago

ps I joined your discord as snarkabot. Also, you can find me (and the rest of the team) on our gophers slack #wazero https://wazero.io/community/#keeping-up-to-date

timwu20 commented 1 year ago

I tried to create a NewModuleBinary method using wabin in our runtime integration using wazero. However it looks like I've lost the ability to write to the exported memory and read/write values. Here's the commit with added code when instantiating the runtime.

After taking a look through the wabin code, I didn't see any methods on the wasm.Memory type that actually modify any internal buffer, but it does seem to satisfy the api.Memory interface. I've verified that the host exports are being called correctly, but it looks like the exported memory isn't being written to correctly using this proxy module.

codefromthecrypt commented 1 year ago

@timwu20 so your functions defined in HostModuleBuilder should pass the memory instead of trying to define it externally. something like this:

@@ -2241,7 +2241,7 @@ func ext_allocator_malloc_version_1(ctx context.Context, m api.Module, size uint
        allocator := ctx.Value(runtimeContextKey).(*runtime.Context).Allocator

        // Allocate memory
-       res, err := allocator.Allocate(size)
+       res, err := allocator.Allocate(m.Memory(), size)
        if err != nil {
                panic(err)
        }
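
A minimal sketch of that idea, under stated assumptions: the allocator keeps no reference to a memory of its own and instead receives the calling module's memory on every call. The Memory interface below is a local stand-in whose method shapes are modeled on wazero's api.Memory; sliceMemory, BumpAllocator and the 4-byte size header are invented for this example and are not Gossamer's allocator.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Memory is a local stand-in for the subset of a guest memory API
// this sketch needs (an assumption, not a library type).
type Memory interface {
	Size() uint32
	WriteUint32Le(offset, v uint32) bool
	ReadUint32Le(offset uint32) (uint32, bool)
}

// sliceMemory is a fake guest memory backed by a byte slice.
type sliceMemory struct{ buf []byte }

func (m *sliceMemory) Size() uint32 { return uint32(len(m.buf)) }

func (m *sliceMemory) WriteUint32Le(offset, v uint32) bool {
	if uint64(offset)+4 > uint64(len(m.buf)) {
		return false
	}
	binary.LittleEndian.PutUint32(m.buf[offset:], v)
	return true
}

func (m *sliceMemory) ReadUint32Le(offset uint32) (uint32, bool) {
	if uint64(offset)+4 > uint64(len(m.buf)) {
		return 0, false
	}
	return binary.LittleEndian.Uint32(m.buf[offset:]), true
}

const headerSize = 4 // each block is preceded by its size

// BumpAllocator hands out offsets starting at the heap base. It never
// stores a Memory; the caller supplies one per call.
type BumpAllocator struct{ next uint32 }

func NewBumpAllocator(heapBase uint32) *BumpAllocator {
	return &BumpAllocator{next: heapBase}
}

// Allocate reserves size bytes in mem, writes the size into a header,
// and returns the offset of the payload.
func (a *BumpAllocator) Allocate(mem Memory, size uint32) (uint32, error) {
	end := uint64(a.next) + headerSize + uint64(size)
	if end > uint64(mem.Size()) {
		return 0, errors.New("allocator: out of memory")
	}
	if !mem.WriteUint32Le(a.next, size) {
		return 0, errors.New("allocator: header write out of range")
	}
	ptr := a.next + headerSize
	a.next = uint32(end)
	return ptr, nil
}

func main() {
	mem := &sliceMemory{buf: make([]byte, 64)}
	alloc := NewBumpAllocator(16)
	ptr, err := alloc.Allocate(mem, 8)
	size, _ := mem.ReadUint32Le(ptr - headerSize)
	fmt.Println(ptr, size, err)
}
```

With this shape, a host function only has to forward whatever memory its module argument exposes, so it does not matter which module ended up defining that memory.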

codefromthecrypt commented 1 year ago

ps I'm not 100pct certain this is the issue, but it is a distraction at least.

codefromthecrypt commented 1 year ago

the other thing I find strange is that the guest is actually defining its own memory. If I put a breakpoint, the wasm is already defining it like this. Which makes me curious why the host needs to define it. A guest can only use one memory.

$ wasm2wat /var/folders/vd/1cf8zdb1721f4z5rjggy8bp40000gn/T/gossamer/runtimes/hostapi_runtime.compact.wasm |grep memory
                              memory.grow
  (memory (;0;) 17)
  (export "memory" (memory 0))

codefromthecrypt commented 1 year ago

ps while wasmer-go doesn't have a way to access the memory as a function param, you can get the same export via instance.Exports.GetMemory("memory").

If there's a chance that you have some guests that don't define memory, at least in wazero, you can compile them first then look at the CompiledModule and see if it exported it or not. Hope this helps!
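
As a rough illustration of what such a check looks for, the sketch below answers the same question by reading the binary directly: does the guest import a memory, and does it define one itself? It deliberately avoids any runtime library, so inspectMemory and the cursor type are hypothetical names, and it assumes plain 32-bit memories without newer proposals such as memory64.

```go
package main

import (
	"errors"
	"fmt"
)

// cursor walks the binary and keeps the first error it encounters.
type cursor struct {
	b   []byte
	pos int
	err error
}

func (c *cursor) u8() byte {
	if c.err == nil && c.pos >= len(c.b) {
		c.err = errors.New("unexpected end of input")
	}
	if c.err != nil {
		return 0
	}
	c.pos++
	return c.b[c.pos-1]
}

// u32 decodes an unsigned LEB128 integer.
func (c *cursor) u32() (v uint32) {
	for shift := uint(0); shift < 35; shift += 7 {
		x := c.u8()
		v |= uint32(x&0x7f) << shift
		if x&0x80 == 0 {
			break
		}
	}
	return v
}

func (c *cursor) skip(n uint32) {
	for ; n > 0 && c.err == nil; n-- {
		c.u8()
	}
}

// limits skips a limits entry: flag byte, minimum, optional maximum.
func (c *cursor) limits() {
	flags := c.u8()
	c.u32()
	if flags&0x01 != 0 {
		c.u32()
	}
}

// inspectMemory reports whether bin imports a memory and whether it
// defines one itself.
func inspectMemory(bin []byte) (imports, defines bool, err error) {
	if len(bin) < 8 || string(bin[:4]) != "\x00asm" {
		return false, false, errors.New("not a Wasm binary")
	}
	c := &cursor{b: bin, pos: 8} // skip magic and version
	for c.err == nil && c.pos < len(bin) {
		id := c.u8()
		size := c.u32()
		end := c.pos + int(size)
		switch id {
		case 2: // import section
			for n := c.u32(); n > 0 && c.err == nil; n-- {
				c.skip(c.u32()) // module name
				c.skip(c.u32()) // field name
				switch c.u8() {
				case 0x00: // function: type index
					c.u32()
				case 0x01: // table: element type, limits
					c.u8()
					c.limits()
				case 0x02: // memory: limits
					imports = true
					c.limits()
				case 0x03: // global: value type, mutability
					c.skip(2)
				default:
					c.err = errors.New("unknown import kind")
				}
			}
		case 5: // memory section: vector of memories
			defines = c.u32() > 0
		}
		if c.err == nil && end > len(bin) {
			c.err = errors.New("section exceeds input")
		}
		c.pos = end // continue at the declared end of the section
	}
	return imports, defines, c.err
}

func main() {
	// A module whose only content is an import of "env"."memory".
	bin := []byte("\x00asm\x01\x00\x00\x00\x02\x0f\x01\x03env\x06memory\x02\x00\x11")
	fmt.Println(inspectMemory(bin))
}
```

A host could run a check like this once per runtime blob and only build a memory-providing module for guests that actually import one.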

timwu20 commented 1 year ago

> the other thing I find strange is that the guest is actually defining its own memory. If I put a breakpoint, the wasm is already defining it like this. Which makes me curious why the host needs to define it. A guest can only use one memory.
>
> $ wasm2wat /var/folders/vd/1cf8zdb1721f4z5rjggy8bp40000gn/T/gossamer/runtimes/hostapi_runtime.compact.wasm |grep memory
>                               memory.grow
>   (memory (;0;) 17)
>   (export "memory" (memory 0))

That runtime is the one we use for testing (link). It's just a proxy we constructed to call the functions using the substrate FFI.

I would check the runtime we need to run for westend.

codefromthecrypt commented 1 year ago

@timwu20 I noticed this merged. Congrats!

Would you be ok to close this even if I understand your original reasoning?

timwu20 commented 1 year ago

> @timwu20 I noticed this merged. Congrats!
>
> Would you be ok to close this even if I understand your original reasoning?

Yes we merged our integration of wazero, but we are still using our fork that allows us to export memory.

I've tried to use the workaround with your suggestions, but I'm still unable to get my exports to read values accurately from the exported memory defined in the proxy guest. I've created an example on my personal repo that outlines the problem. The test that demonstrates the problem is here.