tinygo-org / tinygo

Go compiler for small places. Microcontrollers, WebAssembly (WASM/WASI), and command-line tools. Based on LLVM.
https://tinygo.org

Add a utility or guide to sharing memory with WebAssembly #2787

Closed codefromthecrypt closed 2 years ago

codefromthecrypt commented 2 years ago

TinyGo is a popular way to use WebAssembly from the Go programming language, though I think the skill level required is currently a little steep. There are certain things people need to learn by copy/paste or deep knowledge, including marshaling of pointers as well as how to share pointers with external functions without having them collected.

For example, in normal Go, runtime.KeepAlive is a function used to tell the compiler that a reference is still in use, so that it won't be collected prematurely. OTOH, its definition doesn't match how Wasm works. In Wasm, the function that uses a pointer is completely invisible to the code known to TinyGo's compiler. Returning a reference to a caller may be fine only because the caller reads it before calling another TinyGo function, as there's no concurrent GC that would free it in the meantime.
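To make the limitation concrete, here is a minimal sketch in standard Go. The helper `withAddr` and its `use` callback are hypothetical names for illustration; the callback stands in for code the compiler cannot see. The point is that runtime.KeepAlive only guarantees reachability up to the call site, which says nothing about a host that reads the address after the function has returned.

```go
package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

// withAddr passes the address of buf's first byte to use, which stands in for
// code the compiler cannot see (hypothetical; in Wasm this would be the host).
func withAddr(buf []byte, use func(addr uintptr)) {
	if len(buf) == 0 {
		return // an empty slice has no first element to take the address of
	}
	addr := uintptr(unsafe.Pointer(&buf[0]))
	use(addr)
	// buf is guaranteed reachable up to this point, and no further.
	runtime.KeepAlive(buf)
}

func main() {
	buf := make([]byte, 16)
	withAddr(buf, func(addr uintptr) {
		fmt.Printf("buffer lives at %#x while this callback runs\n", addr)
	})
}
```

An exported Wasm function cannot wrap its caller this way, because the read happens after the export has already returned.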

Edge cases like this are pretty steep to understand, and it would feel easier to encapsulate that knowledge by either a guide or a utility lib that can intentionally leak and free memory under a reference. I'd like your thoughts on anything that can be done in this area, even if only docs.

Meanwhile, here's an example of how nuanced and asymmetric the code that handles intentional leaks (to wasm) looks next to the rest of the code, which is fairly to the point: https://github.com/efejjota/ebiten-wasm-graphics/pull/2/files

codefromthecrypt commented 2 years ago

Here's an example for the purpose of strawman:

Right now, there's a pattern to return data from TinyGo by making a function that returns a pointer instead of a byte array. When that compiles to wasm (e.g. via -scheduler=none -target=wasi), that pointer becomes a normal int32, which is a position/offset in linear memory.

If you also need to share the data length, you can do something like this to pack both into a single value (uint64):

// drawWasm returns the size and unsafe pointer of the bytes returned from
// draw, packed into a uint64 (size in the upper 32 bits, pointer in the lower
// 32). This allows both to be read externally from a single result.
//
// For example, the WebAssembly import of this function is compatible with 1.0,
// as it doesn't use multiple results. The implementation also doesn't retain a
// caching field:
//  (import "tinygo" "draw" (func $draw (result i64)))
//
// Ex. In wazero, the buffer can be read back like this:
//  sizeOffset, _ := fn.Call(ctx)
//  size := uint32(sizeOffset[0] >> 32)
//  offset := uint32(sizeOffset[0])
//  buf, _ := memory.Read(offset, size)
//
// Note: The caller of an export that returns this must copy this data
// immediately or risk garbage collection of the underlying buffer.
//export draw
func drawWasm() uint64 {
    buf, err := draw()
    if err != nil {
        panic(err)
    }
    ptr := &buf[0]
    unsafePtr := uintptr(unsafe.Pointer(ptr))
    return (uint64(len(buf)) << uint64(32)) | uint64(unsafePtr)
}
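The packing scheme can be checked without any Wasm runtime. The sketch below assumes the same layout as the return expression in drawWasm (size in the upper 32 bits, offset in the lower 32); `pack` and `unpack` are hypothetical helper names, not part of TinyGo or wazero.

```go
package main

import "fmt"

// pack mirrors the return expression in drawWasm: size in the upper 32 bits,
// offset in the lower 32 bits.
func pack(offset, size uint32) uint64 {
	return uint64(size)<<32 | uint64(offset)
}

// unpack reverses pack, as a host would after calling the export.
func unpack(packed uint64) (offset, size uint32) {
	return uint32(packed), uint32(packed >> 32)
}

func main() {
	packed := pack(0x10000, 42)
	offset, size := unpack(packed)
	fmt.Printf("offset=%#x size=%d\n", offset, size)
}
```

Because both halves are 32 bits wide, the round trip is lossless for any wasm32 offset and length.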

The problem is that the pointer intentionally leaks to the caller either way. One might guess that this data is readable anyway because the garbage collector doesn't get a chance to run. This raises the question of how defensive the code should be. For example, if call A returns pointer 1 and you make call B before reading it, would that work? If so, is that by accident or by intent? It makes me wonder whether the code should be more defensive, either by caching a field (which is gnarly if there are multiple possible callers) or by doing its own tracking like below. My hope is that this context can help someone give guidance even if the answer is status quo: "leak the memory and it likely won't be collected, so you're probably ok".
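For comparison, the "caching a field" idea mentioned above might look like the following sketch. The names `lastBuf` and `retainLast` are hypothetical. It shows why the approach gets gnarly: each call overwrites the previous reference, so only the most recently returned buffer is protected.

```go
package main

import (
	"fmt"
	"unsafe"
)

// lastBuf is the "caching field": it keeps the most recently returned buffer
// reachable. Only one buffer is protected at a time.
var lastBuf []byte

// retainLast stores buf in lastBuf and returns its size and address packed
// into a uint64. The address is truncated to 32 bits, which is lossless on
// wasm32 but not on 64-bit hosts.
func retainLast(buf []byte) uint64 {
	lastBuf = buf
	if len(buf) == 0 {
		return 0
	}
	ptr := uintptr(unsafe.Pointer(&buf[0]))
	return uint64(len(buf))<<32 | uint64(uint32(ptr))
}

func main() {
	first := retainLast([]byte("first"))
	second := retainLast([]byte("second"))
	// After the second call, nothing keeps the first buffer reachable.
	fmt.Println(first>>32, second>>32, string(lastBuf))
}
```

With two callers interleaving calls, the first caller's buffer loses its protection as soon as the second call returns.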

A longer, safer implementation is below for context. What's not awesome about it is that the caller must call deallocate. The looser status quo approach doesn't require that, as the memory will likely be collected eventually; I'm just not sure whether it only works by accident!

// alivePointers maps unsafe pointers to their corresponding values so that
// they aren't collected while in external use (in WebAssembly).
var alivePointers = map[uintptr]interface{}{}

// keepaliveBuf stores a reference to the buffer and returns its pointer.
//
// Callers must invoke the exported function FnDeallocateName to free memory.
func keepaliveBuf(buf []byte) uint32 {
    if len(buf) == 0 {
        // Assumption: an empty buffer has nothing to share, so return 0
        // instead of panicking on &buf[0].
        return 0
    }
    ptr := &buf[0]
    unsafePtr := uintptr(unsafe.Pointer(ptr))
    alivePointers[unsafePtr] = buf
    return uint32(unsafePtr)
}

// allocate makes a buffer of the given size and returns its uintptr. Once
// finished, the caller must free the memory with FnDeallocateName.
//
//export FnAllocateName
func allocate(size uint32) uint32 {
    return keepaliveBuf(make([]byte, size))
}

// deallocate frees a uintptr returned by keepaliveBuf or allocate, allowing it
// to be garbage collected.
//
//export FnDeallocateName
func deallocate(ptr uint32) {
    delete(alivePointers, uintptr(ptr))
}

aykevl commented 2 years ago

> For example, if call A returns pointer 1 and you make call B before reading it, would that work? If so, is that by accident or by intent?

It might, by accident. It is not guaranteed to work.

> A longer, safer implementation is below for context. What's not awesome about it is that the caller must call deallocate. The looser status quo approach doesn't require that, as the memory will likely be collected eventually; I'm just not sure whether it only works by accident!

This is a better way, and will work correctly (unless we somehow switch to a moving garbage collector, which is highly unlikely). And yes, you must call deallocate manually, there is no way around it. It's unfortunate but I recommend this approach.
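On the host side, the manual deallocate call can at least be made hard to forget. The sketch below is an assumption-laden illustration, not a real runtime API: the `guest` interface and `fakeGuest` type are hypothetical stand-ins for a module's allocate/deallocate exports and its linear memory, so the example runs without any Wasm runtime. The pattern is simply to pair every allocation with a deferred deallocation.

```go
package main

import "fmt"

// guest is a hypothetical stand-in for a module's exports and memory access.
type guest interface {
	Allocate(size uint32) uint32
	Deallocate(ptr uint32)
	Write(ptr uint32, data []byte) bool
}

// fakeGuest simulates linear memory with a bump allocator, for illustration.
type fakeGuest struct {
	mem  []byte
	next uint32
	live map[uint32]uint32 // offset -> size of each live allocation
}

func newFakeGuest(size int) *fakeGuest {
	return &fakeGuest{mem: make([]byte, size), next: 8, live: map[uint32]uint32{}}
}

func (g *fakeGuest) Allocate(size uint32) uint32 {
	ptr := g.next
	g.next += size
	g.live[ptr] = size
	return ptr
}

func (g *fakeGuest) Deallocate(ptr uint32) { delete(g.live, ptr) }

func (g *fakeGuest) Write(ptr uint32, data []byte) bool {
	size, ok := g.live[ptr]
	if !ok || uint32(len(data)) > size || int(ptr)+len(data) > len(g.mem) {
		return false
	}
	copy(g.mem[ptr:], data)
	return true
}

// withGuestBuffer copies data into guest memory, runs use, and always
// deallocates afterwards, even when the write fails.
func withGuestBuffer(g guest, data []byte, use func(ptr, size uint32)) error {
	size := uint32(len(data))
	ptr := g.Allocate(size)
	defer g.Deallocate(ptr)
	if !g.Write(ptr, data) {
		return fmt.Errorf("write of %d bytes at offset %d failed", size, ptr)
	}
	use(ptr, size)
	return nil
}

func main() {
	g := newFakeGuest(1024)
	err := withGuestBuffer(g, []byte("hello"), func(ptr, size uint32) {
		fmt.Println("guest buffer at", ptr, "size", size)
	})
	fmt.Println("error:", err, "live allocations:", len(g.live))
}
```

With a real runtime, the same wrapper shape would call the module's exported allocate and deallocate functions instead of the fake.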

codefromthecrypt commented 2 years ago

thanks for the feedback