trealla-prolog / go

Trealla Prolog embedded in Go using WASM
MIT License

Memory leaks (wasmtime vs wasmer) #2

Closed jfmc closed 1 year ago

jfmc commented 2 years ago

Would wasmtime provide any advantage over wasmer?

guregu commented 2 years ago

There's one blocker for using wasmtime: https://github.com/bytecodealliance/wasmtime-go/issues/34 We need an easy way to capture stdout to receive results from the Prolog interpreter. We could work around it using files but it's not ideal.

wasmer solves this with a little C shim: https://github.com/wasmerio/wasmer-go/blob/b4462e6583f8d7b964e32bdd8d065cf96fba6c08/wasmer/wasi.go#L18-L54 So it probably wouldn't be too hard to add this to wasmtime.

I think we could also export Go functions into WASM and have Trealla call them via FFI to report results, but I'm not quite sure how that works or whether it's possible yet.

I would be especially interested in switching to any WASM runtime with good Windows support. Then we can support all major platforms out of the box.

guregu commented 2 years ago

This looks promising as well: https://github.com/tetratelabs/wazero It's more in the spirit of this library, I think. Not sure if its performance is anywhere close to the others. wazero struggles with my broken realpath implementation in TPL, so I need to fix that and do some experimenting.

enoperm commented 1 year ago

Sorry for "hijacking" this issue, though I do believe what I have observed is not entirely unrelated to the current state of wasmer-go.

I have played around with the library and really like the concept, though I have run into some issues with memory usage.

As an example, the code below demonstrates three separate issues:

  1. Using a single engine instance for a large number of queries will cause memory usage to climb until around 7 GiB or so, at which point all subsequent queries get stuck - I suspect it has to do with the allocator, though I have not verified it. I'm pretty sure a leak is involved somewhere, because I do not think any state should be retained from just querying for nl.
  2. Using a single engine instance for a large number of queries that involve host-interop will cause memory usage to climb until around 2 GiB, at which point the interop-related code (which, I think, uses an int32 to slice into WASM-allocated memory) panics:
    $ go run main.go -- --with-interop
    2023/04/20 14:59:30
    2023/04/20 15:00:53 trealla: query error: trealla: panic: runtime error: slice bounds out of range [-2130246640:]
    exit status 1
  3. The reason I described the issue here lies with this leak. In theory, it is not an issue if a WASM application leaks memory, since it is very cheap to just throw the instance away and allocate a fresh one... except doing so consumes memory without bound. The runtime does not seem to ever release the instantiated WASM modules/memory, and the result is a far larger, faster memory leak than just using a persistent instance.
package main

import (
    "context"
    "fmt"
    "log"
    "math/rand"
    "os"
    "time"

    "github.com/trealla-prolog/go/trealla"
)

func newEngine(ctx context.Context) (trealla.Prolog, error) {
    engine, err := trealla.New()
    if err != nil {
        return nil, err
    }

    err = engine.Register(ctx, "invoke", 1, trealla.Predicate(func(pl trealla.Prolog, subquery trealla.Subquery, goal trealla.Term) trealla.Term {
        v := fmt.Sprintf("%v", rand.Float64())
        return trealla.Compound{
            Functor: "invoke",
            Args: []trealla.Term{
                trealla.Atom(v),
            },
        }
    }))
    return engine, err
}

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    engine, err := newEngine(ctx)
    if err != nil {
        log.Fatal(err)
    }

    log.Print()
    defer func() {
        recover()
        log.Print()
        os.Exit(2)
    }()

    getEngine := func() trealla.Prolog {
        return engine
    }

    // perhaps ugly, but suffices for testing
    query := `nl`
    for _, arg := range os.Args[1:] {
        switch arg {
        case "--with-interop":
            query = `invoke(A)`

        case "--with-new-instance":
            getEngine = func() trealla.Prolog {
                engine, _ := newEngine(ctx)
                return engine
            }
        }
    }

    for {
        func() {
            ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
            defer cancel()
            engine := getEngine()
            doWork(ctx, query, engine)
        }()
    }
}

func doWork(ctx context.Context, query string, engine trealla.Prolog) {
    resultSet := engine.Query(ctx, query)
    defer resultSet.Close()

    for resultSet.Next(ctx) {
    }

    if err := resultSet.Err(); err != nil {
        log.Println(err)
        os.Exit(1)
    }
}

Since the alternative is getting stuck forever, I'm not entirely sure the int32 issue should be fixed just yet - personally, I'd rather crash and restart than never get anywhere at all. But assuming I'm right about the source of the leak when each query is served by a new instance, a different WASM runtime might be more beneficial than it seems at first glance.

guregu commented 1 year ago

Thanks @enoperm, that is very concerning. I’ll look into it 👀

enoperm commented 1 year ago

Do remember to use a separate cgroup or something if you run the repro code with --with-new-instance. It is a good way to test your OOM killer...

guregu commented 1 year ago

I think the leak around instantiating new Prologs is my fault, not wasmer's - it seems I forgot to add a Close method to Prolog. I'll add one and we'll see if it helps.

I think it would also not be too hard to move to wasmtime, Trealla has some stuff now to capture stdout/stderr so we can work around the wasmtime-go issue.

guregu commented 1 year ago

To be more specific, the wasmer bindings rely on runtime finalizers to free memory when objects aren't explicitly closed, so a tight loop could prevent the finalizers from ever being called. Adding a Close method to the Prolog interface should help with that.

enoperm commented 1 year ago

It is possible that I was a bit harsh towards wasmer, I suppose. I'm not an expert on the Go runtime, so when I dug into the code and saw that finalizers exist, I just assumed they'd surely be called by the GC before the kernel triggers the OOM killer. Then I encountered the previously linked issue, and it kind of fit: if the WASM runtime never frees things anyway, then finalizers are pointless.

guregu commented 1 year ago

I'm not too sure how Go's runtime handles them either, TBH, so no worries. I tried adding a Close() method to the Prolog instance that calls wasmer's instance.Close() but it doesn't seem to help. I'll try and figure out whether it's a problem here or with wasmer-go. I'll also check if trealla-js has the same issue or not (I remember measuring it a while back and it was fine) which will rule out Trealla itself. Thanks again for the test code, it is very helpful.

guregu commented 1 year ago

trealla-js looks fine so it shouldn't be a problem with Trealla itself. I think the issue here is we hold a global wasmer.Store which probably just infinitely accumulates memory. So we'd need to instantiate a new module for each Prolog to avoid that, but then we run into the wasmer-go bug you linked. I'll see about using a different library.

BTW, the interop bug with the negative pointers should be fixed in the next release, but it can still run out of memory if usage hits 4 GB. Not sure if any of the Go libraries support 64-bit WASM, but if they do it should be possible to fix that.

guregu commented 1 year ago

Good news: I got wasmtime working (in the wasmtime branch if you want to check it out). Running the test program with --with-new-instance and closing the engine in the loop allows it to properly garbage collect the memory. Also, it looks like wasmtime properly supports 64-bit Wasm, so I'll play around with compiling Trealla to that this weekend. Bad news: it seems there's still a leak with queries against a single Prolog instance. Looks like this is happening on the Wasm side, probably from the stuff I added to get wasmtime to work ;). Should be able to fix it. Also, the Tak function benchmark seems about twice as slow on wasmtime vs wasmer; not sure what's up with that.

guregu commented 1 year ago

Found the leak! Got the default test stable at around ~64MB. Looks like there's some weirdness with the interop stuff still, so once that gets fixed we should be good and I'll push the new stuff.

guregu commented 1 year ago

I believe this is what was causing the interop memory issue: https://github.com/trealla-prolog/trealla/issues/162 Once we fix that everything should be solid.

guregu commented 1 year ago

More good news: we've fixed all the leaks. Just need to tweak the interop stuff to work better with wasmtime and I can make a new release.

enoperm commented 1 year ago

I have plugged in the wasmtime branch into the toy project where I initially detected the memory leak and I can confirm that it can now serve each request on a new Trealla instance just fine without any apparent leaks. :heavy_check_mark:

guregu commented 1 year ago

Great! I have some more changes I'm just about to push. Will test it a bit more and make a new release shortly.

guregu commented 1 year ago

Hmm, seems like wasmtime can crash sometimes. Probably my fault, investigating.

guregu commented 1 year ago

Whoops, was a concurrency issue, fixed now. Made an issue for wasm64 support as well: #8