Closed · jfmc closed this issue 1 year ago
There's one blocker for using wasmtime: https://github.com/bytecodealliance/wasmtime-go/issues/34 We need an easy way to capture stdout to receive results from the Prolog interpreter. We could work around it using files but it's not ideal.
wasmer solves this with a little C shim: https://github.com/wasmerio/wasmer-go/blob/b4462e6583f8d7b964e32bdd8d065cf96fba6c08/wasmer/wasi.go#L18-L54 So it probably wouldn't be too hard to add this to wasmtime.
I think we could also maybe export Go functions into WASM and have Trealla call it via FFI to report results, but I'm not quite sure how that works and if it is possible yet.
I would be especially interested in switching to any WASM runtime with good Windows support. Then we can support all major platforms out of the box.
This looks promising as well: https://github.com/tetratelabs/wazero It's more in the spirit of this library, I think. Not sure if its performance is anywhere close to the others'. wazero struggles with my broken realpath implementation in TPL, so I need to fix that and do some experimenting.
Sorry for "hijacking" this issue, though I do believe what I have observed is not entirely unrelated to the current state of wasmer-go.
I have played around with the library and really like the concept, though I have run into some issues with memory usage.
As an example, the code below demonstrates three separate issues:

- memory grows even when repeatedly running a trivial `nl` query against a single instance;
- memory grows when each query is served by a freshly created instance;
- with interop, a negative `int32` offset is used to slice into WASM-allocated memory, resulting in a panic:
```
$ go run main.go -- --with-interop
2023/04/20 14:59:30
2023/04/20 15:00:53 trealla: query error: trealla: panic: runtime error: slice bounds out of range [-2130246640:]
exit status 1
```
```go
package main

import (
	"context"
	"fmt"
	"log"
	"math/rand"
	"os"
	"time"

	"github.com/trealla-prolog/go/trealla"
)

func newEngine(ctx context.Context) (trealla.Prolog, error) {
	engine, err := trealla.New()
	if err != nil {
		return nil, err
	}
	err = engine.Register(ctx, "invoke", 1, trealla.Predicate(func(pl trealla.Prolog, subquery trealla.Subquery, goal trealla.Term) trealla.Term {
		v := fmt.Sprintf("%v", rand.Float64())
		return trealla.Compound{
			Functor: "invoke",
			Args: []trealla.Term{
				trealla.Atom(v),
			},
		}
	}))
	return engine, err
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	engine, err := newEngine(ctx)
	if err != nil {
		log.Fatal(err)
	}
	log.Print()
	defer func() {
		recover()
		log.Print()
		os.Exit(2)
	}()
	getEngine := func() trealla.Prolog {
		return engine
	}
	// perhaps ugly, but suffices for testing
	query := `nl`
	for _, arg := range os.Args[1:] {
		switch arg {
		case "--with-interop":
			query = `invoke(A)`
		case "--with-new-instance":
			getEngine = func() trealla.Prolog {
				engine, _ := newEngine(ctx)
				return engine
			}
		}
	}
	for {
		func() {
			ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
			defer cancel()
			engine := getEngine()
			doWork(ctx, query, engine)
		}()
	}
}

func doWork(ctx context.Context, query string, engine trealla.Prolog) {
	resultSet := engine.Query(ctx, query)
	defer resultSet.Close()
	for resultSet.Next(ctx) {
	}
	if err := resultSet.Err(); err != nil {
		log.Println(err)
		os.Exit(1)
	}
}
```
Since the alternative is getting stuck forever, I'm not entirely sure the `int32` issue should be fixed just yet; I would personally prefer crashing and restarting over never getting anywhere at all. But, assuming I'm not wrong about the source of the leak when each query is served by a new instance, I think a different WASM runtime might be more beneficial than it seems at first glance.
Thanks @enoperm, that is very concerning. I’ll look into it 👀
Do remember to use a separate cgroup or something if you run the repro code with `--with-new-instance`. It is a good way to test your OOM killer...
I think the leak around instantiating new Prologs is my fault, not wasmer's; it seems I forgot to add a Close method to Prolog. I'll add that and see if it helps.
I think it would also not be too hard to move to wasmtime, Trealla has some stuff now to capture stdout/stderr so we can work around the wasmtime-go issue.
To be more specific, the wasmer objects rely on runtime finalizers to free memory if they aren't explicitly closed, so maybe a tight loop could prevent them from ever being called. Adding a Close method to the Prolog interface should help with that.
I may have been too harsh towards wasmer, I suppose. I'm not an expert on the Go runtime, so when I dug into the code and saw that finalizers exist, I just assumed they'd surely be called by the GC before the kernel triggers the OOM killer. Then I encountered the previously linked issue, and it kind of fit: if the wasm runtime does not free things up anyway, then any finalizers are pointless.
I'm not too sure how Go's runtime handles them either, TBH, so no worries.
I tried adding a Close() method to the Prolog instance that calls wasmer's instance.Close(), but it doesn't seem to help. I'll try to figure out whether the problem is here or in wasmer-go. I'll also check whether trealla-js has the same issue (I remember measuring it a while back and it was fine), which will rule out Trealla itself.
Thanks again for the test code, it is very helpful.
trealla-js looks fine, so it shouldn't be a problem with Trealla itself. I think the issue here is that we hold a global wasmer.Store, which probably just accumulates memory indefinitely. We'd need to instantiate a new module for each Prolog to avoid that, but then we run into the wasmer-go bug you linked. I'll see about using a different library.
BTW, the interop bug with the negative pointers should be fixed in the next release, but it can still run out of memory if it hits 4GB. Not sure if any of the Go libraries support 64-bit WASM, but if they do it should be possible to fix that.
Good news: I got wasmtime working (in the wasmtime branch, if you want to check it out). Running the test program with `--with-new-instance` and closing the engine in the loop allows it to properly garbage collect the memory. It also looks like wasmtime properly supports 64-bit Wasm, so I'll play around with compiling Trealla to that this weekend.
Bad news: it seems like there's still a leak with queries against a single Prolog instance. Looks like this is happening on the Wasm side, probably from the stuff I added to get wasmtime to work ;). Should be able to fix it.
Also, the Tak function benchmark seems to run about twice as slow on wasmtime vs wasmer; not sure what's up with that.
Found the leak! Got the default test stable at around 64MB. Looks like there's still some weirdness with the interop stuff, so once that gets fixed we should be good and I'll push the new stuff.
I believe this is what was causing the interop memory issue: https://github.com/trealla-prolog/trealla/issues/162 Once we fix that everything should be solid.
More good news: we've fixed all the leaks. Just need to tweak the interop stuff to work better with wasmtime and I can make a new release.
I have plugged the wasmtime branch into the toy project where I initially detected the memory leak, and I can confirm that it can now serve each request on a new Trealla instance just fine, without any apparent leaks. :heavy_check_mark:
Great! I have some more changes I'm just about to push. Will test it a bit more and make a new release shortly.
Hmm, seems like wasmtime can crash sometimes. Probably my fault, investigating.
Whoops, was a concurrency issue, fixed now. Made an issue for wasm64 support as well: #8
Would wasmtime provide any advantage over wasmer?