wasilibs / nottinygc

Higher-performance allocator for TinyGo WASI apps
MIT License

Question: Does the GC execute before the `proxywasm.DispatchHttpCall` callback executes? #28

Closed - sandeeppnaik closed this issue 9 months ago

sandeeppnaik commented 10 months ago

Hi @anuraaga !

I am running into strange behaviour when my wasm plugin executes a DispatchHttpCall: in the majority of runs the call to the service (the dispatch call) is never made, which makes me wonder whether the GC clears the context before the dispatch call executes.

The flow works as expected when I don't run the custom GC: Req -> parseHeaders -> dispatchCall(ServiceA) -> callback -> getResponseHeaders -> update headers -> ReqGoesUpstream

The metrics below measure execution_count (executions of the plugin) and success_count (completions of the callback). These are the numbers when I don't run the custom GC - execution_count = success_count.

image

Whereas when I run the custom GC, only a portion of requests end up making the call upstream (I verified that those requests never reach serviceA).

image

My main.go has these imports

import (
    "crypto/rand"
    "fmt"
    "math/big"
    "os"
    "strings"
    "time"

    "github.com/tetratelabs/proxy-wasm-go-sdk/proxywasm"
    "github.com/tetratelabs/proxy-wasm-go-sdk/proxywasm/types"
    "github.com/tidwall/gjson"
    _ "github.com/wasilibs/nottinygc"
    "golang.org/x/exp/slices"
)
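
For context, the dispatch flow in the plugin is roughly this (a simplified sketch, assuming an httpContext type embedding types.DefaultHttpContext; the serviceA cluster name, path, timeout, and header handling are placeholders for what the real plugin does):

func (ctx *httpContext) OnHttpRequestHeaders(numHeaders int, endOfStream bool) types.Action {
    headers := [][2]string{
        {":method", "GET"},
        {":path", "/check"},
        {":authority", "serviceA"},
    }
    // Dispatch the call to serviceA and pause the request until the callback resumes it.
    if _, err := proxywasm.DispatchHttpCall("serviceA", headers, nil, nil, 5000, ctx.dispatchCallback); err != nil {
        proxywasm.LogCriticalf("dispatch failed: %v", err)
        return types.ActionContinue
    }
    return types.ActionPause
}

func (ctx *httpContext) dispatchCallback(numHeaders, bodySize, numTrailers int) {
    // Read serviceA's response headers and update the request headers here.
    if _, err := proxywasm.GetHttpCallResponseHeaders(); err != nil {
        proxywasm.LogCriticalf("failed to get response headers: %v", err)
    }
    proxywasm.ResumeHttpRequest()
}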

And the file ends with this after all the functions

// required for nottinygc
//
//export sched_yield
func sched_yield() int32 {
    return 0
}
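
For reference, the plugin is built with the usual nottinygc flags (as in the nottinygc README; the output name is just what I use):

tinygo build -o main.wasm -gc=custom -tags=custommalloc -scheduler=none -target=wasi main.go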

Appreciate your thoughts on this behaviour, and thank you for this package :)

anuraaga commented 10 months ago

Hi @sandeeppnaik - would it be possible to get a reproduction showing the issue? If there's a repo with a command to build the plugin and a docker compose command to start two envoys to show the problem, that would be very helpful. If not, I'll try to set it up but it may take some more time.

Nothing immediately comes to mind as a potential problem, so I will need to debug a repro. For the question in the title specifically: GC can happen with any allocation, and there are indeed allocations happening during that call, though that should be OK in principle.

sandeeppnaik commented 9 months ago

Hey @anuraaga, I have added the plugin code on two branches.

You will need a dummy server with an endpoint for the plugin to return success.
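
Something like this minimal server is enough for the upstream (an illustrative sketch; the port and path are arbitrary):

package main

import (
    "log"
    "net/http"
)

func main() {
    // Return 200 for any path so the plugin's dispatch call succeeds.
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}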

and there are indeed allocations happening during that call -> Does the location of the stub method in the code matter at all?

anuraaga commented 9 months ago

Thanks for the report and repro @sandeeppnaik, I found an issue with the recent addition of precise GC. v0.6.0 includes the fix and trying it with your repro, it seems to be working well. Let me know if you still have any issues.

sandeeppnaik commented 9 months ago

Thanks for looking into this promptly! I am running v0.6.0 now and I can definitely see that all the dispatch calls are working as expected. A question about the garbage collection: here, the first bump beginning at 15:40 is without the custom GC vs. the second bump at 16:02 with the custom GC.

image

The ramp-up of memory when running the custom GC is better than without it. However, I would expect the second bump to go down over time and free up the memory, but I am not seeing that. Any idea why @anuraaga?

anuraaga commented 9 months ago

@sandeeppnaik IIUC, this is a graph of the real memory usage of the Envoy process. WebAssembly currently does not define any way to return memory to the OS, so any memory that has been allocated will stay allocated forever. Environment variables like GC_PRINT_STATS can provide information about how much of the allocated memory is actually in use by objects, if that is what you are interested in:

https://github.com/ivmai/bdwgc#installation-and-portability

The hope is that when the process stabilizes it will efficiently use the memory it has already allocated from the OS. But unfortunately, due to this fundamental limitation of wasm, there is no way or workaround to ever release unused memory completely.

sandeeppnaik commented 9 months ago

this is a graph of real memory usage of the Envoy process -> yep, this was Envoy's memory usage. Ah, I understand, so the plugin never returns memory back to the proxy but reuses whatever has been allocated. If that is the case, the memory usage of the plugin should be consistent for a consistent load pattern (a sustained throughput of x rps, with a no-load period of stability between two consecutive load tests), right? However, I see it growing continuously until it hits the proxy's memory limit.

image
{"level":"warn","time":"2023-10-23T05:47:12.025635Z","msg":"Envoy may have been out of memory killed. Check memory usage and limits."}
{"level":"error","time":"2023-10-23T05:47:12.025679Z","msg":"Envoy exited with error: signal: killed"}
{"level":"error","time":"2023-10-23T05:47:12.025715Z","msg":"error serving tap http server: http: Server closed"}
anuraaga commented 9 months ago

It would be good to confirm it is the memory within a VM going up and not the number of VMs itself. Envoy has the metrics wasm.envoy.wasm.runtime.v8.active and wasm.envoy.wasm.runtime.v8.created to show the number of VMs. It could be an issue with GC, but we need to confirm the VM count first; unfortunately there isn't a per-VM memory usage stat yet.

sandeeppnaik commented 9 months ago

I noticed both of the above metrics stay constant throughout the load test while the memory spikes:

curl http://localhost:15090/stats/prometheus | grep v8
# TYPE envoy_wasm_envoy_wasm_runtime_v8_created counter
envoy_wasm_envoy_wasm_runtime_v8_created{} 10
# TYPE envoy_wasm_envoy_wasm_runtime_v8_active gauge
envoy_wasm_envoy_wasm_runtime_v8_active{} 12

Interestingly, with the previous version, nottinygc v0.5.1, the memory does not spike. But I am not entirely sure whether that is because it prevents the plugin from making the dispatch call.

anuraaga commented 9 months ago

Thanks for confirming - I'm assuming the plugin you are testing against isn't the same as the one in https://github.com/sandeeppnaik/wasm-test/tree/main? FWIW, I can't reproduce increasing memory usage locally using that repo - it might be harder to debug the issue without a closer reproduction in that case. Other things to try are running with the GC_PRINT_STATS environment variable set and seeing whether nottinygc v0.4.0 behaves any differently.

anuraaga commented 9 months ago

Hi @sandeeppnaik - I got another report with a potential fix

https://github.com/wasilibs/nottinygc/issues/31#issuecomment-1774710419

Perhaps if you get a chance you could try the https://github.com/wasilibs/nottinygc/commit/667d0d4f344b5c0b42b729820bf83af2e6876377 commit to see if it makes a difference. Thanks.

sandeeppnaik commented 9 months ago

Unfortunately, I am still seeing the same behaviour of memory spiking with this, @anuraaga. My plugin is pretty much the above example. May I know what throughput you are testing with? When I run at about ~600 rps, I can reproduce the memory spikes.

image

Also tested with v0.4.0; it seems to have the same behaviour.

Tested this too -> github.com/higress-group/nottinygc v0.0.0-20231019105920-c4d985d443e1 seems to prevent the memory leak!

anuraaga commented 9 months ago

Thanks for the confirmation - it seems like the issue with malloc affects callbacks more strongly than other apps. I will go ahead and incorporate that fix soon.

anuraaga commented 9 months ago

#34 adds a new build tag for Envoy plugins - would you be able to try the f38ad42f4b892d75a9986b7f201db13300cedcdf commit with the build tag nottinygc_proxywasm (you can also remove the sched_yield stub)?

sandeeppnaik commented 9 months ago

I'm getting a panic while compiling the binary with

tinygo build -o main.wasm -gc=custom -tags=nottinygc_proxywasm -scheduler=none -target=wasi main.go

panic: cannot convert pointer value to byte

goroutine 197 [running]:
github.com/tinygo-org/tinygo/interp.rawValue.bytes(...)
        /Users/runner/work/tinygo/tinygo/interp/memory.go:743
github.com/tinygo-org/tinygo/interp.rawValue.Uint(...)
        /Users/runner/work/tinygo/tinygo/interp/memory.go:751
github.com/tinygo-org/tinygo/interp.(*runner).run(0xc00910dd60, 0xc009f2c640, {0xc008e01690, 0x4, 0x0?}, 0xc009b68720, {0xc006a3b3c0, 0x1c})
        /Users/runner/work/tinygo/tinygo/interp/interpreter.go:828 +0x21f6
github.com/tinygo-org/tinygo/interp.(*runner).run(0xc00910dd60, 0xc009f2d4f0, {0xc008e00b10, 0x4, 0x0?}, 0xc009b68360, {0xc007114900, 0x18})
        /Users/runner/work/tinygo/tinygo/interp/interpreter.go:526 +0x7ccc
github.com/tinygo-org/tinygo/interp.(*runner).run(0xc00910dd60, 0xc009f2d4a0, {0xc005d70f90, 0x1, 0x0?}, 0xc009b55f20, {0xc0071148a0, 0x14})
        /Users/runner/work/tinygo/tinygo/interp/interpreter.go:526 +0x7ccc
github.com/tinygo-org/tinygo/interp.(*runner).run(0xc00910dd60, 0xc009f2d450, {0xc0047f8e10, 0x2, 0x0?}, 0xc009b55e90, {0xc006cd4ac0, 0x10})
        /Users/runner/work/tinygo/tinygo/interp/interpreter.go:526 +0x7ccc
github.com/tinygo-org/tinygo/interp.(*runner).run(0xc00910dd60, 0xc009f2d400, {0xc008d9b390, 0x2, 0x0?}, 0xc009b55b30, {0xc006cd47a0, 0xc})
        /Users/runner/work/tinygo/tinygo/interp/interpreter.go:526 +0x7ccc
github.com/tinygo-org/tinygo/interp.(*runner).run(0xc00910dd60, 0xc009f2d3b0, {0xc008d9a290, 0x1, 0x0?}, 0xc009b554d0, {0xc006c4b2d8, 0x8})
        /Users/runner/work/tinygo/tinygo/interp/interpreter.go:526 +0x7ccc
github.com/tinygo-org/tinygo/interp.(*runner).run(0xc00910dd60, 0xc009f2d360, {0x0, 0x0, 0x100066faa?}, 0x0, {0x104a4b0a0, 0x4})
        /Users/runner/work/tinygo/tinygo/interp/interpreter.go:526 +0x7ccc
github.com/tinygo-org/tinygo/interp.Run({0x110f05240?}, 0x29e8d60800?, 0x0?)
        /Users/runner/work/tinygo/tinygo/interp/interp.go:117 +0x67e
github.com/tinygo-org/tinygo/builder.optimizeProgram({0x110f04080?}, 0xc000024240)
        /Users/runner/work/tinygo/tinygo/builder/build.go:1046 +0x35
github.com/tinygo-org/tinygo/builder.Build.func5(0xc00907d9e0)
        /Users/runner/work/tinygo/tinygo/builder/build.go:576 +0x61d
github.com/tinygo-org/tinygo/builder.runJob(0xc00907da40, 0x0?)
        /Users/runner/work/tinygo/tinygo/builder/jobs.go:222 +0x4d
created by github.com/tinygo-org/tinygo/builder.runJobs in goroutine 1
        /Users/runner/work/tinygo/tinygo/builder/jobs.go:123 +0x5bd
anuraaga commented 9 months ago

@sandeeppnaik I'm not sure if it would cause that crash, but the invocation seems to be missing the custommalloc build tag. Also good to confirm you've removed the sched_yield function from your own code.

sandeeppnaik commented 9 months ago

Thanks for writing back @anuraaga. My bad - added it:

tinygo build -o main.wasm -gc=custom -scheduler=none -target=wasi -tags="custommalloc nottinygc_proxywasm" main.go

Works like a charm! Confirming there is no memory leak at high load, and I am no longer implementing the stub method 👍

anuraaga commented 9 months ago

@sandeeppnaik I have released a new version with the changes. A warning that I ended up naming the tag nottinygc_envoy in the end. Hopefully this works fine now, and thanks for the patience.
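
So the invocation you posted above becomes (same flags, just the renamed tag):

tinygo build -o main.wasm -gc=custom -scheduler=none -target=wasi -tags="custommalloc nottinygc_envoy" main.go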

sandeeppnaik commented 9 months ago

Hey @anuraaga, I'm noticing a behaviour where I increment a counter metric just before returning from the callback / calling ResumeHttpRequest, but the metric never increments. Is this a case of very eager garbage collection? I don't see the same behaviour with the higress version. Any thoughts? Roughly, the pattern is shown in the sketch below.
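
(A simplified sketch; the metric name and the context wiring are illustrative, not the actual plugin code.)

func (p *pluginContext) OnPluginStart(pluginConfigurationSize int) types.OnPluginStartStatus {
    // Define the counter once at plugin start.
    p.successCounter = proxywasm.DefineCounterMetric("plugin.callback_success_count")
    return types.OnPluginStartStatusOK
}

func (ctx *httpContext) dispatchCallback(numHeaders, bodySize, numTrailers int) {
    // ... handle the response headers from the dispatched call ...
    ctx.successCounter.Increment(1) // this increment never seems to show up
    proxywasm.ResumeHttpRequest()
}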

anuraaga commented 9 months ago

We generally wouldn't expect GC to have an effect, since there are no allocations when incrementing a metric:

https://github.com/tetratelabs/proxy-wasm-go-sdk/blob/main/proxywasm/hostcall.go#L759

But you never know; I would need a repro to debug it.

anuraaga commented 9 months ago

I have added an e2e test that is based on your wasm-test:

https://github.com/wasilibs/nottinygc/tree/main/e2e/envoy-dispatch-call https://github.com/wasilibs/nottinygc/blob/main/magefiles/e2e.go#L68

Checking the counters after a minute of running, they seem stable, but perhaps you're seeing issues after a longer runtime, or maybe there are some differences in the test case? If you have any suggestions for that e2e test, a PR would be great. Thanks.