rogchap / v8go

Execute JavaScript from Go
https://rogchap.com/v8go
BSD 3-Clause "New" or "Revised" License

V8 values returned by v8go are not garbage collected by V8 #105

Open arctica opened 3 years ago

arctica commented 3 years ago

Using the print function from the examples (even with the actual fmt.Printf() commented out) seems to result in a memory leak.

Slightly modified example from the Readme to call the print function in an infinite loop:

iso, _ := v8go.NewIsolate() // create a new VM
// a template that represents a JS function
printfn, _ := v8go.NewFunctionTemplate(iso, func(info *v8go.FunctionCallbackInfo) *v8go.Value {
    //fmt.Printf("%v", info.Args()) // when the JS function is called this Go callback will execute
    return nil // you can return a value back to the JS caller if required
})
global, _ := v8go.NewObjectTemplate(iso) // a template that represents a JS Object
global.Set("print", printfn) // sets the "print" property of the Object to our function
ctx, _ := v8go.NewContext(iso, global) // new Context with the global Object set to our object template
ctx.RunScript("while (true) { print('foo') }", "print.js") // will execute the Go callback with a single argument 'foo'

Warning: Running this will quickly consume all memory of the machine.

Removing the 'foo' parameter of the print() call seems to stop the leak. So I guess the parameters of the callback are somehow leaking.

arctica commented 3 years ago

I think v8go.NewValue() also leaks memory, even if you don't pass it back as a result from the Go callback.

iso, _ := v8go.NewIsolate()
fn, _ := v8go.NewFunctionTemplate(iso, func(info *v8go.FunctionCallbackInfo) *v8go.Value {
    v8go.NewValue(iso, "foo")
    return nil
})

global, _ := v8go.NewObjectTemplate(iso)
global.Set("fn", fn)
ctx, _ := v8go.NewContext(iso, global)
ctx.RunScript("while (true) { fn(); }", "fn.js")

rogchap commented 3 years ago

New Values are attached to the Isolate that created them; you need to Dispose the isolate to free any value's memory. For the Function, the Values should be attached to the Context that created them, so you need to call ctx.Close() to free the memory.
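
For reference, a minimal sketch of that current model (no new API involved): values stay alive until the owning Context or Isolate is torn down in bulk.

iso, _ := v8go.NewIsolate()
ctx, _ := v8go.NewContext(iso)

val, _ := ctx.RunScript("'foo' + 'bar'", "main.js")
_ = val.String() // values are usable while the context is alive

// Today, memory is only reclaimed in bulk:
ctx.Close()   // frees values tracked against the context
iso.Dispose() // frees values tracked against the isolate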

Maybe we need an explicit call to free the values, if you want to free the memory without closing or disposing of the isolate or context? 🤔

arctica commented 3 years ago

It's impractical to free their memory only when the isolate or context is closed. Not every VM is short lived. In my case I have a bunch of very long running isolates/contexts that are fed new values from Go as events arrive.

I see two ways to do this:

  1. Copy the values somehow into v8 and let it manage those with its GC.
  2. Manually free the memory of the Value in Go as you suggested but this can be tricky as one has to be careful about the lifetime of Values and might even lead to use-after-free security vulnerabilities from inside the JS VM if not done correctly. Plus Go and JS being garbage collected makes manual memory management feel out of place.

Maybe it's worth checking how nodejs handles this?

rogchap commented 3 years ago

We previously had finalizers set for values so they could be GC'd by Go and call C.free(), but there were issues: the GC would not collect the values because it was unaware of the memory allocated to the Value's value on the C side (the value struct on the Go side only holds pointers, so it is relatively small), and it might also try to GC after the context was closed, thereby trying to free values that had already been freed. There is no way (that I know of) to sync the Go GC with V8's GC.

The core problem is actually how CGO works, and the only way to hold a reference to a value "across the bridge" is to have a persistent Value and use a struct to hold that pointer; a persistent value needs to be manually freed. Yes, Go has GC and V8 has its own GC for JS objects, but C/C++ does not.

Nodejs does not have this issue because it is built in C++ and has no bridge to cross (i.e. no CGO); it does not need to create persistent values and can deal with just locally scoped values.

Rust has better semantics for interfacing with C libraries and sharing memory space, as it is also not a GC language and is closer to C++; this is why Deno was built with Rust and dropped its initial implementation, which was written in Go.

TL;DR: when dealing with CGO, it can be unavoidable to have to manually free objects in some way.

arctica commented 3 years ago

I've started reading the v8 API a bit and about Values, Local vs Persistent handles etc.

What I don't understand yet is why calling Reset() in the finalizer of a Value from the Go side shouldn't do the trick. With Reset() we just tell V8 we are not interested in it anymore from Go. As long as a Persistent handle is held in Go, V8's GC won't free it, and as long as a reference to the Value exists in Go, Go's GC won't call the finalizer that would Reset() the value.

It sounds a bit strange that the Go GC wouldn't collect Values "because it was unaware of the memory allocated to the Value's value on the C side". The Go GC ignores C pointers, so the only issue can be that it collects a Go Value while things survive on the C side, but that's why there are finalizers. Your point about the Go GC freeing a Value after the Context was already closed does make sense, since the order of garbage collection can't be predicted. But the simple solution for that would be to just check whether the context was closed. Or you could keep a reference to the Context inside the Value to ensure the Context is not garbage collected while any Values are still alive.

I think the issue arises from a mismatch between manual memory management for Contexts (.Close()) and automatic memory management for Values. The two are closely connected in usage but require different handling. Actually, it seems that handles in V8 are tied to Isolates and not Contexts.
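
A rough sketch of the scheme described above; the Value layout and the resetPersistentHandle helper are hypothetical, not v8go's actual code:

type Value struct {
    ptr unsafe.Pointer // persistent handle held on the C++ side
    ctx *Context       // keeps the owning Context reachable while any Value is alive
}

func trackValue(ctx *Context, ptr unsafe.Pointer) *Value {
    v := &Value{ptr: ptr, ctx: ctx}
    runtime.SetFinalizer(v, func(v *Value) {
        // Equivalent of Persistent::Reset on the C++ side: tell V8 we no longer
        // hold the handle, so its own GC is free to collect the JS object.
        resetPersistentHandle(v.ptr) // hypothetical cgo wrapper
    })
    return v
}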

I don't think it has much to do with how cgo works or C vs C++ vs Rust. Node written in C++ also needs to use Persistent handles if the references are to survive the local scope. And v8go could make use of Local handles for example when passing arguments from Go to a JS function.

I also see a bit of what looks like unnecessary copying of data. Take for example Value.String(): it calls C.ValueToString(), which mallocs a copy of the string (just to free it shortly after), then calls C.GoString(), which in turn copies the string again. Here one could get the char* from the v8::String::Utf8Value and call C.GoString() on that, skipping the intermediary copy.

I could be wrong with any of the above because I'm just getting into this codebase as well as v8's so please correct me in case I'm misunderstanding things.

genevieve commented 3 years ago

Hey @arctica. Since you have a reproduction, have you been able to test the changes you've proposed to v8go to see if any resolve the issue?

arctica commented 3 years ago

Hey @arctica. Since you have a reproduction, have you been able to test the changes you've proposed to v8go to see if any resolve the issue?

Hi @genevieve. Yes I have indeed. First I did a few changes, but then I realized I wanted to make quite a lot of fundamental changes and pretty much did a rewrite. My version is fully garbage collected (no need to manually .close() a context) via the Go GC. I deviated from v8go's goal of replicating the V8 API and went for something with a simpler surface area that is easy to understand and use, with fewer pitfalls. I don't think it's in a state to be published though. Maybe if there's some interest...

genevieve commented 3 years ago

@arctica Yes! We'd be interested in seeing your implementation if you're able to publish it.

dylanahsmith commented 3 years ago

I don't think it's in a state to be published though.

You can open it as a draft PR to indicate that it isn't quite ready as well as letting us know what you still think is needed. Let us know if you want some help with getting it ready for merging

arctica commented 3 years ago

@genevieve @dylanahsmith Alright, I'll see what I can do over the coming days. Will post again once something is up. Probably not before next week though as I have a few other things to do.

rogchap commented 3 years ago

Hey Everyone; just a quick update; I've been thinking long and hard about this problem, and I've got a few ideas on how to solve this. I'll be testing out my theory and post any updates here. 🤞

rogchap commented 3 years ago

Update:

I've changed the tracked values from a vector to an unordered_set; this way we can check whether a value has previously been freed (by erasing it from the set on freeing).

I've added back a finalizer for v8go.Value (just for the string value at the moment) and created a very simple http server to monitor memory allocations:

var mx sync.Mutex
var iso *v8go.Isolate

func handler(w http.ResponseWriter, r *http.Request) {
    mx.Lock()
    for i := 0; i < 10000; i++ {
        val, _ := v8go.NewValue(iso, "foobar")
        if i%1000 == 0 {
            fmt.Fprintf(w, "Value: %s-%d\n", val, i)
        }
    }
    mx.Unlock()
}

func main() {
    iso, _ = v8go.NewIsolate()

    http.HandleFunc("/", handler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}

Please don't judge the code above; it's not the best!

With the server running, I fired some requests:

ab -n 500 -c 5 http://localhost:8080/

And monitored the allocations using XCode Instruments:

[screenshot: Xcode Instruments allocation profile]

From the above test, we can see the memory being allocated (very quickly) but also the Go GC kicking in and the finalizer being run, thus freeing the memory.

This is what we want; however the story does not end there: when I run the test suite, I get a lot of errors (segmentation violation) from the finalizer.

So although the above "solution" looks promising, I need to dig deeper into what's causing the panic (need to re-build V8 in debug).

Will keep going and see where I get.

rogchap commented 3 years ago

So this is the memory profile if we remove the value finalizer and create a new Isolate per request and then Dispose:

func handler(w http.ResponseWriter, r *http.Request) {
    iso, _ := v8go.NewIsolate()
    for i := 0; i < 10000; i++ {
        val, _ := v8go.NewValue(iso, "foobar")
        if i%1000 == 0 {
            fmt.Fprintf(w, "Value: %s-%d\n", val, i)
        }
    }
    iso.Dispose()
}

func main() {
    http.HandleFunc("/", handler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}

[screenshot: memory allocation profile]

As you can see the memory profile is WAY better (at the cost of having to create a new Isolate), only peaking at 12MB rather than the previous 200MB (and climbing).

So we really want to support both models; but consider the following (assuming we have the value finalizer back in):

iso, _ := v8go.NewIsolate()
val, _ := v8go.NewValue(iso, "some string")
fmt.Println(val.String())
iso.Dispose()

runtime.GC()
time.Sleep(time.Second)

When we call iso.Dispose(), all of the isolate's resources get de-allocated by V8, so when the Go GC runs, it calls the value finalizer, which then tries to free objects that have already been freed; i.e. it panics.

So you may think the "easy" solution is to maintain a flag for when the Isolate is disposed; well, we already have this, as the internal Isolate C pointer is set to nil when we dispose, so if we attach the isolate to every value we could do a check like this:

func (i *Isolate) Dispose() {
    if i.ptr == nil {
        return
    }
    C.IsolateDispose(i.ptr)
    i.ptr = nil
}

func (v *Value) finalizer() {
    if v.iso.ptr != nil {
        C.ValueFree(v.ptr)
    }
    v.ptr = nil
    runtime.SetFinalizer(v, nil)
}

The tricky problem here is that we are not in control of when the GC and the value finalizers run. It's possible (and you see this a lot when running the above server with the value finalizer) that C.ValueFree() can still be called before the C.IsolateDispose() function returns.

iptr := i.ptr
i.ptr = nil
C.IsolateDispose(iptr)

☝️ This helps to reduce the panics, but they still happen; I guess values are being GC'd just before the Dispose(), and the timing aligns such that a C.ValueFree() still gets called after the Dispose().

rogchap commented 3 years ago

This is a "fully" GC'd setup, where there is no iso.Dispose() and we do this in an Isolate finalizer; the server is the same as before, creating a new Isolate per request.

[screenshot: memory profile of the fully GC'd setup]

The memory keeps climbing (up to 750MB here) until all the requests are finished, and then the GC does collect some of the Isolates (not many though). The benefit is that Dispose only gets called after all the values have been freed first; but the Go GC doesn't call the finalizer on everything, as it doesn't think it needs to. If you had constant requests, you would soon get an "out of memory" exception. I currently can't see how a "fully GC'd" solution is viable at this point.

dylanahsmith commented 3 years ago

Fundamentally, the GC can't be designed to manage memory growth for a heap that it doesn't know about. Otherwise, it won't know when that unmanaged memory is at a threshold where garbage collection should be triggered to keep it from growing.

For Go to know about the memory used by C/C++ code, I think it would have to use the C.malloc and C.free functions provided by cgo. Although v8 has support for overriding the array buffer allocator, I don't see support for overriding allocations in general so that Go would be aware of the memory allocations. It is the same problem with other languages, for example ruby also has an external malloc problem when interacting with libraries that use malloc instead of the allocation functions that ruby provides.

So even if finalizers were set to automatically release v8 values, we would avoid using them in favour of a manual method for releasing the V8 values in order to avoid this memory growth problem.

There is still a danger in forgetting to release memory by relying on manual memory management. Ideally, that could be caught from tests, however, I think something needs to be added to support this. For instance, if we could get the count of v8go.Values that weren't released, then we could check that before and after some code to make sure it isn't leaking values, possibly with an initial run of that code if it is expected to lazily initialize something once. Alternatively, maybe this could be done using Isolate.GetHeapStatistics() if a method were exposed to force full garbage collection.

arctica commented 3 years ago

Fundamentally, the GC can't be designed to manage memory growth for a heap that it doesn't know about. Otherwise, it won't know when that unmanaged memory is at a threshold where garbage collection should be triggered to keep it from growing.

The first sentence is key. And the conclusion should be not to attempt to make the Go GC manage the V8 memory. Let the V8 GC manage its stuff and the Go GC the objects on its side of the fence. But that requires careful consideration of what points to where, and a bit of "orchestration" between the GCs.

For Go to know about the memory used by C/C++ code, I think it would have to use the C.malloc and C.free functions provided by cgo. Although v8 has support for overriding the array buffer allocator, I don't see support for overriding allocations in general so that Go would be aware of the memory allocations.

Go would then know about the allocations, but Go's GC still wouldn't be able to make any use of it because it ignores the cgo heap. So if all one would take from this is stats about how much memory has been allocated in V8, in order to manually trigger the Go GC, then there are easier functions available.

So even if finalizers were set to automatically release v8 values, we would avoid using them in favour of a manual method for releasing the V8 values in order to avoid this memory growth problem.

Manual memory management can be a reasonable choice for performance reasons. It will lower the amount of allocated memory and reduce the amount of CPU consumed to run a GC. Both strategies can be used at the same time, but one has to be very careful about race conditions, especially because in Go, as soon as finalizers are introduced, one is in multithreaded territory, since the GC runs on other threads.

There is still a danger in forgetting to release memory by relying on manual memory management. Ideally, that could be caught from tests, however, I think something needs to be added to support this. For instance, if we could get the count of v8go.Values that weren't released, then we could check that before and after some code to make sure it isn't leaking values, possibly with an initial run of that code if it is expected to lazily initialize something once. Alternatively, maybe this could be done using Isolate.GetHeapStatistics() if a method were exposed to force full garbage collection.

If all one wanted to do is catch leaks with manually managed memory then the solution is quite simple: put finalizers on the Values which just panic so you get a stacktrace and remove the finalizer when the Value is Free()'d. That way you are using the Go GC as a leak detector for pretty much free since the finalizer would only run if there's no reference to the object in Go anymore but it was not properly freed.
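
A sketch of that leak detector; Free() here is a placeholder for a manual release method that v8go does not have yet:

type trackedValue struct {
    val *v8go.Value
}

func track(val *v8go.Value) *trackedValue {
    t := &trackedValue{val: val}
    runtime.SetFinalizer(t, func(t *trackedValue) {
        // Only reached if the value became unreachable without Free() being called.
        panic("v8go.Value leaked: Free() was never called")
    })
    return t
}

func (t *trackedValue) Free() {
    // release the underlying persistent handle here (manual release API)
    runtime.SetFinalizer(t, nil) // disarm the leak tripwire
}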

arctica commented 3 years ago

This is what we want; however the story does not end there: when I run the test suite, I get a lot of errors (segmentation violation) from the finalizer.

So although the above "solution" looks promising, I need to dig deeper into what's causing the panic (need to re-build V8 in debug).

That is because as soon as finalizers are brought into the equation, one is in multithreaded territory and has to employ locking to prevent race conditions.

This helps to reduce the panics, but they still happen; I guess values are being GC'd just before the Dispose(), and the timing aligns such that a C.ValueFree() still gets called after the Dispose().

Yes, and I don't believe it is possible to make the code correct without proper locking. You can't even assume that checking or setting a pointer value like i.ptr is thread-safe. You'd need to use the functions from sync/atomic, but even then one has to be very careful, and not everything is possible with just atomics.
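
A sketch of what that locking could look like, reusing the names from the snippets above and adding a hypothetical mu sync.Mutex field to Isolate so the disposal path and the finalizer serialize on the same lock:

func (i *Isolate) Dispose() {
    i.mu.Lock()
    defer i.mu.Unlock()
    if i.ptr == nil {
        return
    }
    C.IsolateDispose(i.ptr)
    i.ptr = nil
}

func (v *Value) finalizer() {
    v.iso.mu.Lock()
    defer v.iso.mu.Unlock()
    if v.iso.ptr != nil && v.ptr != nil {
        C.ValueFree(v.ptr)
    }
    v.ptr = nil
    runtime.SetFinalizer(v, nil)
}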

The memory keeps climbing (up to 750MB here) until all the requests are finished, and then the GC does collect some of the Isolates (not many though). The benefit is that Dispose only gets called after all the values have been freed first; but the Go GC doesn't call the finalizer on everything, as it doesn't think it needs to. If you had constant requests, you would soon get an "out of memory" exception. I currently can't see how a "fully GC'd" solution is viable at this point.

If the isolate objects are unreachable, then the Go GC will run the finalizers. All of them, 100% guaranteed. The problem, though, is that the GC might not run until quite some time later, especially if there are no new allocations happening, as in your example with a finite number of HTTP requests. Then it says "What's the point anyway? Nobody is asking for more memory right now, so why free it?". It will run periodically anyway (IIRC every 2 minutes), so if you wait at the end you should eventually see all Isolates being disposed. Or alternatively you can just add a runtime.GC() call. But if one already knows that the Isolates are not needed anymore at this point in time, then yes, manually disposing of them can be quite an advantage.

Running with garbage collection is a tradeoff: one trades CPU usage, and especially memory usage, for convenience. In the example of a web server where, at the end of the request, one can just throw all the JS resources away because they were used only for some temporary computations, manually disposing of the Isolate (or a Context) can have big benefits. Some usage scenarios will benefit more, some less.

dylanahsmith commented 3 years ago

Go would then know about the allocations, but Go's GC still wouldn't be able to make any use of it because it ignores the cgo heap. So if all one would take from this is stats about how much memory has been allocated in V8, in order to manually trigger the Go GC, then there are easier functions available.

As you say, Go would make use of it for stats, but the point is that Go would then trigger the Go GC manually (i.e. on demand), rather than requiring another hook to do this on demand before the V8 heap is grown. It isn't that Go would free things in the cgo heap directly, but they would be freed indirectly through the finalizers.

I'm not sure what you are referring to by there being "easier functions available". If there are appropriate hooks available in v8 to trigger the Go GC before it increases the v8 heap size, then that would be interesting, since I didn't notice anything. Something similar could be done in the Go application code by using runtime.GC() but it wouldn't really be on-demand so wouldn't be as effective at keeping the heap from growing.

If all one wanted to do is catch leaks with manually managed memory then the solution is quite simple: put finalizers on the Values which just panic so you get a stacktrace and remove the finalizer when the Value is Free()'d.

Yes, this would be a way v8go could add support for checking this manual memory management, if it weren't using the finalizers to automatically free memory. Although, the test code would still need to use runtime.GC() after each of these tests to trigger the garbage collection to both ensure that it happens and to have the failure occur in the test that caused the failure.

arctica commented 3 years ago

I'm not sure what you are referring to by there being "easier functions available". If there are appropriate hooks available in v8 to trigger the Go GC before it increases the v8 heap size, then that would be interesting, since I didn't notice anything. Something similar could be done in the Go application code by using runtime.GC() but it wouldn't really be on-demand so wouldn't be as effective at keeping the heap from growing.

I'm not sure if there is a hook available for when the heap is grown (one could always create one via a wrapper around the default allocator, as you mentioned before), but what I meant is that one can use Isolate::GetHeapStatistics() or similar to get the heap numbers in order to decide whether a manual GC is in order, instead of tracking every single allocation. You can call this method periodically or in one of the GC callbacks. What I am doing instead, though, is that I've added a pre-GC hook in V8 which calls into Go to run runtime.GC() there. The idea is that the V8 GC will (hopefully) be triggered if the V8 heap grows sufficiently, and by pre-empting it and running the Go GC, we can trigger the finalizers and free as many V8 objects as possible.

Yes, this would be a way v8go could add support for checking this manual memory management, if it weren't using the finalizers to automatically free memory. Although, the test code would still need to use runtime.GC() after each of these tests to trigger the garbage collection to both ensure that it happens and to have the failure occur in the test that caused the failure.

Maybe one could allow the library to take a callback that is called whenever something is freed via finalizer. A test could then fail. In production one could log it or track metrics etc. Or one could just not specify the callback and rely on automatic memory management.

dylanahsmith commented 3 years ago

What I am doing instead, though, is that I've added a pre-GC hook in V8 which calls into Go to run runtime.GC() there. The idea is that the V8 GC will (hopefully) be triggered if the V8 heap grows sufficiently, and by pre-empting it and running the Go GC, we can trigger the finalizers and free as many V8 objects as possible.

Looks like you are referring to v8::Isolate::AddGCPrologueCallback. That does sound like it could work well to manage the heap size, although it does bring the risk of performance problems from triggering the Go GC too often if you are using multiple V8 isolates. For example, if you are using multiple V8 Isolates and they all get close to reaching a garbage collection threshold, then the Go GC could get triggered once for each V8 isolate. It seems like this could lead to scalability issues. Some trade-offs could be made to run the Go GC less frequently for performance at the cost of higher memory usage from less effective V8 garbage collections, but that seems more like it would be shifting overhead rather than getting rid of it.

I still think it makes sense to support manually calling a method on v8go.Value objects to release/untrack the underlying v8::Value.

arctica commented 3 years ago

Looks like you are referring to v8::Isolate::AddGCPrologueCallback. That does sound like it could work well to manage the heap size, although it does bring the risk of performance problems from triggering the Go GC too often if you are using multiple V8 isolates. For example, if you are using multiple V8 Isolates and they all get close to reaching a garbage collection threshold, then the Go GC could get triggered once for each V8 isolate. It seems like this could lead to scalability issues. Some trade-offs could be made to run the Go GC less frequently for performance at the cost of higher memory usage from less effective V8 garbage collections, but that seems more like it would be shifting overhead rather than getting rid of it.

Yes, that's the method I am using. You are correct that using several Isolates that can each trigger their GC would result in the host Go app running its own GC every time, which can be an issue if the host app has a big heap. But I'm not simply calling runtime.GC() each time the V8 GC callback is invoked, because in my testing those callbacks get called very frequently and are actually not that useful on their own, since there are many phases and not just one big garbage collection pass. In the callback I grab the current heap statistics and track them for all active Isolates. That allows for the implementation of an algorithm similar to what Go itself uses, where it simply does a big collection once the heap has grown by a certain percentage. I implemented three trigger conditions:

  1. A certain amount of time has passed since the last GC (e.g. 10s).
  2. The total heap size changed by a certain percentage (e.g. grew to 2x).
  3. The JS heap limit is about to be reached for a given Isolate.
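
A rough sketch of those trigger conditions, using Isolate.GetHeapStatistics() as mentioned earlier in the thread (the statistics field names and the thresholds are assumptions):

type gcGovernor struct {
    lastGC       time.Time
    lastHeapSize uint64
}

func (g *gcGovernor) maybeRunGoGC(isolates []*v8go.Isolate) {
    var total uint64
    nearLimit := false
    for _, iso := range isolates {
        hs := iso.GetHeapStatistics() // field names below are assumed
        total += hs.TotalHeapSize
        if hs.TotalHeapSize >= hs.HeapSizeLimit/10*9 {
            nearLimit = true // 3. the JS heap limit is about to be reached
        }
    }
    stale := time.Since(g.lastGC) > 10*time.Second          // 1. time since last GC
    grew := g.lastHeapSize > 0 && total >= 2*g.lastHeapSize // 2. total heap grew to 2x
    if stale || grew || nearLimit {
        runtime.GC() // runs finalizers, releasing persistent handles back to V8
        g.lastGC, g.lastHeapSize = time.Now(), total
    }
}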

I still think it makes sense to support manually calling a method on v8go.Value objects to release/untrack the underlying v8::Value.

I fully agree with that. It's best to have the choice between manual and automatic memory management. Heck, I wish it were possible to manually free Go objects :) But Go being a garbage collected language, I think having automatic memory management is a must. Manual memory management should be a bonus that can be used for performance optimization. In my tests, using manual memory management can result in vastly reduced memory usage when creating tons of Isolates or Contexts, because these are quite heavy. On the other hand, when just allocating Values inside a few long-lived Contexts, the garbage collector, while using more memory, was about 20% faster. It always depends on the real workload.

rogchap commented 3 years ago

Thanks @arctica and @dylanahsmith for your detailed thoughts.

It seems to me that we are in agreement to have some sort of "free" method.

What I thought might be good would be an iso.Prune() that would free all the values that are being tracked against that isolate.

It would basically do the same thing as Dispose() does, but without actually disposing of the isolate so that it can be re-used.

I see this method being called just before you place an Isolate back into a sync.Pool for example.

We could do the same for ctx.Prune() if you need to re-use Contexts. Although I think calling ctx.Close() and creating a new Context is better, so that you're not leaking state.

If we had Prune() (let me know if there is a better name), do you think we would still need val.Free() on individual values? Is that too fine-grained?

I'll probably mark these methods as "Experimental" and subject to change for now.

rogchap commented 3 years ago

This is my first test case; creates a single Isolate that is reused and calls iso.Prune() at the end of each request (obviously a bit crap as this reduces concurrency to 1, but still a good test):

var mx sync.Mutex
var gblIsolate *v8go.Isolate

func init() {
    gblIsolate, _ = v8go.NewIsolate()
}

func handler(w http.ResponseWriter, r *http.Request) {
    mx.Lock()
    iso := gblIsolate
    defer func() {
        iso.Prune()
        mx.Unlock()
    }()

    for i := 0; i < 10000; i++ {
        val, _ := v8go.NewValue(iso, "foobar")
        if i%1000 == 0 {
            fmt.Fprintf(w, "Value: %s-%d\n", val, i)
        }
    }
}

func main() {
    http.HandleFunc("/", handler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}

[screenshot: memory profile]

Memory profile looks good.

The second test case uses sync.Pool, which still relies on the Go GC to free unused Isolates, so we create our own finalizer to call Dispose():

func finalizer(iso *v8go.Isolate) {
    iso.Dispose()
    runtime.SetFinalizer(iso, nil)
}

var ipFree = sync.Pool{
    New: func() interface{} {
        iso, _ := v8go.NewIsolate()
        runtime.SetFinalizer(iso, finalizer)
        return iso
    },
}

func isoFree(iso *v8go.Isolate) {
    iso.Prune()
    ipFree.Put(iso)
}

func handler(w http.ResponseWriter, r *http.Request) {
    iso := ipFree.Get().(*v8go.Isolate)
    defer isoFree(iso)

    for i := 0; i < 10000; i++ {
        val, _ := v8go.NewValue(iso, "foobar")
        if i%1000 == 0 {
            fmt.Fprintf(w, "Value: %s-%d\n", val, i)
        }
    }
}

func main() {
    http.HandleFunc("/", handler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}

[screenshot: memory profile]

Memory profile for this looks good too.

There are tradeoffs here, as sync.Pool still creates a lot of new Isolates, but it also cleans the pool during GC, so it depends on what you want to benchmark against. What I like about this setup is that it's left to the caller to implement the finalizer rather than the v8go package.

I think having the Prune() methods would allow you to create your own Pool that used X number of Isolates, for example.

dylanahsmith commented 3 years ago

What I thought might be good would be an iso.Prune() that would free all the values that are being tracked against that isolate.

Doesn't that prevent the ability to hold references to v8 values?

One use case we have for holding onto references is as a part of initializing the isolate's global context by setting values on it, while holding onto those values so they can later be used without having to get them back out of the context. This is beneficial for performance, since it is faster in Go to just access a struct field than to get a value from the v8 context. It also avoids the possibility of errors being introduced, such as from the JS code setting those initial values to something else.

Another use case we have for holding onto values is to be able to use an isolate for multiple tasks concurrently. Your approach with using an isolate pool would end up requiring a lot more isolates, and thus a lot more memory usage, to have isolates that are blocking on IO. Instead, we have a goroutine that owns the isolate and pulls tasks from a channel to start working on them, so it can start working on multiple tasks concurrently. Multiple isolates can then be added for parallelism, where their goroutines can pull new work from the same shared channel but have separate channels to handle IO completion. With this approach, the isolate could be kept busy with new tasks and not have a natural time to use iso.Prune(), which will require a similar workaround as we currently have where the goroutine essentially needs a "graceful restart" (forcing it to wait for existing tasks to fully complete) to free up memory.
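
A sketch of that ownership model; the task type and channel shapes are illustrative, not an existing v8go API:

type task struct {
    script string
    result chan string
}

// isolateWorker owns one Isolate/Context for its whole lifetime and is the only
// goroutine that touches them, pulling work from a channel shared by all workers.
func isolateWorker(tasks <-chan task) {
    iso, _ := v8go.NewIsolate()
    defer iso.Dispose()
    ctx, _ := v8go.NewContext(iso)
    defer ctx.Close()

    // one-time setup would go here: values created now (e.g. helpers set on the
    // global) can be held in Go variables and reused instead of being re-read
    // from the context on every task
    for t := range tasks {
        val, err := ctx.RunScript(t.script, "task.js")
        if err != nil {
            t.result <- err.Error()
            continue
        }
        t.result <- val.String()
    }
}

Running several of these workers against the same tasks channel then gives parallelism across isolates.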

As a result, an Isolate.Prune() method wouldn't be a replacement for a method like v8go.Value.Release().

arctica commented 3 years ago

Doesn't that prevent the ability to hold references to v8 values?

If implemented correctly, then it would only prevent one from using (not holding) references to V8 values beyond the call to prune. You would still be able to use them before that and then can free them all at once. Sometimes this might be favorable.

As a result, an Isolate.Prune() method wouldn't be a replacement for a method like v8go.Value.Release()

Definitely.

katallaxie commented 3 years ago

Your approach with using an isolate pool would end up requiring a lot more isolates, and thus a lot more memory usage, to have isolates that are blocking on IO.

@rogchap I think v8go.Value.Release() is eventually a needed addition. I have a simple server running in the isolate that answers simple events. This is a long-running process, which leaks memory because the returned strings are bridged to the Go context. Or is there another solution for this execution model?

rogchap commented 3 years ago

Apologies, I've left this one hanging... a lot of effort has gone into finding an optimal solution, but I've been unsatisfied with all my attempts.

I'll re-visit... and yes, I think it will have to be a value.Release() type function, but may add some other functions that are helpful depending on what you are trying to achieve.

I'll likely mark them as "Experimental" and subject to change.

dylanahsmith commented 3 years ago

The approach I'm looking into is to more closely match the C++ API by:

* turning `Value` into an alias to `LocalValue`
* using a `Isolate.HandleScope(func())` to explicitly enter a top-level handle scope, so local values can be batch freed at the end of it (this can also use the v8::Locker, reducing locking overhead for nested v8go calls)
* implicitly introducing an `v8::EscapableHandleScope` around calls to the FunctionTemplate callback
* adding `Value.Persist()` to create a PersistentValue that escapes the handle scope stack and can be freed with `PersistentValue.Release()`

Internally, I want to

I think the handle scope should make this convenient for the common case of working with local values. However, PersistentValue's manual memory management seems inconvenient, so we could provide an alternative method to persist the local value, which wraps the PersistentValue in a heap-allocated struct that gets auto-released through a GC finalizer. E.g. this could be obtained using a Value.GCed() method. Since Go finalizers run in a separate goroutine, they would queue the value to a slice on the Isolate to release later (e.g. at the start of Isolate.HandleScope), using a RWMutex to synchronize the use of this slice. The freeing could even be done immediately if the Isolate isn't in a handle scope, provided the RWMutex is used to synchronize setting the handleScopeCallback.

We could also add GC finalizers for Context and Isolate. For Context, we can use the above approach of queuing the Close on the isolate. For Isolate, we should make sure any values that depend on it contain a reference to it; since finalizers are run in dependency order, we can just call iso.Dispose() in the finalizer.

This seems like the best place to give an overview of the changes I would like to make, since I would break the changes into multiple PRs and want someplace to refer back to. Of course, I would also love feedback on my proposed approach to solving this issue.

arctica commented 2 years ago

The approach I'm looking into is to more closely match c++ API by:

* turning `Value` into an alias to `LocalValue`

* using a `Isolate.HandleScope(func())` to explicitly enter a top-level handle scope, so local values can be batch freed at the end of it (this can also use the v8::Locker, reducing locking overhead for nested v8go calls)

* implicitly introducing an `v8::EscapableHandleScope` around calls to the FunctionTemplate callback

* adding `Value.Persist()` to create a PersistentValue that escapes the handle scope stack and can be freed with `PersistentValue.Release()`

I think this makes a lot of sense given v8go's goal of staying close to the V8 API. Performance-wise, I suspect the HandleScope() function could provide significant gains, because one can call runtime.LockOSThread() and, as you mentioned, take the v8::Locker right at the start, just once for a whole bunch of code.

I think the handle scope should make this convenient for the common case of working with local values. However, PersistentValue's manual memory management seems inconvenient, so we could provide an alternative method to persist the local value, which wraps the PersistentValue in a heap-allocated struct that gets auto-released through a GC finalizer. E.g. this could be obtained using a Value.GCed() method. Since Go finalizers run in a separate goroutine, they would queue the value to a slice on the Isolate to release later (e.g. at the start of Isolate.HandleScope), using a RWMutex to synchronize the use of this slice. The freeing could even be done immediately if the Isolate isn't in a handle scope, provided the RWMutex is used to synchronize setting the handleScopeCallback.

I think the key points are: 1. finalizers can't free Values while a HandleScope is active, since they run on a different goroutine, and 2. Values don't have to be freed by the main isolate goroutine. I would therefore implement a separate goroutine for each Isolate that is tasked with freeing the Values. IMHO there is no need to try to free Values at the start or end of HandleScope; after all, who knows when the GC will finalize them. If that garbage-slurping goroutine used a HandleScope itself to batch-free multiple Values, then things become quite easy because all the thread safety is handled correctly.

I'd question, though, the choice of a RWMutex, which is optimized for a single writer and multiple readers, but that is not the case here. Since the finalizer goroutine just needs to append values, and the other goroutine just reads the values and then signals that it has finished, maybe a simple slice with a few atomics can do the trick. Pseudocode:

type Isolate struct {
    // ...
    finalizerQueue *[]*Value
}

// finalizer goroutine
// temporarily grab the slice pointer and signal we are busy
q := atomic.SwapPointer(&iso.finalizerQueue, 0x1)
q = append(*q, v)
atomic.StorePointer(&iso.finalizerQueue, q)

// garbage slurping goroutine (per isolate)
for {
    q := atomic.SwapPointer(&iso.finalizerQueue, nil)

    if q == 0x1 {
        // new garbage coming in, yield and try again
        time.Sleep(1*time.Millisecond)
        continue
    } else if q == nil {
        // maybe check if we should exit the goroutine here
        time.Sleep(1*time.Second)
        continue
    }

    iso.HandleScope(func() {
        for _, v := range *q {
            // free v
        }
    })
}

pigLoveRabbit520 commented 2 years ago

So what is the best way to release Value memory for now?

arctica commented 2 years ago

So what is the best way to release Value memory for now?

Right now you still need to call Context.Close() because all Values are tracked in its struct and only freed when the Context is freed.

https://github.com/rogchap/v8go/blob/19ec71c83341aeddf2f341d2a27c9db987cc08ac/v8go.cc#L104 https://github.com/rogchap/v8go/blob/19ec71c83341aeddf2f341d2a27c9db987cc08ac/v8go.cc#L416

dylanahsmith commented 2 years ago

Unfortunately, values created with an API like v8go.NewValue, which isn't given a context or a value that has a context, get tracked on an internal context within the isolate. So the isolate would need to be disposed to free those values. Avoiding long-lived isolates is a way to work around this issue.

pigLoveRabbit520 commented 2 years ago

@dylanahsmith I need to preload a huge script and I want to reuse the Isolate object. If I dispose of the Isolate and recreate it, I have to reload the huge script.

pigLoveRabbit520 commented 2 years ago

When I want to dispose the Isolate

iso.TerminateExecution()
iso.Dispose()

got this error:

fatal error: unexpected signal during runtime execution

#
# Fatal error in v8::Isolate::Dispose()
# Disposing the isolate that is entered by a thread.

how to safely dispose?

genevieve commented 2 years ago

I need to preload a huge script and I want to reuse the Isolate object. If I dispose of the Isolate and recreate it, I have to reload the huge script.

@pigLoveRabbit520 We are working on a PR to enable compiling a script in one isolate and reusing that compiled/cached data in other isolates for that use case, so that it skips the compile step and jumps straight to running.

how to safely dispose?

Do you have any other goroutines/async work/pointers to this isolate that could be running something?

pigLoveRabbit520 commented 2 years ago

After I invoked Dispose(), memory still kept rising...

In my opinion, iso.Prune() is urgently needed. I found that memory increases rapidly in a highly concurrent environment.

genevieve commented 2 years ago

@pigLoveRabbit520 I'm not sure why the memory would keep rising after disposing the isolate, but with regards to having a huge script that you want to avoid reloading on new isolates, it seems that isolates in the same process share compiled scripts / cached data, so unless your isolates are running in distinctly separate processes, you may already be able to get the benefit of compilation from one isolate in the others: https://gist.github.com/genevieve/819b1a68c1ec61a3e668fe5b738ceaa4

pigLoveRabbit520 commented 2 years ago

@genevieve Your code is too simple, it's hard for me to reproduce the problem.

catherinetcai commented 2 years ago

@genevieve Your code is too simple, it's hard for me to reproduce the problem.

The complexity of the code doesn't really matter. What @genevieve is trying to show you in her example is that isolates share compiled scripts. She demonstrates this with the timed examples in her results. You can see that the first isolate takes the longest time. Subsequent isolates are able to share the compiled scripts. Her example is meant to show you that you may be able to help the memory issue you're seeing if you run your isolates within the same process.

She's trying to ask if your setup has isolates running in the same process or separate ones.

pigLoveRabbit520 commented 2 years ago

@catherinetcai thanks for the advice, I'm trying to reproduce the problem.

pigLoveRabbit520 commented 2 years ago

@genevieve Here is my demo, it may provide some clues.

func main() {
    bs, _ := ioutil.ReadFile("core.js")

    for {
        iso := v8.NewIsolate()
        ctx := v8.NewContext(iso)
        start := time.Now()
        val, err := ctx.RunScript(string(bs), "core.js")
        if err != nil {
            log.Fatalln(err)
        }
        fmt.Printf("get val %v\n", val)
        fmt.Printf("RunScript in iso: %s\n", time.Since(start))

        ctx.Close()
        iso.Dispose()
    }
}

core.js is the Lodash core build. I found the memory would grow to 55M and remain stable. Log:

RunScript in iso: 2.755319ms
get val undefined
RunScript in iso: 2.750841ms
get val undefined
RunScript in iso: 2.913231ms
get val undefined
RunScript in iso: 2.747284ms
get val undefined
RunScript in iso: 2.735461ms
get val undefined
RunScript in iso: 3.620819ms
get val undefined
RunScript in iso: 2.753191ms
get val undefined
RunScript in iso: 2.725467ms
get val undefined
RunScript in iso: 2.721518ms
get val undefined
RunScript in iso: 2.739127ms
get val undefined
RunScript in iso: 2.832806ms
get val undefined
RunScript in iso: 2.832611ms
get val undefined
RunScript in iso: 3.221326ms
get val undefined
RunScript in iso: 2.802894ms
get val undefined
RunScript in iso: 2.797963ms
get val undefined
RunScript in iso: 3.491105ms
get val undefined
RunScript in iso: 2.803522ms
get val undefined
RunScript in iso: 2.787454ms

dylanahsmith commented 2 years ago

@pigLoveRabbit520 let's keep this issue focused on the issue of V8 data references being held onto by v8go, preventing that data from being garbage collected by V8. If there is another type of memory usage issue, then a separate issue could be opened for it.

I found the memory would grow to 55M and remain stable

The fact that it remains stable after growing to some size indicates that it might be a cache filling up. You could try using v8go.SetFlags to control caching configuration to see how that affects it.

pigLoveRabbit520 commented 2 years ago

@dylanahsmith For now, I have to create a new iso and ctx, and after the HTTP response I close the ctx and dispose of the iso, even though I can use a pre-compiled script.

func (k *Helper) GetContext() *v8go.Context {
    iso := v8go.NewIsolate()
    ctx := v8go.NewContext(iso)
    script, _ := iso.CompileUnboundScript(k.preScript, "bundle.js", v8go.CompileOptions{CachedData: k.cachedData})
    _, _ = script.Run(ctx) // this step is slow
    return ctx
}

In my test, rerunning the script is slow because the script is huge.
I think the iso could be reused.

pigLoveRabbit520 commented 2 years ago

@dylanahsmith there is a mistake in your docs, in the Pre-compile section:

source := "const multiply = (a, b) => a * b"
iso1 := v8.NewIsolate() // creates a new JavaScript VM
ctx1 := v8.NewContext() // new context within the VM

ctx1 := v8.NewContext() should be ctx1 := v8.NewContext(iso1)

genevieve commented 2 years ago

@pigLoveRabbit520 You can submit a PR to update the readme for that. For any discussions on performance of the new APIs, would you mind opening a separate github issue?

timo-klarshift commented 2 years ago

So am I right that currently it's a bad idea to have long-lived isolates, as they accumulate unfreeable memory when calling RunScript on them? And that the best one can do currently is renew isolates from time to time to throw away allocations, and then work around the overhead with precompiled scripts?

Are there any plans to fix this somehow? A value.Dispose method would be really handy, so we could at least manage this on our own.

dobarx commented 1 year ago

Any update on this? I'm having the same issue.

snej commented 1 year ago

After working with v8go a few months, I've just discovered this issue after writing and profiling a benchmark. Unfortunately in my use case it's not practical to throw Isolates away — I need high throughput, and spinning up a new Isolate for every request is too much of a slowdown.

I've been thinking about the problem and (before finding this thread) basically came to the same conclusion as @dylanahsmith above. This will also reduce the rather high per-call overhead of locking the Isolate etc. (see the LOCAL_CONTEXT, LOCAL_TEMPLATE, ISOLATE_SCOPE macros.)

Has anyone done this work yet?

snej commented 1 year ago

I think I can fix this without breaking API compatibility. Here are my notes (work in progress):

So far this just duplicates the current behavior. But now:

This preserves the current behavior, so it's not a breaking change. But it lets you do:

{
  scope := context.PushScope()
  defer context.PopScope(scope)
  val1 := v8.NewString(context.Isolate(), "foo")
  // ...
}

val1's C++ persistent handle will be created in the temporary scope and freed when the scope is popped. So this code does not leave any allocations behind.

snej commented 1 year ago

My fork at snej/v8go incorporates this change and a number of other optimizations. The changes are pretty extensive so I'm not sure if this is something that will be accepted as a PR or if I'll end up keeping a long-term fork.

I didn't end up implementing "value scopes" the way I outlined earlier. It worked, but IMO had too much overhead allocating and freeing the ValueScope objects.

Instead I realized that since the scopes form a stack, you can just keep all the persistent handles in a single vector. PushScope just remembers the current size of the vector, and PopScope shrinks the vector down to the prior size.

I didn't like the way that accessing an obsolete Value object dereferenced freed memory. Sure, it's most likely to segfault and that just gets caught as a Go panic, but in the worst case messing with an obsolete Value could cause arbitrary Bad Stuff to happen; I wouldn't want to bet against remote code execution as a possibility.

So instead, Value just remembers the index of the handle in the Context's vector, and the handle is looked up when the Value is accessed. If it's out of range, it panics. I also store a 32-bit scope ID in the Value and in the vector element; every time PushScope is called the current scope ID is incremented. That way if a vector index later gets reused for a different scope, it gets detected as a mismatch of scope IDs and again causes a panic.
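
A Go sketch of that bookkeeping (in the fork the handle vector lives on the C++ side; this just illustrates the index-plus-scope-ID check):

type handleTable struct {
    handles []uintptr // stand-ins for the persistent handles
    scopes  []uint32  // scope ID that created each slot
    current uint32    // incremented on every PushScope
}

type scopeMark struct {
    size int
    id   uint32
}

func (t *handleTable) PushScope() scopeMark {
    t.current++
    return scopeMark{size: len(t.handles), id: t.current}
}

func (t *handleTable) PopScope(m scopeMark) {
    // shrink back to the size recorded when the scope was pushed
    t.handles = t.handles[:m.size]
    t.scopes = t.scopes[:m.size]
}

func (t *handleTable) add(h uintptr) (index int, scope uint32) {
    t.handles = append(t.handles, h)
    t.scopes = append(t.scopes, t.current)
    return len(t.handles) - 1, t.current
}

// deref is what a Value does on access, using the index and scope ID it stored.
func (t *handleTable) deref(index int, scope uint32) uintptr {
    if index >= len(t.handles) || t.scopes[index] != scope {
        panic("value used after its scope was popped")
    }
    return t.handles[index]
}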

I've added some other optimizations: