timob / jnigi

Golang Java JNI library
BSD 2-Clause "Simplified" License
163 stars 44 forks source link

Feature: support deleting GlobalRef from Env.classCache to avoid memory leak #83

Closed jairobjunior closed 6 months ago

jairobjunior commented 6 months ago

✨ Types of changes

💡 Motivation and Context

My application was recently experiencing OOM after running for a couple days. The architecture of the app is Java running on top, calling into JNI using jnigi then calling into Go. Java can make many calls to the JNI layer with different types of parameters, including String.

This was a very interesting/hard leak to find, because the memory stats from the Main app running in Java didn't indicate any abnormal growth, but the RSS values definitely had a very linear curve. So we started isolating the components and verified that the leak was coming from the JNI. I then wrote a stress test app and used jemalloc to trace the memory on the stack. jemalloc gave me this chart:

leak

That showed me that there is a large % of memory used by jni_NewGlobalRef that wasn't getting freed. For some reason the map symbols were not working on the memory boxes above the global ref. I then started tracing where jnigi calls jni_NewGlobalRef and that was how I got to the problem.

🐛 Problem

Here is how my JNI layer looks with the leak:

//export Java_com_example_process
func Java_com_example_process(jniEnv unsafe.Pointer, clazz uintptr, jname uintptr) {
    env := jnigi.WrapEnv(jniEnv)
    strObj := jnigi.WrapJObject(jname, "java/lang/String", false)
    defer env.DeleteLocalRef(strObj)
    var goString []byte
    err := strObj.CallMethod(env, "getBytes", &goString)
    if err != nil {
        return
    }
    //Call into go with goString
}

That looked sort of harmless in the beginning, until we had to run the app for days.

The issue lives inside of the env.classCache. When env gets out of scope, all of the env.classCache will also get destroyed. The catch is that those objects inside of env.classCache were created using NewGlobalRef by callFindClass and are not deleted when env gets out of scope, creating dangling references.

🎯 Solution

This PR is introducing a new method called env.DeleteGlobalRefCache(), this method should be called after you want to dispose an instance of *jnigi.Env. It'll call deleteGlobalRef in all of the classCache and clear the map as well.

//export Java_com_example_process
func Java_com_example_process(jniEnv unsafe.Pointer, clazz uintptr, jname uintptr) {
    env := jnigi.WrapEnv(jniEnv)
        defer env.DeleteGlobalRefCache()
    strObj := jnigi.WrapJObject(jname, "java/lang/String", false)
    defer env.DeleteLocalRef(strObj)
    var goString []byte
    err := strObj.CallMethod(env, "getBytes", &goString)
    if err != nil {
        return
    }
    //Call into go with goString
}
timob commented 6 months ago

I ran into the same problem just recently too. Looks good. Thanks for all the details, writing the tests and comment doc! :tada:

Currently we do delete the cache in func (j *JVM) DetachCurrentThread(env *Env) error here. We should do this in the more general way using the method you provided, replacing the loop.

jairobjunior commented 6 months ago

@timob Sounds good. I just replaced the loop you mentioned with a call to the method env.DeleteGlobalRefCache().

timob commented 6 months ago

Thanks!