Proposal: Change the way the MemoryManager runs.

Reminder: The MemoryManager is our central database that holds a weak reference to every Godot-related object created on the JVM side, whether core types, wrappers for native types, or scripts. When created, all those objects are registered to a ReferenceQueue that lets us know when they have been collected by the GC. This notification is necessary so that we can decrement the counter of RefCounted objects on the Godot side. That operation is currently done in a separate thread.
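For readers unfamiliar with the current design, here is a minimal Kotlin sketch of that mechanism under simplifying assumptions: instances are keyed by their Godot object ID, and the native decrement is passed in as a function instead of being a JNI call. `TrackedRef`, `LegacyMemoryManager` and `decrementNativeCounter` are hypothetical names, not the module's actual API.

```kotlin
import java.lang.ref.ReferenceQueue
import java.lang.ref.WeakReference
import java.util.concurrent.ConcurrentHashMap

// Hypothetical weak reference remembering which native object it stood for,
// because the referent itself is gone once the reference is enqueued.
class TrackedRef(
    referent: Any,
    val objectId: Long,
    val isRefCounted: Boolean,
    queue: ReferenceQueue<Any>,
) : WeakReference<Any>(referent, queue)

class LegacyMemoryManager(private val decrementNativeCounter: (Long) -> Unit) {
    private val queue = ReferenceQueue<Any>()

    // Weak view of every JVM instance bound to a Godot object, keyed by object ID.
    // The map also keeps the TrackedRef objects themselves reachable, which a
    // Reference needs in order to be enqueued at all.
    val db = ConcurrentHashMap<Long, TrackedRef>()

    fun register(instance: Any, objectId: Long, isRefCounted: Boolean) {
        db[objectId] = TrackedRef(instance, objectId, isRefCounted, queue)
    }

    fun handleCollected(ref: TrackedRef) {
        db.remove(ref.objectId, ref)
        // In the real module this would be one JNI call per collected reference.
        if (ref.isRefCounted) decrementNativeCounter(ref.objectId)
    }

    // Current design: a dedicated thread blocks on the queue and runs whenever
    // the GC has enqueued something and the OS scheduler lets it wake up.
    fun startCleanupThread(): Thread = Thread {
        try {
            while (true) handleCollected(queue.remove() as TrackedRef)
        } catch (e: InterruptedException) {
            // Interrupting the thread stops the cleanup loop.
        }
    }.apply {
        isDaemon = true
        start()
    }
}
```

The blocking `ReferenceQueue.remove()` is what ties the cleanup pace to both the GC and the OS scheduler.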

Whenever that thread is woken up by the OS scheduler, it checks the queue to see if there are any RefCounted instances to decrement. This design has 3 consequences:

This thread can be active at random times and is totally at the mercy of the operating system. Not all schedulers are created equal, so the behaviour can differ considerably from one platform to another.
Objects are freed in big batches, with timings correlated to the GC (when the GC runs, plus the time it takes for the thread to wake up). The JVM and its GC might be optimized for this kind of workload, but Godot is not. We run the risk of interfering with the main thread because of the multiple locks necessary to manage RefCounted (our own and the one already in the native Object).
All RefCounted instances with a JVM binding can no longer be freed by Godot itself. Because our binding permanently increases the counter by 1, the MemoryManager always has the last word. We are bypassing how Godot normally operates with what I consider to be a hacky solution.

My proposal consists of 2 points:

Make the MemoryManager run in the main thread. The Language class has a frame() method. As the name suggests, it's called at the end of every frame, once the Scene (and therefore scripts) has been fully processed, including the secondary threads running in the new multithreaded scene system. Instead of relying on a JVM thread calling C++ once for every collected reference, we can have the module directly query the MemoryManager for the instances that have been garbage collected during that frame (if any) and get all the relevant data in a single JNI call (made even faster now that we can convert a whole Kotlin collection in a single call). It technically means more work for the main thread, but the pacing will be stable (at least as stable as the game itself) and more granular. This will greatly reduce data races, since the call happens after the scene system has run. It won't eliminate all of them, because Godot also has its ThreadPool and users can create their own threads as well. But it's safe to say that the scene threads are the ones that interact the most with the Godot script system anyway, and therefore with our module.
Remove the logic that switches the C++ binding to a weak/strong reference when the counter of a RefCounted reaches 1. Instead, the JNI reference for a RefCounted will always be weak. It means that unless a reference to the Kotlin wrapper for the RefCounted instance is kept directly in JVM code, the wrapper will quickly be collected by the GC and the counter decremented. The logic behind it is to give control back to Godot when it comes to managing a RefCounted. Our module will no longer be the one that always frees those instances; that will only happen if the JVM just happens to hold the last reference. A Kotlin wrapper for a RefCounted will basically act just like the Ref<> wrapper in C++, but instead of decrementing the counter when going out of scope, it will decrement the counter when the GC ticks. The pro of this change is simpler binding management for RefCounted: the binding callback no longer constantly checks the counter and plays pingpong with weak/strong reference switches (reminder: the function doing that is called every single time the counter of a RefCounted is modified). The con is a higher creation/deletion rate for RefCounted instances. In my opinion, it should be fine. If a RefCounted wrapper is created but not kept in the JVM (just used as a function argument, for example), it will quickly be handled by the GC's young generation. And if it's kept referenced in the JVM, the behavior will be no different from today. It's a loss for short-lived objects and a win for long-lived objects, which matches the good practice of keeping objects in memory on the JVM instead of constantly creating new ones.
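Point 1 could look roughly like the following minimal sketch, assuming the C++ `frame()` hook calls into the JVM once per frame and receives the IDs to decrement as a single `LongArray`. `CollectedRef`, `FrameDrivenMemoryManager` and `drainCollectedThisFrame` are hypothetical names. The key change from the current design is `ReferenceQueue.poll()`, which returns immediately, instead of a dedicated thread blocking on `remove()`.

```kotlin
import java.lang.ref.ReferenceQueue
import java.lang.ref.WeakReference

// Hypothetical weak reference carrying the native object ID it was bound to.
class CollectedRef(referent: Any, val objectId: Long, queue: ReferenceQueue<Any>) :
    WeakReference<Any>(referent, queue)

// Single-threaded for brevity: instances registered from other threads would
// need synchronization around `live`.
class FrameDrivenMemoryManager {
    private val queue = ReferenceQueue<Any>()

    // Keeps the CollectedRef objects reachable so they can be enqueued.
    private val live = HashMap<Long, CollectedRef>()

    fun register(instance: Any, objectId: Long): CollectedRef =
        CollectedRef(instance, objectId, queue).also { live[objectId] = it }

    // Called once per frame from the C++ frame() hook, on the main thread.
    // poll() never blocks, so a frame where nothing was collected costs very little.
    fun drainCollectedThisFrame(): LongArray {
        val ids = ArrayList<Long>()
        while (true) {
            val ref = queue.poll() as CollectedRef? ?: break
            live.remove(ref.objectId, ref)
            ids.add(ref.objectId)
        }
        // One batch, so C++ can decrement every counter after a single JNI call.
        return ids.toLongArray()
    }
}
```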

All in all, I think the proposal will simplify memory management a bit and make it much more resilient.

Follow-up after discovering new issues with our current design and profiling JNI overhead more extensively. The first sad truth is that I couldn't find a design that completely gets rid of the pingpong between strong and weak references. I can remove it from the base Godot wrapper types, but not from JVM scripts. Its frequency can also be reduced, but it will have to remain in some form. The reason lies in why we have the pingpong in the first place. The C++ code needs a reference to the JVM script so it can make calls to it (through the common lifecycle callbacks like _process()). It means that as long as the native C++ instance is alive, the script must remain alive too, hence why we need a strong reference to it. The issue starts on the JVM side when it comes to managing RefCounted instances. A JVM instance of a RefCounted acts exactly like a Ref in Godot C++ and increments its counter. The result is a reference cycle: Godot keeps the JVM instance alive, and the JVM instance keeps the C++ RefCounted alive.

I don't have this issue with regular scriptless instances, or even with scripted Objects. This issue only exists for RefCounted instances with a JVM script. So far, this model has applied equally to wrappers and scripts; I can change it to apply only to scripts. Now, how do we deal with that cycle? We break it by turning a strong reference into a weak one, and that can only be the one used by the C++ binding. Of course, we can't allow this reference to be weak all the time. If it were, then whenever a script is used only by Godot and not stored directly by the Kotlin/Java code, it would be garbage collected. It means that as long as the C++ side is using the script, we can't turn it into a weak reference. So how do we know when the C++ side is no longer using it? From its counter value. When it reaches 1, we know that the only remaining reference has to be on the JVM side; it's then safe to switch to a weak reference and let the script be naturally garbage collected once it's no longer used on the JVM side.
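The cycle-breaking rule can be sketched as follows. This is a simplified model, not the module's code: a plain field stands in for the strong JNI reference and a `java.lang.ref.WeakReference` for the weak one, and `ScriptBinding` / `onCounterChanged` are hypothetical names for the binding and the refcount callback.

```kotlin
import java.lang.ref.WeakReference

// Hypothetical model of the C++ binding's handle on a JVM script instance.
// `strong` stands in for a strong JNI reference, `weak` for a weak one.
class ScriptBinding(script: Any) {
    private var strong: Any? = script
    private var weak: WeakReference<Any>? = null

    var switches = 0
        private set

    val isStrong: Boolean get() = strong != null

    fun get(): Any? = strong ?: weak?.get()

    // Invoked every time Godot changes the RefCounted counter.
    fun onCounterChanged(counter: Int) {
        val current = strong
        if (counter == 1 && current != null) {
            // Only the JVM wrapper holds a Ref now: demote to break the cycle.
            weak = WeakReference(current)
            strong = null
            switches++
        } else if (counter > 1 && current == null) {
            // C++ uses the script again: promote so the GC can't collect it.
            strong = weak?.get()
            weak = null
            switches++
        }
    }
}
```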

So far so good: it's a relatively simple task to handle. Just use the RefCounted callbacks Godot already provides and do the switch whenever the counter reaches 1. Except that we recently discovered a nasty side effect of that design: some cases create a high-frequency pingpong. The one example I have at hand is when you use a Godot resource referenced only by the JVM side. In that situation, the C++ side only holds a weak reference to it. Now, a Godot resource needs to be used somehow at some point, mostly by passing it as a parameter to a C++ API call, and that's when we kill our performance.

Let's take that bit of code from our most efficient bunnymark benchmark:

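    @RegisterFunction
    override fun _draw() {
        // bunnyTexture is a RefCounted texture referenced only by this script
        for (bunny in bunnies) {
            // Each call hands the texture to C++, which takes a Ref and releases it
            drawTexture(bunnyTexture, bunny.position)
        }
    }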

Here we use a basic CanvasItem::drawTexture call to draw all our bunnies on screen with a texture. This texture is a RefCounted referenced only in our script. Now, each iteration of that loop sends the texture to C++. Because C++ gets a reference to the texture, the counter is incremented and a switch to a strong reference happens. Once the call is done, Godot has no more use for the texture and doesn't store it anywhere, so the counter is decremented back to 1 and a switch to a weak reference happens. "Switching" the reference implies creating a new one of the correct type and deleting the previous one. So for each call to drawTexture(), here is what happens:

Texture's counter is increased to 2
Create Strong JNI reference
Delete Weak JNI reference
drawTexture is executed
Texture's counter is decreased to 1
Create Weak JNI reference
Delete Strong JNI reference
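To make the cost concrete, the eager policy can be modeled in a few lines. In this hypothetical sketch, `JniCallCounter` only counts the calls that would create or delete a JNI reference (in the real JNI API, functions such as `NewGlobalRef` or `DeleteWeakGlobalRef`); nothing here touches JNI. Each simulated drawTexture() call costs 4 of them, so drawing N bunnies with the same texture costs 4N extra JNI calls per frame.

```kotlin
// Hypothetical counter for JNI reference creations/deletions; no real JNI here.
class JniCallCounter {
    var calls = 0
        private set

    fun createRef() { calls++ }
    fun deleteRef() { calls++ }
}

// Eager policy: the reference type follows the counter immediately.
class EagerBinding(private val jni: JniCallCounter) {
    var counter = 1 // only the JVM wrapper holds a Ref
        private set
    var strong = false
        private set

    fun increment() {
        counter++
        if (counter == 2 && !strong) {
            jni.createRef() // new strong reference
            jni.deleteRef() // old weak reference
            strong = true
        }
    }

    fun decrement() {
        counter--
        if (counter == 1 && strong) {
            jni.createRef() // new weak reference
            jni.deleteRef() // old strong reference
            strong = false
        }
    }
}

// One drawTexture() call, as seen by the texture's binding.
fun simulateDrawTexture(texture: EagerBinding) {
    texture.increment() // C++ takes a Ref
    // ... drawTexture executes ...
    texture.decrement() // C++ releases it
}
```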

Every single Godot API call that receives the texture causes 4 calls to the JNI API, which, as you probably know, are expensive. How expensive? It may be a bit unfair to use a benchmark made only to stress this particular use case, but the numbers are quite telling nonetheless. If I run the benchmark with this pingpong behavior disabled (and therefore with a potential memory leak), the final score on my computer goes from barely 35k to 80k+. That's more than double the performance.

As I explained earlier, I couldn't think of a way to get rid of the pingpong. At least not one that wouldn't require silly things like asking Kotlin users to do as in C++ and wrap all their RefCounted instances inside a Ref<> wrapper. That's a huge no for me on a JVM with a GC; it's too big an antipattern for me to accept this kind of solution.

Now, I think I can still create a design that makes the pingpong so infrequent that it no longer matters. The current design promotes/demotes the JNI reference at the same time as the counter increments/decrements. We need to break this model. My suggestion is to "delay" those switches as long as possible.

First, we need to identify when and how those switches happen. Fortunately, there are very few of them:

A switch to a weak reference happens when the last C++ Ref to our script instance has been deleted, leaving the JVM as its only owner. We can afford to delay this one because it can't harm the execution of the program. At worst, it will take longer for the instance to be freed, and since we are already in a GC context, freeing memory is always delayed anyway. Instead of switching to a weak reference, we can simply make the binding a "candidate" for demotion. At the end of the current frame, we gather all those candidates and do all the switches at once. Not all candidates will be demoted, as the counter may have been increased again during the frame.
A switch to a strong reference happens when the RefCounted is sent back to C++, either as the return value of a JVM method called by Godot or as a parameter of a call to Godot made by the JVM. Those are harder to delay: if we don't switch to a strong reference, there is a risk of the JVM instance being garbage collected while it's actually in use by the C++ code.
When Godot gets the RefCounted as a return value, we have no choice but to promote it to a strong reference immediately. We can't wait until the end of the frame because we have no guarantee that the reference on the JVM side will remain that long.
It's a similar story when the RefCounted is a parameter of an API call, but we don't need to promote it instantly. The JVM is the caller of the C++ method, so we know exactly when that call is over. We still can't wait until the end of the current frame, but at least we can wait until the end of the call to do the promotion. If, after the call, the counter is back to 1, then we don't need any promotion to a strong reference in the first place. This completely gets rid of the pingpong happening in the benchmark example.
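The three rules can be sketched together. This is a minimal single-threaded model under stated assumptions: `DeferredBinding.counter` mirrors the native RefCounted counter, `strong` stands for the kind of JNI reference C++ holds, and `DeferredSwitcher` and its methods are hypothetical names. The C++ API call itself is modeled as a lambda that may change the counter.

```kotlin
// Hypothetical binding state: `counter` mirrors the native RefCounted counter.
class DeferredBinding(val id: Long) {
    var counter = 1 // only the JVM wrapper holds a Ref
    var strong = false
}

class DeferredSwitcher {
    var switches = 0
        private set
    private val demotionCandidates = LinkedHashSet<DeferredBinding>()

    // Counter dropped to 1: don't demote now, just remember the binding.
    fun onCounterReachedOne(binding: DeferredBinding) {
        if (binding.strong) demotionCandidates.add(binding)
    }

    // Godot receives the object as a return value: promote immediately.
    fun onReturnedToGodot(binding: DeferredBinding) {
        binding.counter++
        promote(binding)
    }

    // The object is an argument of a C++ API call made by the JVM. The JVM is
    // the caller, so promotion can wait until the call has returned.
    fun <R> callWithArgument(binding: DeferredBinding, call: (DeferredBinding) -> R): R {
        val result = call(binding)
        // Only promote if C++ kept a Ref after the call.
        if (binding.counter > 1) promote(binding)
        return result
    }

    // End of frame: demote only the candidates whose counter is still 1.
    fun endOfFrame() {
        for (binding in demotionCandidates) {
            if (binding.counter == 1 && binding.strong) {
                binding.strong = false
                switches++
            }
        }
        demotionCandidates.clear()
    }

    private fun promote(binding: DeferredBinding) {
        if (!binding.strong) {
            binding.strong = true
            switches++
        }
    }
}
```

In the benchmark pattern, the counter goes from 1 to 2 and back to 1 inside the call, so `callWithArgument` never promotes and no switch happens at all.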

Summary: The pingpong is here to stay, but its effects are greatly lessened because it's only necessary for RefCounted instances with a JVM script. And out of the 3 situations that require a switch, 2 can be delayed and even cancelled.

Part 2 of the follow-up focuses on the implementation details. We could implement the suggestions I mentioned above inside the current system, but this would add extra complexity on top of something already quite hard to understand. I have been the one designing most of the memory management for this project and I still get confused at times too. We have already gone through 3 versions of the MemoryManager, but I think a 4th might be necessary. It'd be better to come up with a new model that naturally deals with those new requirements, instead of having to force them in. Now that the C++ code is much cleaner and JvmWrapper is easier to use, we can afford to do something we couldn't before: move some of the MemoryManager logic to the C++ side.

If our main issue is crossing the boundary too often, then we have to adapt the logic so that it happens as rarely as possible. As stated previously, our main current issue is the cost of the pingpong between weak and strong references. It's expensive because it requires JNI calls. If we can't get rid of it, why not move that logic to the JVM side? The only reason those JNI references exist is to keep JVM instances alive. As long as we have a way to keep instances alive, it doesn't actually matter on which side it's done. It means we need some way on the JVM side to keep strong references to wrappers and scripts. The thing is, we already have something similar in the MemoryManager. Before continuing, let's enumerate the current responsibilities of the MemoryManager regarding Godot native objects (omitting core types):

Keep an up-to-date DB that maps any Object ID to its matching JVM instance (if one exists).
Keep a list of newly created JVM instances so they can be sent to C++ to be bound.
React to the GC so RefCounted instances can have their counter decremented.

Instead of relying on a strong JNI reference to keep a JVM instance alive, what if we just used this giant database to hold a strong reference? We wouldn't even need to switch to a weak reference when we want to break the cycle; we can simply remove the instance from the DB itself. You may wonder: "If we remove a JVM instance from the DB, how do we ever get it back using its ID?" The answer is simple: we don't need to. Think about it. We switch to a weak reference whenever C++ no longer uses this instance. So how would C++ even query a JVM instance from an ID it can't have in the first place? With that in mind, we can transform the ping-pong into a simple add to/remove from the DB.

The new behaviour would be the following:

For Objects, we always have a strong reference, in all cases. When the native object is freed, we store its ID in a temporary buffer. At the end of the frame, we send the content of that buffer to the JVM to remove the instances from the DB.
For Refcounted, we proceed similarly, except we do it when the counter reaches 0. The JVM instance is removed from the DB which is the equivalent of the former weak reference in C++.
Whenever we are about to send a Godot object from the JVM to C++ (as a method parameter or a return value), we add it back to the DB if it's missing. This is a no-op for Objects and the equivalent of switching to a strong reference for RefCounted. This is strictly identical to the current model: if the instance is not in the DB, it means the counter is 1 (and the reference is weak). When we send it back to C++, the counter is going to be incremented to 2, so we add it back to the DB (strong reference). The main difference is that we currently switch the type of reference as a callback to the counter changing value. With this new model, we reverse the order: the DB is updated before the counter changes.
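The rules above mostly reduce to map operations on the JVM side. The minimal sketch below assumes the DB is a plain `HashMap` keyed by Godot object ID and accessed from one thread; `ObjectDb`, `beforeSendToNative` and `removeAll` are hypothetical names. Being present in the map plays the role of the former strong JNI reference, and being absent plays the role of the weak one.

```kotlin
// Hypothetical JVM-side database. Presence in `db` replaces the strong JNI
// reference; absence replaces the weak one. Neither operation involves JNI.
class ObjectDb {
    private val db = HashMap<Long, Any>()

    fun isTracked(id: Long): Boolean = id in db

    fun register(id: Long, instance: Any) {
        db[id] = instance
    }

    // Run before any Godot object crosses from the JVM to C++, as an argument
    // or a return value. No-op for Objects (always tracked); for a RefCounted
    // that was removed, this is the former "switch to strong".
    fun beforeSendToNative(id: Long, instance: Any) {
        db.putIfAbsent(id, instance)
    }

    // Applied at the end of the frame with the IDs buffered by C++: freed
    // Objects and RefCounted that C++ no longer uses.
    fun removeAll(ids: LongArray) {
        for (id in ids) db.remove(id)
    }
}
```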

This new design also makes a serial MemoryManager easier to implement. We no longer need the MemoryManager to send its JVM instances to C++ to create a binding. Instead, the C++ side accumulates, over a whole frame, the objects whose status needs to change. Then a single call from C++ to the JVM is made, sending the list of instances that have to be removed from the DB and returning the list of RefCounted instances whose counter must be decremented. The consequence is that, at the cost of delaying memory operations by one frame, we can properly sync Godot and the JVM in a single JNI call per frame, with the rest handled by simple containers operating only in their own language.
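The per-frame handshake can be sketched as follows, with the C++ side simulated in Kotlin. Assumptions: `NativeFrameBuffer` stands in for the C++ buffer filled during the frame, `JvmMemoryManager.syncFrame` is a hypothetical name for the single call made from `frame()`, and a `ReferenceQueue` is still used to detect RefCounted wrappers that the GC collected after leaving the DB. `crossings` counts what would be JNI calls.

```kotlin
import java.lang.ref.ReferenceQueue
import java.lang.ref.WeakReference

// Hypothetical weak handle, only used to learn when a RefCounted wrapper that
// left the DB has been collected by the GC.
class WrapperRef(wrapper: Any, val id: Long, queue: ReferenceQueue<Any>) :
    WeakReference<Any>(wrapper, queue)

class JvmMemoryManager {
    private val strongDb = HashMap<Long, Any>()
    // Keeps the WrapperRef objects reachable so they can be enqueued.
    private val weakRefs = HashMap<Long, WrapperRef>()
    private val queue = ReferenceQueue<Any>()

    fun registerRefCounted(id: Long, wrapper: Any): WrapperRef {
        strongDb[id] = wrapper
        return WrapperRef(wrapper, id, queue).also { weakRefs[id] = it }
    }

    fun isTracked(id: Long): Boolean = id in strongDb

    // The single per-frame call. In: IDs C++ buffered during the frame.
    // Out: IDs of RefCounted whose wrapper was collected, to decrement in C++.
    fun syncFrame(removedIds: LongArray): LongArray {
        for (id in removedIds) strongDb.remove(id)
        val toDecrement = ArrayList<Long>()
        while (true) {
            val ref = queue.poll() as WrapperRef? ?: break
            weakRefs.remove(ref.id, ref)
            toDecrement.add(ref.id)
        }
        return toDecrement.toLongArray()
    }
}

// Stand-in for the C++ side: buffers IDs during the frame and crosses the
// language boundary once, from frame().
class NativeFrameBuffer(private val jvm: JvmMemoryManager) {
    private val pending = ArrayList<Long>()
    val decremented = ArrayList<Long>()
    var crossings = 0
        private set

    fun onNoLongerUsedByNative(id: Long) {
        pending.add(id)
    }

    fun frame() {
        crossings++
        decremented.addAll(jvm.syncFrame(pending.toLongArray()).asList())
        pending.clear()
    }
}
```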

I think all our needs are covered by this design. I hope I haven't forgotten some sneaky detail that would invalidate the idea.

utopia-rise / godot-kotlin-jvm #618