wenyuzhao opened this issue 1 year ago
Oh yeah, I had to debug a nasty issue on Android where this exact thing was happening with the SemiSpace GC: it was copying buffers that had been pinned by the `GetPrimitiveArrayCritical` JNI call. Yes, I definitely think we need to support something like this in MMTk.
Currently MMTk does not know anything about critical regions or yieldpoints; that is handled entirely in the binding. In terms of GC triggering, MMTk simply does two things: 1. it blocks any thread (with `block_for_gc()`) that tries to allocate in the slowpath when no more memory can be allocated, and 2. it calls `stop_all_mutators()` and waits until the binding has stopped the mutators and returned from that method. If a binding does not want to start a GC right now, it can choose not to return from `stop_all_mutators()` and not to stop some mutators at their yieldpoints. In that case, a GC won't start, and those mutators can still allocate from their thread-local buffers. The GC will start once the binding returns from `stop_all_mutators()`.
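The two mechanisms can be sketched roughly as follows. This is a hypothetical illustration, not the real mmtk-core API: `Heap`, `AllocResult`, and `alloc_slowpath` are stand-ins for MMTk's actual allocation slowpath, and `block_for_gc()`/`stop_all_mutators()` are only referenced in comments.

```rust
// Hypothetical sketch of the GC-triggering protocol described above;
// the names here are illustrative, not the real mmtk-core API.
#[derive(Debug, PartialEq)]
pub enum AllocResult {
    Ok(usize),      // offset of the new allocation
    MustBlockForGc, // caller must call block_for_gc() and wait
}

pub struct Heap {
    pub used: usize,
    pub limit: usize,
}

/// 1. If the heap is exhausted, the allocating thread blocks (block_for_gc()).
/// 2. Separately, stop_all_mutators() is called on the binding, and the GC
///    proceeds only once the binding returns from it.
pub fn alloc_slowpath(heap: &mut Heap, bytes: usize) -> AllocResult {
    if heap.used + bytes > heap.limit {
        AllocResult::MustBlockForGc
    } else {
        let offset = heap.used;
        heap.used += bytes;
        AllocResult::Ok(offset)
    }
}
```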
For the cases you mentioned, they can all be implemented for OpenJDK in the binding without issue (I assume). This is by design, not 'simulated' in the binding. Do you suggest we should move that into MMTk core? That would change our abstraction. What would be the reason for changing it?
I don't think this is actually that hard to implement properly. I investigated how OpenJDK works when I came across the Android bug. We just need to make the `lock_gc_or_pin_object()` function MMTk-aware: for GCs like Immix, which support pinning, we can trivially pin the object, and for GCs like SemiSpace we have to disable collection temporarily. We then just need to unpin the object or re-enable collection when the critical section is over, in the `unlock_gc_or_unpin_object()` function. @wenyuzhao this would be the most performant implementation, and I think it will be fairly simple to implement in LXR as well, since we have support for object pinning in mmtk-core.
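A minimal sketch of the pin-or-disable decision described above. The `Plan` and `CriticalAction` enums and the function bodies are assumptions for illustration; the real `lock_gc_or_pin_object()`/`unlock_gc_or_unpin_object()` live in the binding and would dispatch on the actual plan's pinning capability.

```rust
// Hypothetical sketch of an MMTk-aware lock_gc_or_pin_object();
// the enum and function names are illustrative, not the real code.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Plan {
    Immix,     // supports object pinning
    SemiSpace, // copies every live object; cannot pin
}

#[derive(Debug, PartialEq)]
pub enum CriticalAction {
    PinObject, // pin just this object for the critical section
    DisableGc, // temporarily disable collection entirely
}

/// Choose the cheapest way to keep the object's address stable.
pub fn lock_gc_or_pin_object(plan: Plan) -> CriticalAction {
    match plan {
        Plan::Immix => CriticalAction::PinObject,
        Plan::SemiSpace => CriticalAction::DisableGc,
    }
}

/// The inverse action, for unlock_gc_or_unpin_object().
pub fn unlock_gc_or_unpin_object(action: CriticalAction) -> &'static str {
    match action {
        CriticalAction::PinObject => "unpin object",
        CriticalAction::DisableGc => "re-enable collection",
    }
}
```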
@qinsoon Yes, at least I believe we need to support disabling GC temporarily in mmtk-core. It may be binding-specific in that only a few bindings may actually use it, but it is something that logically sits in the core (there can be many reasons why we need to disable a collection temporarily). Though technically we can just use the `{enable,disable}_collection` functions for those purposes.
> @qinsoon Yes at least I believe that we need to support disabling GC temporarily in mmtk-core. It may be binding-specific in that only a few bindings may actually use it, but it's actually something that logically sits in the core (there can be many reasons why we need to disable a collection temporarily). Though technically we can just use the `{enable,disable}_collection` functions for those purposes.
The current `enable/disable_collection` means the binding can keep allocating without triggering a GC (and the heap limit is ignored), which may not be what you want; it is a very hacky function added to support that kind of behavior in some bindings.

Back to Wenyu's original issue: for cancelling a GC request, what would happen to mutators that run out of their thread-local buffer? They need more memory, and MMTk cannot provide more (otherwise MMTk would not have triggered a GC in the first place), so they would be blocked. If that is the case, it would be very similar to what we have now: we use `block_for_gc()` to block threads that need slowpath allocation, and we wait in `stop_all_mutators()` for the other threads. Whether to stop the other threads immediately or wait a little is the binding's choice.

Did you discuss this issue in today's meeting? I feel I may have missed some context for the issue.
If a mutator thread is executing native C functions, it is already considered "in a safe region" because it cannot access heap objects unless it calls JNI functions. The `stop_all_mutators` function will consider such threads "already stopped".

However, if a mutator calls `Get{PrimitiveArray,String}Critical`, it leaves the safe region because it can access the elements of the Java array or String directly. When it later calls `Release{PrimitiveArray,String}Critical`, it enters the safe region again because it loses access to that array or String.

My feeling is that as long as `stop_all_mutators` considers mutators in critical sections as "not stopped yet", GC will never run concurrently with mutators in critical sections. This can be implemented entirely in mmtk-openjdk.
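The thread-state rule above can be sketched as a small predicate. The `MutatorState` enum and `counts_as_stopped` are hypothetical names for illustration, not the binding's actual thread-state machinery.

```rust
// Hypothetical thread-state predicate for the binding's
// stop_all_mutators; state names are illustrative.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum MutatorState {
    InJava,     // running Java code; must reach a yieldpoint
    InNative,   // "safe region": cannot touch heap objects
    InCritical, // holds a raw pointer from Get{PrimitiveArray,String}Critical
}

/// A mutator in plain native code counts as already stopped, but a
/// mutator inside a JNI critical section does not, so a GC can never
/// overlap a critical section.
pub fn counts_as_stopped(state: MutatorState) -> bool {
    match state {
        MutatorState::InNative => true,
        MutatorState::InJava | MutatorState::InCritical => false,
    }
}
```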
> If a binding does not want to start a GC right now, it can choose not returning from `stop_all_mutators()`, and not stopping some mutators in yieldpoints.
Yes. `Get{PrimitiveArray,String}Critical` shall prevent `stop_all_mutators()` from returning. More precisely, it depends on which thread wins the race between "entering the critical section" and "stopping all mutators". Assume M1 is a mutator and G1 is a GC thread (coordinator or worker). M1 wants to enter a critical section, and G1 wants to stop all mutators (probably because another mutator triggered GC).

- If M1 calls `GetXxxxCritical` before G1 calls `stop_all_mutators`, M1 will enter the critical section, and G1 will not return until M1 leaves the critical section.
- If G1 calls `stop_all_mutators` before M1 calls `GetXxxxCritical`, M1 will not be able to enter the critical section until the GC finishes (i.e. when G1 calls `resume_mutators`).

In conclusion, whichever side wins the race, the GC never runs while M1 is inside a critical section.
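This race can be arbitrated with a mutex and condition variable, in the style of HotSpot's GCLocker. The sketch below is an assumption about how the binding could implement it; `GcLocker`, `begin_gc`, and `end_gc` are hypothetical names standing in for the binding's `stop_all_mutators`/`resume_mutators` bookkeeping.

```rust
use std::sync::{Condvar, Mutex};

// Hypothetical two-sided lock for the race described above.
struct State {
    critical: usize, // mutators currently inside critical regions
    gc_active: bool, // a GC has won the race and is in progress
}

pub struct GcLocker {
    state: Mutex<State>,
    cond: Condvar,
}

impl GcLocker {
    pub fn new() -> Self {
        GcLocker {
            state: Mutex::new(State { critical: 0, gc_active: false }),
            cond: Condvar::new(),
        }
    }

    /// M1 calls GetXxxxCritical: wait out any active GC, then enter.
    pub fn enter_critical(&self) {
        let mut s = self.state.lock().unwrap();
        while s.gc_active {
            s = self.cond.wait(s).unwrap();
        }
        s.critical += 1;
    }

    /// M1 calls ReleaseXxxxCritical: leave, waking any waiting GC thread.
    pub fn exit_critical(&self) {
        let mut s = self.state.lock().unwrap();
        s.critical -= 1;
        if s.critical == 0 {
            self.cond.notify_all();
        }
    }

    /// G1's stop_all_mutators: block until no critical regions remain.
    pub fn begin_gc(&self) {
        let mut s = self.state.lock().unwrap();
        while s.critical > 0 {
            s = self.cond.wait(s).unwrap();
        }
        s.gc_active = true;
    }

    /// G1's resume_mutators: allow critical regions again.
    pub fn end_gc(&self) {
        let mut s = self.state.lock().unwrap();
        s.gc_active = false;
        self.cond.notify_all();
    }

    pub fn in_critical(&self) -> usize {
        self.state.lock().unwrap().critical
    }
}
```

Whichever side takes the lock first wins: a mutator inside a critical region blocks `begin_gc` from completing, and an active GC blocks new critical-region entries until `end_gc`.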
For 2., we need to think about these questions:
Update the current GC trigger to support GC cancellation and pending GC.
Use case for OpenJDK
Context: JNI Critical Region
When JNI code is accessing data in a Java array as a raw buffer, it should mark the current piece of code as a "critical region" to prevent the GC from moving this object under the hood.
GC plans can choose:
Read more: https://shipilev.net/jvm/anatomy-quarks/9-jni-critical-gclocker/
Handle GC requests in critical region
If a GC is requested while there are mutators inside critical regions, the OpenJDK GCs (like G1 or Parallel) will cancel this GC request [1] and schedule a pending GC request [2].

Before all threads have exited the critical region, some mutators may still allocate (with a few more retries) before being blocked.

The pending GC is executed right after all threads have exited the critical region [3].
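The cancel-and-retry protocol above can be sketched as follows. This is a hypothetical simplification of HotSpot's GCLocker logic, not the actual code; `GcLockerState`, `request_gc`, and `exit_critical` are illustrative names.

```rust
// Hypothetical sketch of GC cancellation + pending GC.
pub struct GcLockerState {
    pub critical: usize, // mutators currently inside critical regions
    pub needs_gc: bool,  // a cancelled GC request is pending
}

/// A GC request arrives. If mutators are inside critical regions,
/// cancel it and record it as pending; otherwise let it proceed.
pub fn request_gc(s: &mut GcLockerState) -> bool {
    if s.critical > 0 {
        s.needs_gc = true;
        false // cancelled; will run later
    } else {
        true // GC may proceed now
    }
}

/// A mutator leaves its critical region. The last one out triggers
/// the pending GC.
pub fn exit_critical(s: &mut GcLockerState) -> bool {
    s.critical -= 1;
    if s.critical == 0 && s.needs_gc {
        s.needs_gc = false;
        true // caller should start the pending GC now
    } else {
        false
    }
}
```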
The above process can be simulated entirely in mmtk-openjdk [4], but pending GC and GC-request cancellation would be a great feature for mmtk-core.
[1] https://github.com/mmtk/openjdk/blob/jdk-11.0.19%2B1-mmtk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp#L1128-L1131
[2] https://github.com/mmtk/openjdk/blob/jdk-11.0.19%2B1-mmtk/src/hotspot/share/gc/shared/gcLocker.cpp#L98
[3] https://github.com/mmtk/openjdk/blob/jdk-11.0.19%2B1-mmtk/src/hotspot/share/gc/shared/gcLocker.cpp#L158
[4] https://github.com/wenyuzhao/mmtk-openjdk/compare/956fb128c6a72529d2aeface7e1d3b56ae9c61e2..8432f53686a5163ad199dcf6dda94cde5bd5a730