mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io
Other
373 stars 67 forks source link

Support GC cancellation and pending GC #786

Open wenyuzhao opened 1 year ago

wenyuzhao commented 1 year ago

Update the current GC trigger to support GC cancellation and pending GC.


Use case for OpenJDK

Context: JNI Critical Region

When JNI code is accessing data in a Java array as a raw buffer, it should mark the current piece of code as a "critical region" to prevent the GC to move this object under the hood.

GC plans can choose:

Read more: https://shipilev.net/jvm/anatomy-quarks/9-jni-critical-gclocker/

Handle GC requests in critical region

If a GC happens when there are mutators inside critical regions, the OpenJDK GCs (like G1 or parallel) will cancel this GC request [1], and schedule a pending GC request [2].

Before all the threads exiting the critical region, some mutators may still allocate (with a few more retrial), before being blocked.

The pending GC will be executed right after all the threads are exited from the critical region [3].

The above process can be simulated entirely in mmtk-openjdk [4], but pending GC or GC request cancellation is a great feature to mmtk-core.

[1] https://github.com/mmtk/openjdk/blob/jdk-11.0.19%2B1-mmtk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp#L1128-L1131 [2] https://github.com/mmtk/openjdk/blob/jdk-11.0.19%2B1-mmtk/src/hotspot/share/gc/shared/gcLocker.cpp#L98 [3] https://github.com/mmtk/openjdk/blob/jdk-11.0.19%2B1-mmtk/src/hotspot/share/gc/shared/gcLocker.cpp#L158 [4] https://github.com/wenyuzhao/mmtk-openjdk/compare/956fb128c6a72529d2aeface7e1d3b56ae9c61e2..8432f53686a5163ad199dcf6dda94cde5bd5a730

k-sareen commented 1 year ago

Oh yeah I had to debug a nasty issue for Android where this exact thing was happening with the SemiSpace GC. It was copying buffers that had been pinned by the GetPrimitiveArrayCritical JNI call. Yes I definitely think we need to support something like this in MMTk.

qinsoon commented 1 year ago

Currently MMTk does not know anything about critical region, or yieldpoints. It is totally in the binding.

MMTk simply does two things in terms of GC triggering: 1. block any thread (with block_for_gc()) that is trying to allocate in the slowpath as we cannot allocate any more, and 2. call stop_all_mutators() and wait until the binding has done that and returns from the method. If a binding does not want to start a GC right now, it can choose not returning from stop_all_mutators(), and not stopping some mutators in yieldpoints. In that case, a GC won't start, and those mutators can still allocate from their thread local buffer. A GC will start once the binding returns from stop_all_mutators().

For the cases you mentioned, they can all be implemented for OpenJDK in the binding without an issue (I assume). This is by design, not 'simulated' in the binding. Do you suggest we should move that to MMTk core? This would change our abstraction. What would be the reason that we want to change the abstraction?

k-sareen commented 1 year ago

I don't think this is actually that hard to implement properly. I investigated how OpenJDK works when I came across the Android bug. We just need to make the lock_gc_or_pin_object() function MMTk aware. For GCs like Immix which support pinning, we can trivially pin the object, and for GCs like SemiSpace we have to disable collection temporarily.

We then just need to unpin or re-enable collection when the critical section is over in the unlock_gc_or_unpin_object() function. @wenyuzhao this would be the most performant implementation and I think will be fairly simple to implement in LXR as well, as we have support for object pinning in mmtk-core.

k-sareen commented 1 year ago

@qinsoon Yes at least I believe that we need to support disabling GC temporarily in mmtk-core. It may be binding-specific in that only a few bindings may actually use it, but it's actually something that logically sits in the core (there can be many reasons why we need to disable a collection temporarily).

Though technically we can just use the {enable,disable}_collection functions for those purposes.

qinsoon commented 1 year ago

@qinsoon Yes at least I believe that we need to support disabling GC temporarily in mmtk-core. It may be binding-specific in that only a few bindings may actually use it, but it's actually something that logically sits in the core (there can be many reasons why we need to disable a collection temporarily).

Though technically we can just use the {enable,disable}_collection functions for those purposes.

The current enable/disable_collection means the binding can keep allocating without triggering a GC (and heap limit is ignored), which may not be something you want (it is a very hacky function to support some bindings' behavior like that).

Back to Wenyu's original issue, for cancelling a GC request, what would happen for mutators that run out of their thread local buffer? They need more memory, and MMTk cannot provide more memory (otherwise MMTk would not trigger a GC in the first place). They would be blocked. If this is the case, that would be very similar to what we have now. We use block_for_gc() to block threads that need slowpath allocation, and we wait for stop_all_mutators() for the other threads. Whether to stop other threads immediately or wait a little bit is the binding's choice.

Did you discuss this issue in today's meeting? I feel I may have missed some context for the issue.

wks commented 1 year ago

If a mutator thread is executing native C functions, it is already considered "in safe region" because it cannot access heap objects unless it calls JNI functions. The stop_all_mutators function will consider such threads as "already stopped".

However, if a mutator calls Get{PrimitiveArray,String}Critical, it means it leaves "safe region" because it can access the elements of the Java array or String directly. When it calls Release{PrimitiveArray,String}Critical later, it will enter "safe region" again because it loses access to that array or String.

My feeling is that as long as stop_all_mutators consider mutators in "critical sections" as "not stopped yet", GC will never run concurrently with mutators in "critical sections". This can be implemented in mmtk-openjdk completely.

If a binding does not want to start a GC right now, it can choose not returning from stop_all_mutators(), and not stopping some mutators in yieldpoints.

Yes. Get{PrimitiveArray,String}Critical shall prevent stop_all_mutators() from returning. More precisely, it depends on which thread wins the race of "entering critical section" and "stopping all mutators". Assume M1 is a mutator and G1 is a GC thread (coordinator or worker). M1 wants to enter critical section and G1 wants to stop all mutators (probably because another mutator triggered GC).

qinsoon commented 1 year ago

In conclusion,

  1. this can be implemented in the binding under the current API.
  2. we may want to support this in mmtk-core to make the binding implementation easier.

For 2., we need to think about these questions: