mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io
Other
379 stars 69 forks source link

Guarantee old objects don't move, or add a query API for it #1163

Open wks opened 4 months ago

wks commented 4 months ago

Some VMs have some kinds of root sets that contains a large number of edges into the object graph, but are seldom changed between GC. One example is the "code cache roots" in OpenJDK. This root set contains tens of thousands of edges, represented as slots. It takes up a large proportion of the root-scanning time, and also a significant part of the transitive closure time for nursery GCs. However, entries in the "code cache roots" set are registered in bulk during the start-up time of the VM. But between most GCs, the entries are not changed.

Such root sets can take advantage of generational GC. We only need to scan the roots newly added or modified since the last GC if the current GC is a nursery GC. The LXR branch already does this optimization: https://github.com/wenyuzhao/mmtk-openjdk/blob/lxr/mmtk/src/gc_work.rs#L242-L262

To do this optimization, it will require the generational GC to guarantee that old objects don't move in a nursery GC. It is a reasonable requirement because if the generational GC needs to move old objects during nursery GC, it loses the advantage of generational GC. Currently, all of our generational plans guarantee this. Specifically,

If we can make this guarantee, we should document and VM bindings can take advantage of it. Possible places include:

And the code in the LXR branch should call CommonGenPlan::trace_object_nursery(object) to determine whether it scans only the NURSERY_CODE_CACHE_ROOTS or both that and MATURE_CODE_CACHE_ROOTS.

On the contrary, if we can't make such guarantee because there are some old objects that can be moved during a nursery GC, we will need to provide an API function, such as GenerationalPlan::current_gc_may_move_old_object() -> bool so that the VM binding can query if it is legal to optimize the root-scanning code by scanning only newly added/modified roots.

wks commented 4 months ago

Some plans, such as G1, may leave young objects in the nursery for several GCs before promoting them to the mature space. Our current GenCopy, GenImmix and StickyImmix palns always promote young objects immediately after one GC.

If we implement GC algorithms like G1 in mmtk-core, we may need to have a more precise definition and/or richer API w.r.t.

The more details we expose through the API, the better the VM can make use of plan-specific information to optimize root scanning. Meanwhile, there is a risk that the VM may end up having to deal with too many plan-specific details, complicating the VM binding development.

p.s. This is still better than building the VM with assumptions about a single GC algorithm, preventing the VM from adopting more advanced GC algorithms.

It is also possible that after we measure the performance, it ends up not profitable to have more than two generations.

We need more in-depth discussions over this matter.

k-sareen commented 4 months ago

I'm very confused by this issue. Most if not all generational algorithms will not move old objects during a minor collection, because they literally do not have that much information to move them. Where or why do you envision this even occurring? I think it's not a good idea to expose finer grained information in the API as it makes us less general.

wks commented 4 months ago

I'm very confused by this issue. Most if not all generational algorithms will not move old objects during a minor collection, because they literally do not have that much information to move them. Where or why do you envision this even occurring?

Good point. We can move an object if and only if we can identify all references pointing to an object (and none of them are pinning roots or un-update-able fields). We don't trace old objects in a nursery GC, therefore we can't identify all references to old objects, and we can't move old objects. In contrast, we can move young objects thanks to the remembered set. Using the remembered set, we can selectively update edges from old to young objects.

k-sareen commented 4 months ago

Exactly. So can I ask what prompted this issue? The codecache issue was known before and its optimization is fairly straightforward. I definitely don't think exposing how many generations exist etc. is a good idea for the API. The contract should just be allocate objects for me and you can manage them.

wks commented 4 months ago

Where or why do you envision this even occurring?

What about reference-counting-based GC? After the only reference to an object is removed, we may recycle it. That recycled object may be an old object in the sense that it may have survived many GCs before. In deferred and coalescing RC, we flush the buffers from time to time, and don't always perform full-heap GC. This means a "minor" GC can recycle an old object.

Deferred RC does not fit into the categories of "nursery GC" and "mature GC". We may apply the "alloc-dead" optimization described in https://www.steveblackburn.org/pubs/papers/rc-ismm-2012.pdf so that "young objects" (defined as those allocated since the last GC) will be reclaimed automatically. But a minor reference-counting GC will also reclaim older objects, too.

Maybe @wenyuzhao have more insight about RC-based plans and the optimizations we can do on the VM side.

I think it's not a good idea to expose finer grained information in the API as it makes us less general.

But we need finer-grained information to optimize root scanning. As we discussed in today's meeting, if the transitive closure time is short, root scanning may become a bottleneck. The "code cache roots" I described in the original post is highly generational. But we need details about the generational GC (or the minor/major GC) to do such optimizations correctly. For example, if reference counting can reclaim old objects, we will have to update the MATURE_CODE_CACHE_ROOTS hash table, too, otherwise it will contain dangling references. If G1 allows a young object to stay in the nursery for several GCs, and each nursery GC still don't scan all objects, the VM binding may need to postpone promoting elements in NURSERY_CODE_CACHE_ROOTS to MATURE_CODE_CACHE_ROOTS, too. It should still have a NURSERY_CODE_CACHE_ROOTS because young entries (about 0 or 1) are still significantly fewer than mature entries (15000+). This is an optimization that is worth doing, given how expensive root scanning can be.

I definitely don't think exposing how many generations exist etc. is a good idea for the API.

I agree with this part. We probably don't want one semantics per plan. We may end up having several different generational semantics, such as (1) non-generational, (2) nursery GC (cannot enumerate dead objects) and mature, (3) RC (not generational, but can enumerate dead objects) and full-heap. For the case of nursery GC, it may promote all young objects immediately, or postponing the promotion later. But that may be too specific. I wonder what API we should provide to VM bindings if we implement G1. One possibility is that we will never provide the vanilla G1 algorithm, but maybe a variant of it that only has two generations.

By the way, I have seen some blog posts about how to "tune GCs", such as adjusting the sizes of each generation. I really doubt whether such practice should be recommended because that means the GC algorithm is really non-general, and is highly sensitive to actual workload. Maybe MMTk shouldn't give the VM bindings too many knobs for them to turn.

wks commented 4 months ago

So can I ask what prompted this issue? The codecache issue was known before and its optimization is fairly straightforward.

It's just the code cache issue. It's about whether the optimization is sound. It's not that straightforward I think. It depends on the assumptions that holds for GenCopy, GenImmix and StickyImmix, but will break if we implement G1.

wenyuzhao commented 4 months ago

What about reference-counting-based GC? After the only reference to an object is removed, we may recycle it. That recycled object may be an old object in the sense that it may have survived many GCs before. In deferred and coalescing RC, we flush the buffers from time to time, and don't always perform full-heap GC. This means a "minor" GC can recycle an old object.

This is solved by skipping RC decrements for code cache root pointers. That means for every young code cache slot, we only apply increment once without decrements, to intentionally "leak" these objects, keep them alive, and defer their reclamation to concurrent marking.

k-sareen commented 4 months ago

What about reference-counting-based GC? After the only reference to an object is removed, we may recycle it. That recycled object may be an old object in the sense that it may have survived many GCs before. In deferred and coalescing RC, we flush the buffers from time to time, and don't always perform full-heap GC. This means a "minor" GC can recycle an old object.

This makes no sense to me. If the object is dead then it is dead. It is not being moved. Since it is dead, the VM should not be able to access it from anywhere. If it is in the root set somehow, then it is not dead. Did you perhaps mean cases where RC is moving objects in the pause? I would imagine they would only move young objects again due to the same reasons as generational GCs.

k-sareen commented 4 months ago

All I'm saying is that this is a fundamental property (that old objects will not move) for all generational plans and VMs can definitely depend on this assumption holding true. So we don't need to do anything extra/special imo. You mentioned G1 as it may not move young objects which again is not a big deal since that can also be true for StickyImmix as we can leave young objects in-place.

The issue you have mentioned is about "old" objects. So the terminology here isn't exactly clear to me. Do you mean "mature" objects, i.e. objects promoted to the mature generation. Or do you generally mean objects not allocated in the current GC epoch.

wks commented 4 months ago

What about reference-counting-based GC? After the only reference to an object is removed, we may recycle it. That recycled object may be an old object in the sense that it may have survived many GCs before. In deferred and coalescing RC, we flush the buffers from time to time, and don't always perform full-heap GC. This means a "minor" GC can recycle an old object.

This makes no sense to me. If the object is dead then it is dead. It is not being moved. Since it is dead, the VM should not be able to access it from anywhere. If it is in the root set somehow, then it is not dead. Did you perhaps mean cases where RC is moving objects in the pause? I would imagine they would only move young objects again due to the same reasons as generational GCs.

Deferred RC do not apply reference counts from roots during mutator time. So if an object is pointed by one root and another heap object, its reference count is 1 during mutator time. If the mutator removes the reference from the other object, its ref count becomes 0. But during the next GC, we re-apply the ref counts from roots, so its refcount becomes 1 again, so it is not dead. After the GC, we decrement all ref counts from roots, so its refcount becomes 0 in the next mutator phase, but it is not dead.

However, re-applying the ref counts from roots means scanning all roots. That's just root scanning. So we need to workaround the root-scanning cost, too. Like @wenyuzhao said, the LXR branch intentionally omits the decrements of ref counts from mature code cache roots at the end of a GC. In this way, it is like the reference counts from code cache roots are always applied during mutator time. In other words,

I think deferred RC can simply arbitrarily partition the root set into "the part it is willing to scan" and "the part it doesn't want to scan because they are too many". If it doesn't want to scan some roots, just leave one ref count and the object will not die. And we don't move objects when collecting only using RC (because we don't identify all references during RC).

But this arbitrariness also needs communication between the VM binding and mmtk-core. The VM binding may need to express something like "Hey, here are roots, but they are too many. Keep them alive and don't ask me for them again until the next 'major' GC, whatever 'major' means." RC-based plans simply skip their decrements. Generational GCs probably prefer a different language, such as "I won't reclaim old objects (those survived the last GC) in this GC. Just give me young roots."

k-sareen commented 4 months ago

Okay. So the definition of "old" object here is "objects not allocated in the current GC epoch" like I mentioned above. That's a slightly different definition than mature objects as in generational GCs.

Yes deferred RC also needs to scan roots to have correct inc/decs but LXR's solution is adequate, no? I'm honestly still not sure what exactly do you want MMTk to do. I don't think your issue is actually about "moving" objects or not. It's fundamentally about how to partition the heap to optimize root scanning and/or tracing.

wks commented 4 months ago

... since that can also be true for StickyImmix as we can leave young objects in-place.

Even if we disable moving GC for nursery GCs in StickyImmix, we still leave the mark bits set for young objects that survives the current GC. They won't be traced again in the next GC if it is nursery GC. We can erase their mark bits so that they remain young, but that sounds unnecessary.

The issue you have mentioned is about "old" objects. So the terminology here isn't exactly clear to me. Do you mean "mature" objects, i.e. objects promoted to the mature generation. Or do you generally mean objects not allocated in the current GC epoch.

Yes. The definition matters. I use the term "young" and "nursery" interchangeably, and "old" and "mature" interchangeably. But "young/old" may have two different definitions:

  1. Young objects are those allocated since the last GC, and old objects are those allocated before and survived the last GC.
  2. Young objects will be traced during the next young GC and their life and death will be re-decided; old objects will not be traced in the next young GC, and will be assumed as alive.

The first definition is plan-agnostic, while the second definition will involve details of concrete plans, such as how many generations it has, and how long an object stays in one generation before promoted. For our current GenCopy, GenImmix and StickyImmix, the two definitions happen to be equivalent.

Yes deferred RC also needs to scan roots to have correct inc/decs but LXR's solution is adequate, no?

I think LXR's solution is adequate. But we need a way to tell the VM binding it is the right time to apply that optimization. e.g. The current plan is LXR or other deferred RC plan.

... It's fundamentally about how to partition the heap to optimize root scanning and/or tracing.

Yes. It is about partitioning the heap.

The GC Handbook uses a term "condemned space" which refers to the space the GC will trace. For generational GC, the "condemned space" of a nursery GC is the nursery. Some GC algorithms reclaim objects in each chunk in turn. G1 also selects a collection set before starting to collect, and that also satisfies the definition of "condemned space". But I don't think we should expose the spatial structure of the heap to the VM binding because that could be arbitrarily complicated.

It should be more practical to exploit temporal properties. For example,

It shouldn't get more complex, or it'll be too complicated for the VM binding to use. For now, I think if a GC guarantees "objects that survived the last GC will not be collected by a nursery GC", it will be easy for the VM bindings to do optimizations for it.