wks commented 10 months ago

Note: This issue is not part of the MEP https://github.com/mmtk/mmtk-core/issues/1043, although this issue originates from the discussion under that MEP. This is orthogonal, and should be discussed separately. I am also not suggesting this issue should be addressed immediately.

TL;DR: Some VMs use singleton immortal objects to represent special values, such as None in Python and nothing and missing in Julia. If tracing them is costly (need verification), we may allow some edges not to be traced.

Not traversing the entire object graph?

Global singleton objects

Python uses a special object None to represent the absence of value. True, False are also global singletons. Julia uses nothing and missing to represent the absence of value, too, in different contexts. CRuby has similar global singletons, such as vm_empty_cc and vm_empty_cc_for_super. In reality, those objects may be implemented as static C variables, or they may be implemented as MMTk heap objects.

To simplify our discussion, we assume (without losing generality) that such objects are in the MMTk heap.

If None, True, False, nothing and missing are MMTk heap objects, they need to be traced (if tracing GC) so that they are kept alive, and forwarded (if moving GC). But if a VM has too many fields pointing to such singleton objects, it may be costly to trace them.

What if we don't need to trace them?

If those objects are

immortal: They never die.
un-movable: Their addresses will not change.

then we can skip the object graph edges pointing to those objects during tracing. Slots that hold references to those objects don't need to be visited (because they are always alive) or updated (because they never move).

A separate concern is whether those singletons have mortal movable children because they may have mutable states represented as mortal objects, such as integers in Python. In this case, their children need to be traced. This is easy to solve. We use pinning roots to pin those singleton objects and keep them alive. Those objects will be scanned, and their children will be traced, too.

How do we identify such edges?

Such edges can be identified at different places.

When a slot is loaded

Depending on whether a VM binding chooses to enqueue object references or enqueue slots, a slot can be loaded in

Edge::load(): The slot is enqueued unconditionally, but when load() is called, the VM binding has a chance to identify whether it points to a singleton.
Scanning::scan_object_and_trace_edges: When this method scans an object, the VM binding loads from the slots, and can identify it.

When the VM binding has the value in the slot, it has several ways to identify those singletons.

Compare the reference against the address of each singleton object.
Compare the reference against an address range where those singletons are allocated.
Inspect the object header.
Inspect on-the-side metadata.
If the VM uses tagged references, look at the tags encoded in the slot value.
...

When MMTk core dispatches the reference to its space

The trace_object invocation starts from the SFTMap or the PlanTraceObject::trace_object method. They look for the space the object is in.

However, currently the ImmortalSpace still enqueues the object when traced.

To optimize for such singleton objects, MMTk core can introduce a special space which is only traced from root, or treat objects in it as roots. Such a space is ignored when traced from another object, and will not participate in the dynamic dispatch of SFTMap and PlanTraceObject.

The VMSPace and the boot image is similar to this. When using boot images, we may treat objects in the boot image as roots. We need to extend this to another space which is similar to ImmortalSpace, but allows objects to be dynamically allocated into it.

But if we reached here, we actually already visited the slot. But we can choose not to enqueue the object (not even on the first visit).

In `ActivePlan::vm_trace_object`

This only applies if the singletons are not in the MMTk heap. It is similar to the hypothetical immortal and rooted space mentioned before. The VM binding simply ignores such objects in vm_trace_object.

Is it worth the optimization?

If we need to check if the reference points to a singleton every time we load a reference from a slot (using Edge::load(), or during Scanning::scan_object_and_trace_edges()), it will add an overhead to every load. It will only be beneficial if many fields in the VM point to such objects, such as None. If very few fields point to those objects, the overall tracing speed may be slowed down by the overhead in each and every load.

Putting those singletons into a special space (or leaving them as off-heap objects) is beneficial if there are not many references to such objects. If a slot holds a reference to a singleton, it will be traced, and will still go through the dynamic dispatch. But in the end, it will not be enqueued, and it will not need to mark any mark bits. It may still be cheaper than marking (and forwarding) an ordinary object.

I am not sure how much benefit it has compared to using pinning roots to pin those singletons. Suppose those singletons are in the Immix space. The fields need to be traced, but since we already marked and pinned those objects in the PinningRootsTrace stage (right before the Closure stage), Immix::trace_object_{without_moving,with_opportunistic_copy} will find the object already marked and pinned, and will not forward nor enqueue the object. This only needs to look at the marking bit in non-defrag GC. We need to measure the overhead.

Just use tagged union!

If a VM can use tagged union, it is always better to use tagged union to represent those special objects.

Ruby, for example, uses this trick. On the Ruby language semantics level, every Ruby variable is a reference to an object. nil is an object, and it has methods. For example, nil.to_s == "", and nil.to_i == 0. But implement-wise, in CRuby, the VALUE type (typedef unsigned long VALUE;) is a tagged union of pointers and special values, such as nil, true, false and small integers. If the last three bits of a VALUE are all 0, but other bits are not all 0, it is a reference to a heap object. (Note: all zero means false) The raw value 4 represents the value nil, and the raw value 9 ((4 << 1) | 1) represents the actual integer 4 (a small integer, or "Fixnum"). When scanning a field, it is easy to use bit operations (the C macro SPECIAL_VALUE_P) to filter out slots that hold non-ref values. Because bit operation is cheap, it is practical to do it in Edge::load().

In theory, Julia and CPython can benefit from tagged union. But this is a very fundamental change to the VM. The entire VM (and third-party C extension modules) need to be aware of this change.

API change if we want to filter slots when loading

Suppose we decided to let Edge::load() and Scanning::scan_object_and_trace_edges skip edges that hold references to those singletons. We then need to ask ourselves one question:

"Is Scanning::scan_object, Edge::load() and Scanning::scan_object_and_trace_edges only used to support tracing?

No. The original purpose of scan_object and scan_object_and_trace_edges is enumerating slots. There is at least one other purpose: heap dumping. (See https://github.com/mmtk/mmtk-core/issues/803)

When traversing the heap to enumerate all objects, it probably doesn't matter.
When traversing the heap to enumerate the descendants of an object, we may need those singletons to be listed, too.
When traversing the heap for debug purposes, we may want to enumerate fields that point to those singletons, too.

But it also depends on the language. For example, for Ruby, ObjectSpace.reachable_objects_from(obj) does not list special values such as nil and small integers.

We may need to add another argument to Scanning::scan_object(), Edge::load() and Scanning::scan_object_and_trace_edges(). Let's call it may_elide_singleton_refs: bool. When true, the Edge::load() may return None if the slot holds a reference to a singleton, and Scanning::scan_object_and_trace_edges may skip slots pointing to singletons. But if it is false, both methods must visit those slots regardless whether they point to singletons.

What about reference counting?

We may define the reference-counting semantics of the objects in the hypothetical "ImmortalAndRootedSpace" such that those objects never have any inc/dec operations applied. In this way, it remains legal for Edge::load and Scanning::scan_object_and_trace_edges to skip fields pointing to those objects.

caizixian commented 10 months ago

This is a known problem I think.

See Plan.SCAN_BOOT_IMAGE in JikesRVM

https://github.com/JikesRVM/JikesRVM/blob/master/MMTk/src/org/mmtk/policy/ImmortalSpace.java

wks commented 10 months ago

This is a known problem I think.

See Plan.SCAN_BOOT_IMAGE in JikesRVM

https://github.com/JikesRVM/JikesRVM/blob/master/MMTk/src/org/mmtk/policy/ImmortalSpace.java

Yes. It is used for the VM space (which mainly serves the boot image) in JikesRVM. I think we may generalize it for other spaces, too. For example, Julia allocates those objects in MMTk heap during start-up, not in boot image.

mmtk / mmtk-core

Skip object graph edges to immortal non-moving objects when tracing #1076

Not traversing the entire object graph?

Global singleton objects

What if we don't need to trace them?

How do we identify such edges?

When a slot is loaded

When MMTk core dispatches the reference to its space

In `ActivePlan::vm_trace_object`

Is it worth the optimization?

Just use tagged union!

API change if we want to filter slots when loading

What about reference counting?

mmtk / mmtk-core

Skip object graph edges to immortal non-moving objects when tracing #1076

Not traversing the entire object graph?

Global singleton objects

What if we don't need to trace them?

How do we identify such edges?

When a slot is loaded

When MMTk core dispatches the reference to its space

In ActivePlan::vm_trace_object

Is it worth the optimization?

Just use tagged union!

API change if we want to filter slots when loading

What about reference counting?

In `ActivePlan::vm_trace_object`