Open wks opened 1 year ago
The new interface is introduced in this PR: https://github.com/mmtk/mmtk-core/pull/700
The Ruby binding is now able to process obj_ref and finalizers using this API since the following commits:
I have an experimental branch of mmtk-openjdk (https://github.com/wks/mmtk-openjdk/tree/gen-weakref-api). It copies the reference and finalizer processor from mmtk-core to mmtk-openjdk, and benchmarks (see https://github.com/mmtk/mmtk-core/pull/700) show that it is possible to implement reference processing in the binding and reach the same performance as what we currently have, and there is still room for improvement given the limitations (such as the use of mutex) in our current implementation.
Going forward, we still need to deprecate the Java-style API in mmtk-core and reimplement reference processing in OpenJDK in a way that is native to OpenJDK. Other VM bindings that use the Java-style API should migrate to the new language-neutral API.
WARNING: This issue contains wild and crazy ideas.
Currently mmtk-core already has Java-style weak reference processors and finaliser processors. In https://github.com/mmtk/mmtk-core/issues/544, we discussed whether we should keep Java semantics. But as we start to support other languages and VMs, it is clear that we need to go beyond what's available in Java.
Update: After discussions, it is clear that this idea is not crazy. Besides the reasons provided below, another reason for supporting reference processing in bindings is that it allows an apples-to-apples comparison between MMTk and the VM's own GC, because both would use the same reference processor.
Task list:
Other languages
Java (Yes. Java.)
In addition to java.lang.ref.XxxxReference and things implemented with them (such as WeakHashMap, which is implemented with WeakReference), Java also has JNI weak handles, which weakly refer to an object but are not Java objects. The current weak reference processing mechanism cannot handle those weak handles.
Ruby
ObjectSpace::WeakMap and WeakRef: In Ruby, the most basic programmer-visible weak data structure is the ObjectSpace::WeakMap type. It is a weak-key, weak-value hash map. If either the key or the value is dead, the key-value pair is removed from the map. It is used to implement the WeakRef type in the stdlib. It stores the WeakRef as the key and the referred object as the value. If either the WeakRef or the referred object dies, the association between them is removed. Under the hood, ObjectSpace::WeakMap is implemented by adding finalisers on both the key and the value.
Global internal data structures: Some internal data structures in Ruby have weak reference semantics. Those data structures hold per-object data for live objects, but can be cleaned up if the object dies.
- obj.object_id: The ID is guaranteed to be unique while the object is alive. Under the hood, the Ruby runtime maintains a global bidirectional ID-to-object and object-to-ID map. When an object is moved, the gc_move function updates the bidirectional map; when an object dies, the finaliser obj_free removes that object from the bidirectional map.
- Objects that are not T_OBJECT have their instance variables held in an external table, and a global map generic_iv_tbl_ maps each object to its "gen_ivtbl". When an object dies, its associated "gen_ivtbl" is freed.
The "cleaned when the object dies" semantics satisfies the definition of "weak reference". In fact, weak references are intended to be used to implement canonicalising mappings, as described in Java's documentation.
V8 and Ephemeron
V8 supports Ephemeron. Simply speaking, an ephemeron is a pair of a key and a value. If the object referred to by the key is alive, the value field behaves like a strong reference; otherwise the value field behaves like a weak reference.
Ephemeron behaves like java.util.WeakHashMap entries. If the key dies, the key-value pair is automatically removed from the WeakHashMap. Under the hood, OpenJDK implements it by using WeakReferences to point to the key. When the key dies, the WeakReference is enqueued, and the WeakHashMap "expunges stale entries" from time to time. It is not as good as Ephemeron, though, because with native Ephemeron support, the GC can clear the value field directly.
Why is the current mechanism in MMTk core not enough?
Different data structures
Different languages/VMs have different weak data structures.
Some of them are not heap objects. For example, JNI weak handles are not heap objects, but MMTk core's ReferenceProcessor assumes weak references are heap objects.
Some of them can hold multiple key-value pairs in one complex data structure. For example, in Ruby, the weak tables are hash tables implemented in C. They cannot simply be updated the way the GC updates fields when an object moves. If the hash table uses the object address as the key and the object is moved, the table entry needs to be re-hashed because the key has changed.
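To make the re-hashing point concrete, here is a minimal sketch of an address-keyed weak table on the VM side. Everything in it is an assumption for illustration: WeakTable, update_after_gc and the forward callback are hypothetical names, not part of mmtk-core or Ruby, and usize stands in for an object address.

```rust
use std::collections::HashMap;

/// Hypothetical VM-side weak table keyed by object address.
/// `usize` stands in for an object address; `V` is per-object side data.
pub struct WeakTable<V> {
    entries: HashMap<usize, V>,
}

impl<V> WeakTable<V> {
    pub fn new() -> Self {
        WeakTable { entries: HashMap::new() }
    }

    pub fn insert(&mut self, addr: usize, value: V) {
        self.entries.insert(addr, value);
    }

    pub fn get(&self, addr: usize) -> Option<&V> {
        self.entries.get(&addr)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// Rebuild the table after a GC.
    /// `forward(addr)` is an assumed callback: it returns `None` if the
    /// object died, or `Some(new_addr)` if it survived (the same address
    /// if the object was not moved).
    pub fn update_after_gc(&mut self, forward: impl Fn(usize) -> Option<usize>) {
        let old = std::mem::take(&mut self.entries);
        for (addr, value) in old {
            if let Some(new_addr) = forward(addr) {
                // Inserting under the new address re-hashes the entry.
                self.entries.insert(new_addr, value);
            }
            // Entries for dead objects are simply dropped here.
        }
    }
}
```

The table cannot be fixed up by rewriting a single field in place: every surviving entry has to be re-inserted under its new key, which is why this kind of structure does not fit a processor that assumes weak references are heap objects with one referent field.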
Different semantics
Ephemeron's unusual semantics, "when the key dies, the value becomes weak", is not handled by the existing mechanisms in Java.
Although both Java's WeakHashMap and Ruby's ObjectSpace::WeakMap emulate ephemeron-like behaviour using finalisers, this is not as efficient as supporting Ephemerons directly in the GC, because weak maps still briefly keep the value "alive", while the "expunge stale entries" operations need to be executed at a later time.
Proposed interface
Note: this may be crazy. Maybe not that crazy: Wenyu is already doing something like this in the lxr branch of the mmtk-openjdk binding.
MMTk core provides a reference processing stage, RefClosure (replacing our current XxxRefClosure phases), during which two functions can be called:
- is_alive(ObjectAddress) -> bool: Return whether an object is alive. is_reachable would be a better name.
- trace_object(ObjectAddress) -> ObjectAddress: Keep the object alive, trace that object, and return its new address (if moved).
And the VMBinding provides one function to be executed by GC worker threads during the new RefClosure phase:
- Collection::do_ref_processing(): Do whatever the VM needs to process weak refs. MMTk core may call this multiple times if the VM keeps additional objects alive via trace_object.
MMTk doesn't care about what the VM does during do_ref_processing().
How to implement Java-style references
The VM binding maintains its own lists of "candidate" and "finalized" objects. During do_ref_processing, the VM binding inspects each candidate.
How to implement Ephemeron
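As a rough sketch of how a binding might handle ephemerons with the two proposed functions, the code below iterates to a fixed point and then clears the remaining values. It is not taken from mmtk-core or V8. ToyGc, Ephemeron and process_ephemerons are hypothetical names, the toy heap is non-moving, and its trace_object marks transitively on the spot, whereas a real GC would enqueue work and compute the closure afterwards.

```rust
use std::collections::{HashMap, HashSet};

/// Toy stand-in for the GC side: a non-moving heap described by an edge map.
/// `usize` stands in for an object address.
pub struct ToyGc {
    pub edges: HashMap<usize, Vec<usize>>,
    pub marked: HashSet<usize>,
}

impl ToyGc {
    pub fn is_reachable(&self, obj: usize) -> bool {
        self.marked.contains(&obj)
    }

    /// Keep `obj` and everything reachable from it alive.
    /// Returns the (unchanged) address because this toy heap never moves objects.
    pub fn trace_object(&mut self, obj: usize) -> usize {
        let mut stack = vec![obj];
        while let Some(o) = stack.pop() {
            if self.marked.insert(o) {
                if let Some(children) = self.edges.get(&o) {
                    stack.extend(children.iter().copied());
                }
            }
        }
        obj
    }
}

/// Hypothetical VM-side ephemeron: the value is strong only while the key is alive.
pub struct Ephemeron {
    pub key: usize,
    pub value: Option<usize>,
}

/// What a binding's `do_ref_processing` might do for ephemerons.
pub fn process_ephemerons(gc: &mut ToyGc, ephemerons: &mut [Ephemeron]) {
    // Phase 1: a live key makes its value strong. Tracing a value can make
    // further keys reachable, so repeat until nothing changes.
    loop {
        let mut progressed = false;
        for e in ephemerons.iter() {
            if let Some(v) = e.value {
                if gc.is_reachable(e.key) && !gc.is_reachable(v) {
                    gc.trace_object(v);
                    progressed = true;
                }
            }
        }
        if !progressed {
            break;
        }
    }
    // Phase 2: any value still unreachable must have a dead key, so the
    // field acts as a weak reference and is cleared directly.
    for e in ephemerons.iter_mut() {
        if let Some(v) = e.value {
            if !gc.is_reachable(v) {
                e.value = None;
            }
        }
    }
}
```

The fixed-point loop is the part that Java-style weak references cannot express: whether a value is kept depends on the liveness of another object, which may itself only become reachable through a different ephemeron's value.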
How to implement global maps in Ruby
Problems
Q: Can this be parallelised?
- do_ref_processing can create sub-tasks, while MMTk core creates multiple work packets under the hood.
Q: How to support multiple strength levels (soft, weak, finalizer, phantom, ...)?
- MMTk core can call do_ref_processing multiple times, passing an integer parameter that indicates how many times MMTk has computed the transitive closure. It is up to the VM binding to interpret the integer. For example, when n = 1, handle soft references; when n = 2, handle weak references, ...
- Alternatively, MMTk core can call process_weak_refs each time a transitive closure computation is finished. The VM binding can implement a state machine to handle a different strength each time.
Q: This looks very unsafe. The VM can basically do anything here.
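For the question about multiple strength levels, the state-machine option could look roughly like the sketch below. It is only an illustration under assumptions: Phase and RefProcessorState are hypothetical binding-side types, and the convention that the function returns true to ask for another round after the next transitive closure is assumed here rather than quoted from the mmtk-core API.

```rust
/// Reference strengths, in the order a Java-like VM might process them.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Phase {
    Soft,
    Weak,
    Final,
    Phantom,
    Done,
}

/// Hypothetical binding-side state that survives between calls within one GC.
pub struct RefProcessorState {
    phase: Phase,
}

impl RefProcessorState {
    pub fn new() -> Self {
        RefProcessorState { phase: Phase::Soft }
    }

    /// Reset at the start of each GC so the next collection begins with soft references.
    pub fn reset(&mut self) {
        self.phase = Phase::Soft;
    }

    /// Called once after each transitive closure (assumed contract).
    /// `handle` processes the references of the current strength.
    /// Returns `true` if the binding wants to be called again after
    /// the next transitive closure has been computed.
    pub fn process_weak_refs(&mut self, mut handle: impl FnMut(Phase)) -> bool {
        if self.phase == Phase::Done {
            return false;
        }
        handle(self.phase);
        self.phase = match self.phase {
            Phase::Soft => Phase::Weak,
            Phase::Weak => Phase::Final,
            Phase::Final => Phase::Phantom,
            Phase::Phantom | Phase::Done => Phase::Done,
        };
        self.phase != Phase::Done
    }
}
```

With this shape, MMTk core does not need to know how many strength levels a language has: it keeps calling until the binding returns false, and the binding decides what each round means.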
Update
Wenyu is already doing something similar in the lxr branch of mmtk-openjdk. https://github.com/wenyuzhao/mmtk-openjdk/blob/lxr/mmtk/src/reference_glue.rs#L243-L289
However, I think work packets (GCWork and the buckets) are an implementation detail of mmtk-core and shouldn't be exposed to the VM binding (I am still open to objections for now). In my proposed API, trace_object can be provided as a call-back closure that encapsulates the logic related to work packets, and the VM binding only specifies which objects need to be kept alive.
Update: In https://github.com/mmtk/mmtk-core/pull/700, we encapsulated trace_object behind the ObjectTracer trait (which already exists for supporting object-enqueuing tracing), and the new ObjectTracerContext trait encapsulates the creation and flushing of ProcessEdgesWork.
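To illustrate the encapsulation idea, the sketch below shows a binding doing Java-style weak reference and finalizer processing against a small tracer interface, without ever seeing work packets. The Tracer trait here is a simplified stand-in written for this example, not the real ObjectTracer from mmtk-core. ToyTracer, JavaStyleRefs and the forwarding table are likewise hypothetical, and usize stands in for an object reference.

```rust
use std::collections::{HashMap, HashSet};

/// Simplified stand-in for the callback the core would hand to the binding.
pub trait Tracer {
    fn is_reachable(&self, obj: usize) -> bool;
    /// Keep `obj` alive and return its (possibly new) address.
    fn trace_object(&mut self, obj: usize) -> usize;
}

/// Toy tracer: a mark set plus a forwarding table that simulates moved objects.
pub struct ToyTracer {
    pub marked: HashSet<usize>,
    pub forwarding: HashMap<usize, usize>,
}

impl Tracer for ToyTracer {
    fn is_reachable(&self, obj: usize) -> bool {
        self.marked.contains(&obj)
    }

    fn trace_object(&mut self, obj: usize) -> usize {
        self.marked.insert(obj);
        // Fall back to the old address if the object was not moved.
        *self.forwarding.get(&obj).unwrap_or(&obj)
    }
}

/// Binding-side bookkeeping for Java-style references.
pub struct JavaStyleRefs {
    pub weak_slots: Vec<Option<usize>>,
    pub finalizer_candidates: Vec<usize>,
    pub ready_to_finalize: Vec<usize>,
}

impl JavaStyleRefs {
    pub fn do_ref_processing(&mut self, tracer: &mut impl Tracer) {
        // Weak references: forward live referents, clear dead ones.
        for slot in self.weak_slots.iter_mut() {
            if let Some(obj) = *slot {
                *slot = if tracer.is_reachable(obj) {
                    Some(tracer.trace_object(obj))
                } else {
                    None
                };
            }
        }
        // Finalizers: unreachable candidates are kept alive (resurrected)
        // and queued; reachable ones remain candidates.
        let candidates = std::mem::take(&mut self.finalizer_candidates);
        for obj in candidates {
            if tracer.is_reachable(obj) {
                self.finalizer_candidates.push(tracer.trace_object(obj));
            } else {
                self.ready_to_finalize.push(tracer.trace_object(obj));
            }
        }
    }
}
```

The binding only states which objects must stay alive; how trace_object turns that into queued work is left entirely to whatever implements the trait.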