mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io

Custom tracing unit/packet #1137

Open wks opened 6 months ago

wks commented 6 months ago

This is one way to implement https://github.com/mmtk/mmtk-core/issues/710

What is a custom tracing unit?

It is a unit of work that processes multiple object graph edges. Examples include a single slot, a batch of objects to be scanned, and a whole stack.

The common part is that all of them need to call trace_object to trace the edges, and create ScanObjects work packets for newly visited children.

What's not common is that the "unit" can be small or large. It can be one single slot, and it can be multiple objects to be scanned, and it can be a whole stack.

In theory, Edge (Slot) and MemorySlice are custom tracing units

Yes. But we don't want to replace them yet. They work pretty well in MMTk for now.

Representation of a custom tracing unit

As a closure

The simplest way to represent such a thing is a FnOnce(impl ObjectTracer), or a trait like this:

trait CustomScannableUnit {
    fn run(self, object_tracer: &mut impl ObjectTracer);
}

That is, it is a runnable thing that can contain arbitrary data as context, and it uses an ObjectTracer (which provides trace_object) when running.

But the key point is that it is not given a reference to ObjectTracer until it is executed. This is important because we can only call trace_object at certain times, such as TPinningClosure, PinningRootsTrace, Closure, and *RefClosure. Importantly, we cannot call trace_object in Prepare.
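To make the closure representation concrete, here is a minimal, self-contained sketch. It does not use mmtk-core: ObjectReference and ObjectTracer below are simplified stand-ins, and OffHeapFields, ToyTracer and demo_closure_unit are hypothetical names made up for illustration. It only shows that the unit carries its own context and gets access to trace_object solely while run is executing.

```rust
/// Stand-in for MMTk's `ObjectReference` (assumption: a plain address wrapper).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(pub usize);

/// Simplified stand-in for MMTk's `ObjectTracer` trait.
pub trait ObjectTracer {
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference;
}

/// The trait from the text: a runnable thing that is only lent a tracer
/// while it executes.
pub trait CustomScannableUnit {
    fn run(self, object_tracer: &mut impl ObjectTracer);
}

/// Hypothetical unit: reference fields that live in some off-heap struct.
pub struct OffHeapFields<'a> {
    pub fields: &'a mut [ObjectReference],
}

impl<'a> CustomScannableUnit for OffHeapFields<'a> {
    fn run(self, object_tracer: &mut impl ObjectTracer) {
        // Same shape as `obj->field = trace_object(obj->field)` in C.
        for field in self.fields.iter_mut() {
            *field = object_tracer.trace_object(*field);
        }
    }
}

/// Toy tracer that pretends every object moves by `offset` bytes and
/// records each object it visits.
pub struct ToyTracer {
    pub offset: usize,
    pub visited: Vec<ObjectReference>,
}

impl ObjectTracer for ToyTracer {
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference {
        let forwarded = ObjectReference(object.0 + self.offset);
        self.visited.push(forwarded);
        forwarded
    }
}

/// Builds a unit first, creates the tracer later, then runs the unit.
/// Returns the updated fields and the number of traced objects.
pub fn demo_closure_unit() -> (Vec<ObjectReference>, usize) {
    let mut fields = vec![ObjectReference(0x1000), ObjectReference(0x2000)];
    let unit = OffHeapFields { fields: &mut fields };
    // The tracer does not exist yet when the unit is created.
    let mut tracer = ToyTracer { offset: 0x10, visited: Vec::new() };
    unit.run(&mut tracer);
    (fields, tracer.visited.len())
}
```

In a real binding the tracer would be supplied by mmtk-core at the right stage rather than built by the caller as in demo_closure_unit.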

As a work packet

Because it is a unit of work, we can wrap it in a work packet.

But because a "custom tracing unit" can be small, we can pack multiple such units into one packet.

In fact, "custom tracing units" can be nested. One big unit can contain multiple small units. For example, we can aggregate 4096 Edge (Slot) instances into one work packet and process them in one go. (That's what our ProcessEdgesWork currently does.) We can also put one whole stack into one "custom tracing unit", which can be further split into the scanning of each stack frame.
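The packing and splitting described above could look roughly like the sketch below. Everything here is a toy: Unit, UnitPacket, split_stack and CountingTracer are hypothetical names, a "stack" is just a list of frames holding references, and the units are boxed closures taking &mut dyn ObjectTracer so that differently typed units can share one packet (an assumption of this sketch, not something the proposal prescribes).

```rust
/// Stand-in for MMTk's `ObjectReference`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(pub usize);

/// Simplified stand-in for MMTk's `ObjectTracer` trait.
pub trait ObjectTracer {
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference;
}

/// A unit is a boxed closure that only sees a tracer when it is run.
pub type Unit = Box<dyn FnOnce(&mut dyn ObjectTracer) + Send>;

/// Hypothetical work packet that batches several small units.
pub struct UnitPacket {
    units: Vec<Unit>,
}

impl UnitPacket {
    /// Runs every unit in the packet with the same tracer.
    pub fn execute(self, tracer: &mut dyn ObjectTracer) {
        for unit in self.units {
            unit(&mut *tracer);
        }
    }
}

/// Splits one big unit (a whole stack) into per-frame units and packs
/// `frames_per_packet` of them into each packet.
pub fn split_stack(
    frames: Vec<Vec<ObjectReference>>,
    frames_per_packet: usize,
) -> Vec<UnitPacket> {
    assert!(frames_per_packet > 0, "a packet must hold at least one unit");
    let mut units = frames
        .into_iter()
        .map(|frame| -> Unit {
            Box::new(move |tracer: &mut dyn ObjectTracer| {
                for object in frame {
                    tracer.trace_object(object);
                }
            })
        })
        .peekable();
    let mut packets = Vec::new();
    while units.peek().is_some() {
        let batch: Vec<Unit> = units.by_ref().take(frames_per_packet).collect();
        packets.push(UnitPacket { units: batch });
    }
    packets
}

/// Toy tracer that only records what it was asked to trace.
#[derive(Default)]
pub struct CountingTracer {
    pub visited: Vec<ObjectReference>,
}

impl ObjectTracer for CountingTracer {
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference {
        self.visited.push(object);
        object
    }
}
```

The packet size is a tuning knob here; the 4096 mentioned above is the figure for slots, not a recommendation for frames.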

Why is it useful?

It complements our current root-scanning mechanisms.

Currently VM bindings deliver a list of root edges to mmtk-core as either a Vec<Edge> (Vec<Slot>) or a Vec<ObjectReference> (a list of target objects which need to be pinned). Only Edge (Slot) can be updated. That's not general enough. Some VMs, such as Ruby and Android, cannot represent some root edges as Edge (Slot). Those VMs need to access trace_object directly.

Instead, we can let the VM deliver a custom tracing unit for a subset of global roots, such as one stack. We introduce an extra method

trait RootsWorkFactory {
    /// Create a work packet which will be executed in the `Closure` bucket.
    /// When executed by a worker, the worker will instantiate `OT` and call `callback` with a reference to it.
    /// Newly visited object from `OT` will be added to a `ScanObjects` work packet in the `Closure` bucket.
    fn custom_tracing_unit<OT: ObjectTracer>(callback: impl FnOnce(&mut OT));
}

Calling custom_tracing_unit does not create an ObjectTracer immediately. Instead, it creates a work packet which will be executed in Closure. When that work packet is executed, it creates an OT instance using the ProcessEdgesWork implementation selected by the current GC, calls the callback with a reference to the OT, and then flushes it.
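The deferred creation of the tracer can be illustrated with a toy packet. This is a sketch under assumptions, not the proposed implementation: CustomTracingPacket and ToyTracer are hypothetical, the mark state is a plain HashSet, the callback takes &mut dyn ObjectTracer instead of a generic OT to keep the example short, and "flushing" just returns the newly visited objects that a ScanObjects packet would receive.

```rust
use std::collections::HashSet;

/// Stand-in for MMTk's `ObjectReference`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ObjectReference(pub usize);

/// Simplified stand-in for MMTk's `ObjectTracer` trait.
pub trait ObjectTracer {
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference;
}

/// Toy non-moving tracer: marks objects and buffers the newly visited ones.
pub struct ToyTracer<'a> {
    marked: &'a mut HashSet<ObjectReference>,
    newly_visited: Vec<ObjectReference>,
}

impl ObjectTracer for ToyTracer<'_> {
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference {
        // `insert` returns true only the first time an object is marked.
        if self.marked.insert(object) {
            self.newly_visited.push(object);
        }
        object
    }
}

/// Hypothetical packet created by `custom_tracing_unit`. It stores the
/// callback but holds no tracer.
pub struct CustomTracingPacket {
    callback: Box<dyn FnOnce(&mut dyn ObjectTracer) + Send>,
}

impl CustomTracingPacket {
    pub fn new(callback: impl FnOnce(&mut dyn ObjectTracer) + Send + 'static) -> Self {
        Self { callback: Box::new(callback) }
    }

    /// Simulates execution in the `Closure` bucket: create the tracer,
    /// lend it to the callback, then flush. The returned objects are what
    /// would go into a `ScanObjects` packet.
    pub fn execute(self, marked: &mut HashSet<ObjectReference>) -> Vec<ObjectReference> {
        let mut tracer = ToyTracer { marked, newly_visited: Vec::new() };
        (self.callback)(&mut tracer);
        tracer.newly_visited
    }
}
```

Because the callback only ever receives a borrowed tracer inside execute, it has no way to call trace_object before the packet runs.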

For example, the Ruby VM binding can call custom_tracing_unit with a callback that calls gc_update_references. gc_update_references will call trace_object to update the roots and assign the updated object references back to the root fields.

As another example, the Android ART can call custom_tracing_unit with a callback that scans the stack. It uses whatever ART provides to identify reference slots on the stack and calls trace_object to update them. Note that this happens in Closure. Although we usually scan stacks in Prepare, it is OK to do it in Closure because that is still early enough to keep the objects pointed to by root edges alive.

It helps scanning complicated objects.

Many objects in Ruby are implemented as off-heap C objects, and are scanned using functions with statements like obj->field = trace_object(obj_field). (See https://github.com/mmtk/mmtk-core/issues/710 for more details). Currently, we use Scanning::scan_object_and_trace_edges to trace all edges logically starting from one object (that includes all fields of the in-heap part of the object, and the fields in off-heap structs, too). It's problematic if an object involves many off-heap objects. That usually makes the ScanObjects work packet too large to parallelize properly.

With custom tracing units, we can offload each native struct to one separate custom tracing unit, and they can be split into multiple work packets. What we need is something similar to RootsWorkFactory::custom_tracing_unit, but callable during tracing.
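As a rough illustration of offloading native structs during tracing, the sketch below scans the in-heap fields of an object right away and queues one unit per off-heap struct. All names (NativeStruct, HeapObject, scan_object, OffsetTracer, the pending queue) are hypothetical, and the Arc<Mutex<...>> sharing is only there to keep the toy example safe Rust; a real binding would work with raw C structs.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Stand-in for MMTk's `ObjectReference`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(pub usize);

/// Simplified stand-in for MMTk's `ObjectTracer` trait.
pub trait ObjectTracer {
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference;
}

/// A deferred unit of tracing work.
pub type Unit = Box<dyn FnOnce(&mut dyn ObjectTracer) + Send>;

/// Hypothetical off-heap struct holding reference fields.
pub struct NativeStruct {
    pub fields: Mutex<Vec<ObjectReference>>,
}

/// Hypothetical heap object with an in-heap part and off-heap parts.
pub struct HeapObject {
    pub in_heap_fields: Vec<ObjectReference>,
    pub native_parts: Vec<Arc<NativeStruct>>,
}

/// Traces the in-heap fields immediately and defers every native struct
/// as its own unit, so the units can later be spread over several packets.
pub fn scan_object(
    object: &mut HeapObject,
    tracer: &mut dyn ObjectTracer,
    pending: &mut VecDeque<Unit>,
) {
    for field in object.in_heap_fields.iter_mut() {
        *field = tracer.trace_object(*field);
    }
    for part in &object.native_parts {
        let part = Arc::clone(part);
        pending.push_back(Box::new(move |t: &mut dyn ObjectTracer| {
            let mut fields = part.fields.lock().unwrap();
            for field in fields.iter_mut() {
                *field = t.trace_object(*field);
            }
        }));
    }
}

/// Toy tracer that pretends every object moves by `offset` bytes.
pub struct OffsetTracer {
    pub offset: usize,
}

impl ObjectTracer for OffsetTracer {
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference {
        ObjectReference(object.0 + self.offset)
    }
}
```

The pending queue plays the role of the missing piece named above: something like RootsWorkFactory::custom_tracing_unit that can be called while tracing is already in progress.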

Related issues

https://github.com/mmtk/mmtk-core/issues/710 raised the need to let the VM call trace_object directly. This issue drafts one possible implementation of it. One challenge discussed in https://github.com/mmtk/mmtk-core/issues/710 is limiting the scope of trace_object so that it can only be called at the right time (from TPinningClosure to VMRefClosure). The solution in this issue does not give an ObjectTracer to the VM binding directly, but only lends it to the binding while the work packet is executing.