mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io

Expose `trace_object` to the VM binding #710

Open wks opened 1 year ago

wks commented 1 year ago

TL;DR: We need to expose the trace_object function to the VM binding for global roots scanning (Ruby needs it) and weak reference processing (VM-side reference processing needs it). However, trace_object depends on a queue (to enqueue objects) and a GCWorker instance (to copy objects). Care must be taken so that VM bindings can share it across multiple threads (or work packets). We also need to make sure the proposed new interface is general enough to support different kinds of GC, including concurrent GC and reference counting.

Why does the VM binding need trace_object?

Ruby, copying GC and global roots

We discussed before that Ruby scans objects by providing C functions that enumerate and update fields. See: https://github.com/mmtk/mmtk-core/issues/581

And it is similar for global roots. Ruby has two functions: gc_mark_roots and gc_update_references.

gc_mark_roots marks global roots. It calls marking functions such as rb_gc_mark_movable(var) on each root variable.

static void
gc_mark_roots(rb_objspace_t *objspace, const char **categoryp)
{
    // ...
    rb_vm_mark(vm);
    // ...
}

void
rb_vm_mark(void *ptr)
{
    // ...
        rb_gc_mark_movable(vm->load_path);
        rb_gc_mark_movable(vm->load_path_snapshot);
        RUBY_MARK_MOVABLE_UNLESS_NULL(vm->load_path_check_cache);
        rb_gc_mark_movable(vm->expanded_load_path);
        rb_gc_mark_movable(vm->loaded_features);
        rb_gc_mark_movable(vm->loaded_features_snapshot);
        rb_gc_mark_movable(vm->loaded_features_realpaths);
    // ...
}

gc_update_references updates the fields.

static void
gc_update_references(rb_objspace_t *objspace)
{
    // ...
    rb_vm_update_references(vm);
    // ...
}

void
rb_vm_update_references(void *ptr)
{
    // ...
        vm->load_path = rb_gc_location(vm->load_path);
        vm->load_path_snapshot = rb_gc_location(vm->load_path_snapshot);

        if (vm->load_path_check_cache) {
            vm->load_path_check_cache = rb_gc_location(vm->load_path_check_cache);
        }

        vm->expanded_load_path = rb_gc_location(vm->expanded_load_path);
        vm->loaded_features = rb_gc_location(vm->loaded_features);
        vm->loaded_features_snapshot = rb_gc_location(vm->loaded_features_snapshot);
        vm->loaded_features_realpaths = rb_gc_location(vm->loaded_features_realpaths);
    // ...
}

Currently, we hijack rb_gc_mark to record the values of root variables and present them to MMTk core with RootsWorkFactory::create_process_node_roots_work. MMTk core receives a list of objects so that it can pin them, and it is never necessary to update the root variables because the objects are pinned.

However, to support copying GC in Ruby, roots need to be updated, too (unless we are willing to pin all global roots for ease of implementation). Because of the var = rb_gc_location(var) idiom in Ruby, the easiest way to support updating roots is to replace rb_gc_location with trace_object. This is impossible with the current RootsWorkFactory API because it only has two methods:

pub trait RootsWorkFactory<ES: Edge>: Clone + Send + 'static {
    fn create_process_edge_roots_work(&mut self, edges: Vec<ES>);
    fn create_process_node_roots_work(&mut self, nodes: Vec<ObjectReference>);
}

create_process_edge_roots_work gives MMTk core a list of Edge, where an Edge is usually a pointer to a root variable. create_process_node_roots_work gives MMTk core a list of ObjectReference, and it inevitably pins all those objects because it cannot update the roots.
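The difference can be illustrated with a toy sketch (the types and the forward closure below are stand-ins, not the real mmtk-core API): an edge root gives the GC a slot it can write the forwarded reference back into, while a node root only reports a value, so the object must be pinned.

```rust
// Toy stand-ins, not the real mmtk-core API.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ObjectReference(usize);

// Edge root: MMTk knows the *slot*, so it can write the forwarded
// reference back after copying the object.
fn process_edge_root(slot: &mut ObjectReference, forward: impl Fn(ObjectReference) -> ObjectReference) {
    *slot = forward(*slot);
}

// Node root: MMTk only knows the *value*, so it cannot update the root
// variable; the object has to be pinned instead.
fn process_node_root(node: ObjectReference, pinned: &mut Vec<ObjectReference>) {
    pinned.push(node);
}

fn main() {
    let forward = |o: ObjectReference| ObjectReference(o.0 + 0x100); // toy "copy"

    let mut root_var = ObjectReference(0x10);
    process_edge_root(&mut root_var, forward);
    assert_eq!(root_var, ObjectReference(0x110)); // root variable was updated

    let mut pinned = vec![];
    process_node_root(ObjectReference(0x20), &mut pinned);
    assert_eq!(pinned, vec![ObjectReference(0x20)]); // object stays where it is
    println!("ok");
}
```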

VM-side weak reference processing

Different VMs implement weak references, weak tables, finalisers and ephemerons differently, with different layouts and semantics. The most general way to support different VMs is to provide some kind of primitive and let the VM binding scan and update weak references itself.

In https://github.com/mmtk/mmtk-core/pull/700, I designed a new API that gives the VM binding temporary access to the trace_object function.

pub trait ProcessWeakRefsTracer {
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference;
}

The access is "temporary" because MMTk core prepares a ProcessEdgesWork, and wraps it into a ProcessWeakRefsTracer that borrows the ProcessEdgesWork.

        let mut process_edges_work = E::new(vec![], false, mmtk);
        process_edges_work.set_worker(worker);
        // ...
        let tracer = SimpleProcessWeakRefsTracer {
            process_edges_work: &mut process_edges_work,
        };
        <E::VM as VMBinding>::VMCollection::process_weak_refs(worker.tls, context, tracer)

From my experiments, this API can implement JikesRVM-style reference processing as implemented in mmtk-core, and it can support Ruby by updating the weak tables of obj_free candidates and finalisable objects, as well as the hash tables that map object addresses to GenIVTbl, ID and other things. This means even temporary access to trace_object is enough for VM bindings to handle weak references.

Why is temporary access to trace_object not enough?

If ProcessWeakRefsTracer borrows a ProcessEdgesWork, then the Collection::process_weak_refs(tls, context, tracer) function can only use the tracer instance within its own scope. It forbids, for example, creating more work packets and calling ProcessWeakRefsTracer::trace_object from those packets, because that would violate the borrowing rules.

The same is true for root scanning. If scan_vm_specific_roots only has temporary access to trace_object, it cannot spawn more work packets and scan roots in parallel. That is why RootsWorkFactory requires the Clone trait: VM bindings can clone() the RootsWorkFactory and use the clones in multiple work packets.
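The Clone-based pattern can be sketched with stand-in types (MockRootsFactory and the channel are illustrative only, not the mmtk-core implementation): because each clone owns its own handle to the shared queue, a clone can be moved into an independently scheduled thread or work packet, which a tracer that merely borrows a ProcessEdgesWork cannot do.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-ins for MMTk types, to illustrate the ownership rules only.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ObjectReference(usize);

// A factory that *owns* a handle to the shared queue.  Because it is Clone
// (and Send), each clone can be moved into a separately scheduled thread or
// work packet, unlike a tracer that borrows a ProcessEdgesWork.
#[derive(Clone)]
struct MockRootsFactory {
    tx: mpsc::Sender<ObjectReference>,
}

impl MockRootsFactory {
    fn create_process_node_roots_work(&mut self, nodes: Vec<ObjectReference>) {
        for n in nodes {
            self.tx.send(n).unwrap();
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let factory = MockRootsFactory { tx };

    // Each "work packet" gets its own clone and can run on any thread.
    let handles: Vec<_> = (0..2usize)
        .map(|i| {
            let mut f = factory.clone();
            thread::spawn(move || f.create_process_node_roots_work(vec![ObjectReference(i)]))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    drop(factory); // drop the last sender so the receiver's iterator terminates
    let mut got: Vec<usize> = rx.iter().map(|o| o.0).collect();
    got.sort();
    assert_eq!(got, vec![0, 1]);
    println!("ok");
}
```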

The approach of the lxr branch

In the lxr branch, the Collection trait has the following methods:

    fn process_soft_refs<E: ProcessEdgesWork<VM = VM>>(_worker: &mut GCWorker<VM>) {}
    fn process_weak_refs<E: ProcessEdgesWork<VM = VM>>(_worker: &mut GCWorker<VM>) {}
    fn process_final_refs<E: ProcessEdgesWork<VM = VM>>(_worker: &mut GCWorker<VM>) {}
    fn process_phantom_refs<E: ProcessEdgesWork<VM = VM>>(_worker: &mut GCWorker<VM>) {}

Note that those functions expose the E: ProcessEdgesWork type to the VM binding.

The VM binding is able to spawn multiple work packets to process "discovered lists" in parallel.

    pub fn process_lists<E: ProcessEdgesWork<VM = OpenJDK>>(
        &self,
        worker: &mut GCWorker<OpenJDK>,
        rt: ReferenceType,
        lists: &[DiscoveredList],
        clear: bool,
    ) {
        let mut packets = vec![];
        for /* ... */ {
            let w = ProcessDiscoveredList {
                list_index: i,
                head,
                rt,
                _p: PhantomData::<E>,
            };
            packets.push(Box::new(w) as Box<dyn GCWork<OpenJDK>>);
        }
        worker.scheduler().work_buckets[WorkBucketStage::Unconstrained].bulk_add(packets);
    }

Note that the process_lists method also has the <E: ProcessEdgesWork> type parameter. As a result, ProcessDiscoveredList<E: ProcessEdgesWork<VM = OpenJDK>> is specialised over the E type, too. It can then gain access to trace_object by instantiating E:

impl<E: ProcessEdgesWork<VM = OpenJDK>> GCWork<OpenJDK> for ProcessDiscoveredList<E> {
    fn do_work(&mut self, worker: &mut GCWorker<OpenJDK>, mmtk: &'static MMTK<OpenJDK>) {
        let mut trace = E::new(vec![], false, mmtk);
        trace.set_worker(worker);
        // ...
                let forwarded = trace.trace_object(referent);
        // ...
        trace.flush();
    }
}

While exposing E to the VM binding works, I think it is inelegant. As discussed in https://github.com/mmtk/mmtk-core/issues/604, the weak reference processor does not really use the whole ProcessEdgesWork. The vec![] above is assigned to the edges list, and it remains empty because the packet never actually "processes edges". What the weak reference processor really uses is the trace_object part, plus the ability to create ScanObjects work packets (via trace.flush()) so that it can expand the transitive closure.

Proposed API

I am currently thinking about designing a trait that encapsulates just that:

trait Tracer {
    /// Create a new instance.  It will borrow a `GCWorker` instance in its lifetime.
    fn new(worker: &mut GCWorker) -> Self;
    /// The `trace_object` interface we wanted.
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference;
    /// Flush the internal queue and create ScanObjects work packet.  Consumes self.
    /// Alternatively we can use the `Drop` trait.
    fn flush(self);
}

One design goal is to make it compatible with ProcessEdgesWork so that we can implement it now without much refactoring in mmtk-core: it can be implemented by wrapping a ProcessEdgesWork inside. Like ProcessEdgesWork, it has a new method to create new instances.
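A minimal sketch of that compatibility shim, with hypothetical stand-ins (MockProcessEdgesWork, a cut-down GCWorker) in place of the real mmtk-core types:

```rust
// Stand-ins for mmtk-core types; names and shapes are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ObjectReference(usize);

struct GCWorker {
    // In real MMTk this would hold the CopyContext, scheduler handle, etc.
    copied: usize,
}

// A cut-down ProcessEdgesWork: just the parts the Tracer needs.
struct MockProcessEdgesWork<'w> {
    worker: &'w mut GCWorker,
    nodes: Vec<ObjectReference>,
}

// The proposed Tracer trait, with a lifetime tying it to the worker it borrows.
trait Tracer<'w>: Sized {
    fn new(worker: &'w mut GCWorker) -> Self;
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference;
    fn flush(self) -> usize; // returns the number of queued objects, for illustration
}

// Implement Tracer by wrapping the ProcessEdgesWork stand-in inside.
struct ProcessEdgesWorkTracer<'w> {
    inner: MockProcessEdgesWork<'w>,
}

impl<'w> Tracer<'w> for ProcessEdgesWorkTracer<'w> {
    fn new(worker: &'w mut GCWorker) -> Self {
        ProcessEdgesWorkTracer {
            inner: MockProcessEdgesWork { worker, nodes: vec![] },
        }
    }

    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference {
        // Pretend every object is copied to a new address and enqueued.
        self.inner.worker.copied += 1;
        let forwarded = ObjectReference(object.0 + 0x1000);
        self.inner.nodes.push(forwarded);
        forwarded
    }

    fn flush(self) -> usize {
        // In real MMTk this would turn `nodes` into a ScanObjects work packet.
        self.inner.nodes.len()
    }
}

fn main() {
    let mut worker = GCWorker { copied: 0 };
    let mut tracer = ProcessEdgesWorkTracer::new(&mut worker);
    let forwarded = tracer.trace_object(ObjectReference(0x10));
    assert_eq!(forwarded, ObjectReference(0x1010));
    assert_eq!(tracer.flush(), 1); // flush consumes the tracer, ending the borrow
    assert_eq!(worker.copied, 1);
    println!("ok");
}
```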

In https://github.com/mmtk/mmtk-core/pull/700, I mentioned that exposing set_worker and flush to the VM binding may be inelegant, as it complicates the API. But on second thought, I think we cannot avoid associating a GCWorker with the Tracer object, because trace_object accesses the GCWorker (more precisely, the CopyContext local to the worker).

Example

Collection::process_weak_refs will take a type parameter instead of an impl ProcessWeakRefsTracer argument.

impl Collection for OpenJDK {
    fn process_weak_refs<T: Tracer>(
        worker: &mut GCWorker, // This may look strange because we pass `worker.tls` in other methods.
        context: ProcessWeakRefsContext)
    {
        let mut tracer = T::new(worker);
        for weakref in LIST_OF_WEAKREFS {
            if is_reachable(weakref.referent) {
                weakref.referent = tracer.trace_object(weakref.referent);
            }
        }
        tracer.flush();

        let work = SomeWorkPacket::<T>::new(); // Specialise new work packet with <T>
        add_work_packet(work);
    }
}

impl<T: Tracer> GCWork for SomeWorkPacket<T> {
    fn do_work(&mut self, worker: &mut GCWorker, mmtk: &'static MMTK) {
        let mut tracer = T::new(worker); // Instantiate T
        // ...
        another_weakref.referent = tracer.trace_object(another_weakref.referent);
        // ...
        tracer.flush();
    }
}

Refactoring ProcessEdgesWork

A more ambitious goal is to refactor ProcessEdgesWork itself and split it into two parts:

  1. An edge list, and
  2. A tracer as shown above.

so that a ProcessEdgesWork can be implemented as iterating through the edge list and feeding each edge into the tracer. Then we don't need to pass ProcessEdgesWork around everywhere, and can use it only internally.
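Under that refactoring, the edge-processing loop would reduce to something like the following sketch, where Edge, ObjectTracer and OffsetTracer are illustrative stand-ins rather than the real mmtk-core types:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct ObjectReference(usize);

// An edge is a slot holding an ObjectReference; here just an index into a slot array.
struct Edge(usize);

trait ObjectTracer {
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference;
}

// What remains of ProcessEdgesWork: iterate the edge list, feed each edge's
// target to the tracer, and write the (possibly forwarded) reference back.
fn process_edges<T: ObjectTracer>(slots: &mut [ObjectReference], edges: &[Edge], tracer: &mut T) {
    for e in edges {
        let old = slots[e.0];
        slots[e.0] = tracer.trace_object(old);
    }
}

// A toy tracer that "forwards" every object by a fixed offset.
struct OffsetTracer;
impl ObjectTracer for OffsetTracer {
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference {
        ObjectReference(object.0 + 0x100)
    }
}

fn main() {
    let mut slots = vec![ObjectReference(1), ObjectReference(2)];
    let edges = vec![Edge(0), Edge(1)];
    process_edges(&mut slots, &edges, &mut OffsetTracer);
    assert_eq!(slots, vec![ObjectReference(0x101), ObjectReference(0x102)]);
    println!("ok");
}
```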

What about reference counting?

The Tracer trait shown above is provided to scan_vm_specific_roots and process_weak_refs, which belong to tracing GCs rather than reference-counting GCs. But deferred reference counting, when scanning stacks, may only apply DECs and INCs. We need to be careful about what the expectation is (i.e. what the VM binding's obligation is) when mmtk-core calls Collection::scan_stack_roots. We need to discuss this.

wks commented 1 year ago

PR https://github.com/mmtk/mmtk-core/pull/700 has been merged, and we now have a general language-independent weak reference processing mechanism. It gives the binding an ObjectTracerContext which can be used to temporarily gain access to the trace_object method.

The problem with global roots scanning remains unsolved. We cannot call trace_object in the Prepare bucket. The Prepare bucket contains work that pins objects, so we cannot trace edges until it is clear which objects are pinned during this GC. With the advent of "red roots" in https://github.com/mmtk/mmtk-core/pull/897, trace_object calls that may potentially move objects must be put into the regular Closure bucket instead of the dedicated ImmovableClosure bucket.

One solution is to allow the VM binding to create work packets that can use trace_object. Such work packets must be parameterised with the concrete plan (more precisely, with the concrete ProcessEdgesWork implementation that currently provides trace_object). Currently, this parameter is encapsulated in the implementations of the RootsWorkFactory trait and the ObjectTracerContext trait.

scan_vm_specific_roots gives the VM binding a RootsWorkFactory implementation which creates the ProcessEdgesWork and ScanObjects work packets defined in MMTk core. However, that contradicts my hypothesis that the custom root-scanning packet should be designed and created by the binding. We need a way for a custom root-tracing (not root-scanning) work packet to gain access to an ObjectTracerContext when executed.

The simplest solution is just adding a method RootsWorkFactory::get_object_tracer_context(&mut self) -> Box<dyn ObjectTracerContext> ("Box" and "dyn" are optional; we'll see if they are really necessary). But we must document clearly that the returned ObjectTracerContext must not be used until the Closure stage. This part may be confusing, and will require the binding to use it in a disciplined way.

Example:

impl Scanning for Ruby {
    fn scan_vm_specific_roots(mut factory: impl RootsWorkFactory) {
        let context = factory.get_object_tracer_context();  // Don't use it now.  Objects are not pinned yet.
        let packet = ProcessRubyVMRoots { context }; // Add this "context" to the work packet
        add_work_packet(packet, WorkBucketStage::Closure);
    }
}

struct ProcessRubyVMRoots { context: Box<dyn ObjectTracerContext> }
impl GCWork for ProcessRubyVMRoots {
    fn do_work(&mut self, worker: &mut GCWorker, mmtk: &MMTK) {
        self.context.with_tracer(worker, |tracer| { // so that we can use `trace_object` when executing this packet.
            let ruby_vm = ...;

            // NATIVE CODE: The following are implemented in equivalent C code which calls back to Rust for accessing `trace_object`
            ruby_vm.field1 = tracer.trace_object(ruby_vm.field1);  // The compiler cannot inline methods of Box<dyn ...>,
            ruby_vm.field2 = tracer.trace_object(ruby_vm.field2);  // so we may switch to `impl ...` if possible.
            ruby_vm.field3 = tracer.trace_object(ruby_vm.field3);
            // ...
            // END of NATIVE CODE

        });
    }
}
k-sareen commented 11 months ago

I tried to expose trace_object to the VM during roots scanning via two approaches: exposing an ObjectTracerContext in RootsWorkFactory, and exposing ProcessEdgesWork to the VM directly. It was very difficult to make the Rust compiler happy with the first approach, hence I tried the second. I got further with the second attempt, but then Rust complained about ProcessEdgesWork etc. I don't remember the errors off the top of my head; I'll edit this comment when I have time to add more context.

The real issue is that trace_object is so deeply entrenched in the concept (and context) of a work packet that we can't just call trace_object directly. The requirements for executing trace_object directly are:

  1. A GC worker to execute the function. Since the roots-scanning work packet already runs on a GC worker, we can reuse it to execute trace_object.
  2. An object queue associated with the GC worker, used to enqueue traced objects so that their references can be scanned later.

Theoretically, the roots-scanning work packet already has all of this, but none of it is exposed to the VM.

Perhaps an idea could be to have a global trace_object function like:

fn trace_object<VM: VMBinding>(worker: &mut GCWorker<VM>, object: ObjectReference) -> ObjectReference {
  // ...
}

Or even as function of a GCWorker (I know this is really bad in terms of separation of concerns/abstraction leakage though):

impl<VM: VMBinding> GCWorker<VM> {
[...]

  fn trace_object(&mut self, object: ObjectReference) -> ObjectReference {
    // ...
  }

[...]
}

so that a VM can arbitrarily call trace_object on a given object if it has a valid GC worker. But we need to be careful that exposing these functions does not allow work-packet dependencies to be broken.

wks commented 6 months ago

I tried to expose trace_object to the VM during roots scanning ...

This may be a problem because we can't call trace_object during Prepare anyway.

The real issue is that trace_object is so deeply entrenched in the concept (and context) of a work packet that we can't just call trace_object directly.

Exactly. trace_object is currently provided by ProcessEdgesWork. A work packet that calls trace_object is either a ProcessEdgesWork itself, or has E: ProcessEdgesWork as a type parameter. That means we can't create a work packet out of thin air and let it call trace_object.

I have a solution in mind, and it is described in detail in https://github.com/mmtk/mmtk-core/issues/1137. The basic idea is that it does not expose trace_object during Prepare by giving an ObjectTracer or ObjectTracerContext to the VM binding. Instead, it makes a "promise": "Just give me a callback, and I will give you a &mut ObjectTracer when it is executed". That gives mmtk-core the flexibility to instantiate the ObjectTracer at the right time. Because the callback will always be executed by a worker, the VM binding doesn't need to worry about finding a worker in order to call trace_object. The object queue will be instantiated in ProcessEdgesWorkTracerContext::with_tracer. (Actually, the object queue is part of ProcessEdgesBase.)
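The shape of that callback-based "promise" can be sketched with stand-in types (MockTracer and MockContext are hypothetical; only the overall shape of with_tracer mirrors the proposal):

```rust
// Stand-ins; only the shape of `with_tracer` mirrors the proposed API.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ObjectReference(usize);

struct GCWorker; // placeholder for the real GCWorker

trait ObjectTracer {
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference;
}

// Send + 'static: the context can be stored inside a VM-created work packet;
// the tracer itself only ever exists inside the callback.
trait ObjectTracerContext: Send + 'static {
    fn with_tracer<R, F>(&self, worker: &mut GCWorker, func: F) -> R
    where
        F: FnOnce(&mut dyn ObjectTracer) -> R;
}

struct MockTracer;
impl ObjectTracer for MockTracer {
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference {
        ObjectReference(object.0 + 1) // pretend the object was forwarded
    }
}

struct MockContext;
impl ObjectTracerContext for MockContext {
    fn with_tracer<R, F>(&self, _worker: &mut GCWorker, func: F) -> R
    where
        F: FnOnce(&mut dyn ObjectTracer) -> R,
    {
        // Instantiate the tracer (and, in real MMTk, its object queue) only
        // now, at packet-execution time; a real implementation would flush
        // the queue after the callback returns.
        let mut tracer = MockTracer;
        func(&mut tracer)
    }
}

fn main() {
    let ctx = MockContext;
    let mut worker = GCWorker;
    let forwarded = ctx.with_tracer(&mut worker, |tracer| {
        tracer.trace_object(ObjectReference(41))
    });
    assert_eq!(forwarded, ObjectReference(42));
    println!("ok");
}
```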