wks commented 11 months ago

Currently, the Barrier trait provides the substituting barrier, object_reference_write, which includes both the write operation to the slot, and the write barrier semantics (such as checking the unlogged bits for the object-remembering barrier). However, this is not enough.

Memory orders

The read and write operations may have acquire and/or release semantics. For example, in Java,

AtomicReference.set() has the strongest memory order, similar to seq_cst in C++14.
AtomicReference.setAcquire() has the memory order equivalent to release in C++14.
AtomicReference.setOpaque() and AtomicReference.setPlain() have weaker memory order.

Implementation-wise, those memory orders translate to different machine-level instructions.

Atomic read-modify-write operations

Atomic read-modify-write operations do both a read and (optionally) a write operation. For example, in Java,

AtomicReference.getAndSet(newValue) exchanges the reference in the slot with the reference in the argument.
AtomicReference.compareAndExchange{,Acquire,Release}(expectedValue, newValue) does strong CAS operation. It has several variants, with seq_cst, acquire and release semantics, respectively. It also has variants that do weak CAS operations.
AtomicReference.getAndUpdate(f) performs atomic update by calling the closure f which in theory can do anything. It may be implemented with CAS or LL/SC underneath.

Note that for compareAndExchange and getAndUpdate, the write operation may or may not be executed, depending on whether the actual value in the slot matches the expected value. The write barrier should only be executed if the write operation actually takes place.

Updating non-reference fields

The Sapphire algorithm needs write barriers for non-reference fields, too. Sapphire keeps two copies of objects during GC. Read operations read from either copy, but write operations need to write to both copies. This means not only reference fields, but all fields need to write to two different addresses for each language-level field update.

In order to support Sapphire, JikesRVM was refactored (https://github.com/mmtk/jikesrvm/commit/721ca5a27fa66c9d46a1d260b2b1a816edabfdee) to add write barriers for all field types in Java, but the default implementation (when barrier is not required for a type) is a simple memory write operation. The Sapphire plan overrides the barrier to handle the forwarding (see: https://github.com/perlfu/sapphire/blob/1424b489dc667f6080fc601df5437a3b7f87e828/MMTk/src/org/mmtk/plan/otfsapphire/OTFSapphireMutator.java#L1104).

JikesRVM's approach requires one function for each field type, but Java's field types are limited. There are only byte, short, int, long, char, boolean, float, double and Object. This is feasible for a Java-specific GC framework. But the Rust MMTk is designed to be language-neutral.

Conclusion

In conclusion, one single Barrier::object_reference_write method is insufficient to support the rich semantics the VMs support, and it may need refactoring. There are mainly two things to take care of:

MMTk cares about the old reference and the new reference of the field, without VM-specific tag bits, and does not care about non-reference values such as small integers held in the slots.
The VM cares about actually reading, writing, or performing atomic RMW operations on the slot.

So a substituting write barrier method Barrier::object_reference_write should have both of the two things as arguments. The former may be ObjectReference arguments (or data structures to obtain the old and new ObjectReference in the slot), and the latter may be a call-back (closure) for the VM to do the actual storing or atomic RMW operation.

wks commented 9 months ago

There are two kinds of write barriers.

The first kind is a call-back when the object graph is mutated. This kind of barriers only needs to be applied when updating a reference field. For example,

Deletion barriers runs when an old edge is removed (i.e. they work on the old target).
- The SATB barrier remembers the old target.
- The object-remembering barrier remembers the entire old object when the object is first modified. I think it is a deletion barrier, too, because it remembers the old target.
- The field-remembering barrier is similar.
Insertion barrier runs when a new edge is inserted (i.e. they work on the new target).
- The Steele style barrier reverts the source object when inserting an edge from a black object to a white object.
- The Dijkstra style barrier shades the target object when inserting an edge from a black object.
- The barrier used by generational GC remembers the new target (or the address of the slot) in order to maintain the remembered set.
Naive reference counting does decrement on the old target, and increment on the new target. So it is both a deletion barrier and an insertion barrier.

The second kind is a helper for the mutator to locate the field to update. This kind of barriers needs to be used when updating any fields, including reference fields and non-reference value fields. For example,

The Sapphire GC algorithm needs to write to both copies of an object during some phases of GC.
The Brooks's indirection barrier reads from the forwarding pointer field of an object to locate the actual copy of the object the write operation should write into. The Brooks's indirection barrier is required for both read and write operations. The Shenandoah GC algorithm used to use this barrier.

In today's meeting, @steveblackburn mentioned that this distinction is similar to the two kinds of read barriers. The first kind (the "loaded value barrier") is applied to the reference loaded out of the slot, and the second kind (the "use barrier") is applied when loading any field out of an object. The barrier that ZGC uses to clear tag bits after loading belongs to the first category, while the Brooks' indirection barrier belongs to the second category.

When redesigning the barrier API, it will be helpful to acknowledge the difference between those two kinds of write barriers. We should also be careful not to make the API too invasive. There are much less write operations than read operations (about 1:8 in some benchmarks), and there may be much more write operations to non-reference value fields than reference fields. We should make sure the VMs that don't need GC algorithms that requires the second kind of barrier don't have to slow down their write operations for non-reference fields.

wks commented 6 months ago

Some VMs have limits on the forms of write barriers.

Unknown slot address

For example, in CRuby, the Hash type is implemented with a third-party hash table library (public domain license) copied in to the Ruby source tree decades ago. Keys and values can be object references, but the table lookup, insertion and deletion operations are implemented with C functions that are unaware of GC. You can imagine the object is a wrapper of C++'s std::unordered_map<void*, void*>, and key-value pairs are inserted using the C++ expression map[key] = value. It is not practical for the VM to modify the hash table implemented in C, like it is not practical to modify std::unordered_map in C++ just to support GC. Therefore, the usual form of write barrier, i.e. write_barrier(object, slot, new_value) won't work for such implementations.

Instead, CRuby uses an alternative form of write barrier: rb_obj_written(obj, old_value, new_value), or equivalently in the form of an MMTk API:

fn object_reference_written(
    mutator: Mutator,
    object: ObjectReference
    old_value: ObjectReference,
    new_value: ObjectReference
);

CRuby calls rb_obj_written after it actually updated a field. Currently, CRuby uses an incremental generational mark-sweep GC. During mutator time, the barrier simply remembers the object if obj is old and new_value is young, and old_value is simply ignored; during incremental marking, the barrier marks new_value if object is black and new_value is white, and old_value is ignored, too,

Only remembering object

In other places, CRuby uses memcpy, memset or iterators implemented in C to initialize, update or reset complex C data structures (including arrays, hashes, etc.) and write multiple reference fields in one pass. In such cases, CRuby calls rb_gc_writebarrier_remember(obj), or equivalently in the form of an MMTk API:

fn object_modified(
    mutator: Mutator,
    object: ObjectReference
);

During mutator time, it remembers obj if obj is old. The children of obj will be marked on the next nursery GC. This is what MMTk's object remembering barrier does, too. During incremental marking, it remembers obj if it is black.

What API should MMTk core provide?

The remembering-only barrier object_modified(mutator, object) is obviously not general enough, even though it is sufficient to support all generational plans in MMTk core now, namely GenCopy, GenImmix and StickyImmix. We may provide it for VMs that have limitations, such as Ruby, but we need to clearly document it and let the user know its limitation. For the use case of bulk-updating an array, it is more advisable to use slot range barrier, instead.

The form with old value instead of slot address object_reference_written(mutator, object, old_value, new_value) is, in my opinion, general enough to cover barriers that operate on object graph mutations. Specifically, the old_value argument can support deletion barriers, while the new_value argument can support insertion barriers. Object-remembering barrier can also be supported by ignoring both old_value and new_value and simply remembering object.

Our current object_reference_write is more general in the sense that it gives MMTk-core control of whether to load the old value from the slot or not. But it is not general enough because it performs the write operation in mmtk-core, and is therefore unable to support tagged references, atomic RMW operations and atomic memory orders (see the main post https://github.com/mmtk/mmtk-core/issues/1038#issue-2025447072), and we haven't addressed indirection barriers (such as the old Shenandoah and Sapphire), yet.

qinsoon commented 6 months ago

We have seen similar cases with Julia as well. There is no slot in Julia's write barrier function, and there is one kind of barrier that has no target object (just remembering the source object). This is not an issue for our object remembering barriers, which just needs the source object. As long as we know (and assert) which kind of barrier is in use, we can omit some arguments that are not really in use for the barrier.

wks commented 6 months ago

... As long as we know (and assert) which kind of barrier is in use, we can omit some arguments that are not really in use for the barrier.

That's what I am doing for Ruby's write barriers at this moment. I just set the slot and the new_value arguments to NULL pointers and it still works. But that's a hack, and we need mmtk-core to provide proper API functions for them.

qinsoon commented 6 months ago

... As long as we know (and assert) which kind of barrier is in use, we can omit some arguments that are not really in use for the barrier.

That's what I am doing for Ruby's write barriers at this moment. I just set the slot and the new_value arguments to NULL pointers and it still works. But that's a hack, and we need mmtk-core to provide proper API functions for them.

Maybe it is like pinning. Not all the plans can support pinning. So if a VM has to pin objects, they can only use certain plans. If they have no such restriction or if they want to refactor to remove such restrictions, they can use all the plans and our API works fine with that. This also applies to the write barrier. Unless the VM fixes the restriction they introduce in the VM side, they can only use certain write barriers.

I am not saying that our write barrier API is fully general. But for those two cases ("unknown slot address" and "only remembering object"), it sounds like issues in the language implementation. We may want to document it more clearly, rather than changing our API for this reason.

mmtk / mmtk-core

Generalize the substituting write barriers #1038