wks commented 2 years ago

Proposed API:

UPDATED: changed from "declare object dead" to "clear VO bit".

/// Clear the valid-object bit.
/// Subsequent calls to `memory_manager::is_mmtk_object(object)` will return false.
/// Subsequent GC will not attempt to scan `object`.
fn clear_vo_bit(mmtk: &MMTK,  object: ObjectReference);

/// Set the valid-object bit.
/// After allocating an object, the VM should call this function to set the valid-object bit of the object.
///
/// TODO: Maybe we don't need to provide this as an API.
/// `post_alloc` can implicitly set the bit,
/// and it is also necessary to set this bit at the same time of `post_alloc`.
fn set_vo_bit(mmtk: &MMTK,  object: ObjectReference);

This API is only available when the is_mmtk_object feature is enabled.

TODO list

[ ] Rename "alloc bit" to "valid object bit" (or "VO bit" for short).
[ ] Refactor MallocSpace so that it depends on a policy-specific bitmap to decide if any address holds an object allocated by malloc. It shall be independent from the VO bit.

Rationale: Object finalization interfering with object scanning when using conservative stack scanning

In Ruby, a T_DATA object is a heap object that contains a pointer to off-heap C struct, and has customizable marking and freeing functions (dmark and dfree). What complicates things is, the off-heap C struct may contain references to other heap objects. Therefore, its dmark function usually contain code that accesses the off-heap C struct, and the dmark is called by GC to scan that object. Similarly, its dfree function may free the off-heap C struct and its sub-structures.

The following is simplified pseudocode in Rust:

typedef uintptr_t VALUE;

/// An object in the Ruby heap
struct RData {
    RBasic ruby_header;
    void (*dmark)(void*);
    void (*dfree)(void*);
    void *ptr;
}

/// An arbitrary C struct
struct FooC {
    int some_data1;
    double some_data2;
    VALUE some_ref1;
    VALUE some_ref2;
}

/// Finalizer for any Ruby object.  Called when any Ruby heap object dies.  Provided by Ruby VM.
void obj_free(VALUE *obj) {
    if (obj is a T_DATA) {
        ((struct RData*)obj)->dfree(((struct RData*)obj)->ptr);
    } else {
        ...
    }
}

/// Find references in FooC.  Provided by the C extension writer.
void fooc_mark(void* obj) {
    struct FooC *fooc = obj;
    rb_gc_mark(fooc->some_ref1);
    rb_gc_mark(fooc->some_ref2);
}

/// Free the FooC struct. Provided by the C extension writer.
void fooc_free(void* obj) {
    free(obj);
}

/// Create an object, including both the on-heap and the off-heap part.
VALUE make_fooc_as_rdata() {
    struct RData *on_heap = mmtk_alloc(...);
    struct FooC *off_heap = calloc(...);

    on_heap->dmark = fooc_mark;
    on_heap->dfree = fooc_free;
    on_heap->ptr = off_heap;

    return (VALUE)on_heap;
}

In vanilla Ruby, when GC decide that an object is unreachable, it calls obj_free on the object. For T_DATA objects, it calls the dfree in the object. And then it just returns the space of the on-heap object back to the free-list.

With MMTk, when a T_DATA is unreachable, it is resurrected and enqueued in the read-for-finalization queue. The mutator can poll from the queue, and call obj_free on that object after the GC. Ideally, the object dies now, and it should never be reached again.

However, since the stack scanning is conservative, the next GC may still find references to such finalized in-heap objects on the stack, for two reasons:

The finalizer function obj_free left a reference to the object on the stack, and got picked up during the next conservative stack scanning.
Some value on the stack happen to have the same bit pattern as the object reference to that object.

So during the next GC, the GC will try to scan it. Since it is a T_DATA, the GC will call its dmark function with its ptr as argument. However, ptr points to an object that is alread freed by dfree. dmark never expects it would be called on an object that is already finalized, and it accesses freed memory and crashes.

(UPDATED) The root cause of the problem

The problem manifests when both of the following conditions hold.

A function, usually finalizers, interferes with the scanning of an object, and
the object is reachable in the next GC.

In the case of Ruby,

obj_free makes the object un-scannable.
1. Some Ruby types (including both built-in types such as Array and Hash, and third-part types in the form of T_DATA) have attached off-heap malloc-ed memory that contains object references, and must be scanned.
2. When such objects are finalized using obj_free, the off-heap part is freed. When such an object is scanned again, the GC will load from freed memory.
3. The fact that Ruby's T_DATA has customizable dmark and dfree makes the situation worse. The VM developers have no control over third-party types.
With conservative GC, the stack may contain stale references to such objects, and the VM will add them to the stack root set, making them reachable by the GC.
- Even if a language has precise stack scaning, if GC is triggered after the function "messes up" an object, but before the local variable that holds a reference to this object is cleared, it will still be added to the root set.

This problem never happens in Java, because no Java function can interfere with object scanning.

How does vanilla Ruby avoid this problem?

Vanilla Ruby also uses conservative stack scanning. Vanilla Ruby uses free-list allocator. It decides whether a slot is free or is allocated by looking at the location of the object header. The header for all Ruby objects is:

struct RBasic {
    VALUE flags;
    const VALUE klass;
};

If flags is zero, the GC will consider the slot occupied by this object is a free slot; otherwise, some bits of flags will contain the type of the object (such as T_DATA).

When vanilla Ruby returns an object back to the free-list, it will clear its flags so subsequent GC will not consider it alive.

So the criteria for Ruby to decide, during conservative stack scanning, whether a word is a valid Ruby object are:

It is an aligned pointer that points to a slot in a free list, and
the flags of the object in the slot is not zero.

If both of them are true, that word is an object pointer.

How do I do it in mmtk-ruby?

My criteria for a word on the stack to be an object pointer are:

is_mmtk_object(obj) returns true, and
the flags of the object in the slot is not zero.

I replaced the first criteria with an MMTk call. When an object is resurrected and finalized, MMTk still considers that object as a valid MMTk object.

I also make sure the flags is cleared after calling the finalizer obj_free. By doing this, even if is_mmtk_object(obj) returns true, the conservative stack scanner will still refuse to report it to MMTk as a root if the flags is zero.

While that functions correctly just like vanilla Ruby, we can see the two criteria are redundant. We are looking at two places instead of one to decide if a word is a valid object reference.

Add an API to clear the alloc bit

If we add an API function to clear the alloc bit for obj, then is_mmtk_object(obj) will return false, and we don't need to look at its Ruby header any more.

We add a function clear_vo_bit(mmtk, obj). It clears the valid-object bit ("VO bit" for short. currently known as alloc-bit in mmtk-core). By clearing the VO bit, it has the following effects:

is_mmtk_object(obj) will return false. Conservative stack scanners use this API, and they will not put invalid objects into root set.
The GC will not attempt to scan invalid objects. This ensure if invalid objects can be transitively reachable from valid objects, mmtk-core can avoid scanning them.

Note that an invalid object can be transitively reachable from valid objects if multiple inter-connected objects die at the same time. If object A and object B contain references to each other, and both of them die at the same time, they are both added to the ready-to-finalize queue. If the mutator only finalized one of them, the other is still a valid object, and is alive (because finalizer is yet to be called). So it is not always possible to avoid "valid object pointing to invalid object".

To counter the cyclic dead objects problem, the GC should look at the VO bit before attempting to scan the object. By doing this, the VM doesn't even have a chance to look at the header because the GC never calls Scanning::scan_object if the VO bit is cleared.

What if the object can be resurrected?

In short, if an object can be resurrected, the VO bit should not be cleared, and the VM must ensure the object remain valid after finalization (because it is resurrected).

Ruby never resurrects objects, at least not on the Ruby semantics level.

Currently I implement Ruby-style obj_free with Java-style object resurrection. In fact, the dfree function provided by the C extension writer is written in C, and has access to the entire Ruby VM state. So in theory, the dfree function can store the object somewhere in the VM which is considered part of the root, and the object would become accessible again. However, in the Ruby community, it is considered a mistake if the programmer makes an object accessible after obj_free. So it is at least sound to use this API for Ruby.

Java may resurrect objects, but in Java, object scanning is not customisable by the programmer. If a Java object has an associated C structure, the common practice is to use a long nativePointer; field to hold the native pointer. It may call into JNI to free the object pointed by nativePointer, but this happens during finalize(), not during GC. Therefore, even a resurrected object is reached by GC again, the finalize() method is never called twice. Therefore, JVM never needs this clear_vm_bit API. p.s. Since finalize() is deprecated for removal since Java 18, the modern approach is to use Cleaner which is based on PhantomReference, which never resurrects objects in the first place.

Alternative solution in Ruby

Ruby VM could, in theory, clear the RData::ptr field so obj_free can refuse to call dmark if ptr==NULL. But this is not what the vanilla Ruby does, and it is slower than clearing the flags header.

qinsoon commented 2 years ago

I don't think this is a good idea. MMTk is a garbage collector, and it is MMTk's responsibility to determine which objects are alive, and which are dead in a GC. It makes little sense that a runtime can tell MMTk which object is dead through an API.

For your example, from MMTk's point of view, the object is kept alive in a GC, and the object will be considered as live until the next GC finds the object is unreachable. But the binding gets the object, wipes the object content, and asks MMTk to trace/scan the broken object. To me, this is completely the binding's wrong behavior.

However, we probably want to make it clear that is_mmtk_object() or any API call about querying object state is based on what MMTk knows from the last GC. If there is anything that changes the object state from the last GC (i.e. in the mutator phase), MMTk will not know about it.

qinsoon commented 2 years ago

We had some discussion about this:

Instead of directly 'declaring an object dead', the API should be clearing the valid object bit.
The old 'alloc bit' is now 'valid object bit' (VO bit). It gets set when an object is allocated, and by default gets cleared when an object is reclaimed. But it could be manually cleared by the binding, thus it is not guaranteed that when we reclaim an object's memory, its valid object bit is true. (we need to remove our assertions on this.)
We assume a binding should not let us trace or scan an object with VO bit unset (we can make assertions on this).
The main purpose for the VO bit is conservatism.
The VO bit can also be used to find base pointers for internal pointers.
Mark compact could still use the valid object bit for linear scanning.
Malloc mark sweep will need another policy/local bit to know where malloc results are (as VO bit may get cleared early).

wks commented 2 years ago

There is one more problem. If multiple inter-connected objects die at the same time, they will all be enqueued for finalization. However, before the next GC, the mutator may only have time to execute the finalizer for some of the object but not all. Even if we clear the VO bit of the objects that we finalized, other read-to-finalize objects are still added to the root set, and they can still reach the "invalid" objects.

Ruby's approach doesn't have this problem, because it changes the type information of the object so that even if such objects are scanned, the scanning function can do a no-op.

I am OK with doing both (1) clearing the VO bit, and (2) clearing the object header.

qinsoon commented 2 years ago

I am OK with doing both (1) clearing the VO bit, and (2) clearing the object header.

If you have to do (2), does that mean (1) is not necessary?

steveblackburn commented 2 years ago

The issue with (2) is that it is not sound w.r.t. the "valid object" semantics. Some random memory may return a false positive.

steveblackburn commented 2 years ago

Separate point:

The API should include symmetric functions to set and clear the vob: set_vob(), clear_vob(). In a non-optimized allocation implementation, the binding would call set_vob() at allocation time. In an optimized case, the code would be 'lifted' into the runtime's allocation sequence, just like for write barriers and other allocations, the non-optimized case should always be available in the API and should define the semantics. The optimized version in the runtime binding must reflect those semantics.

wks commented 2 years ago

@qinsoon (1) is not necessary. Vanilla Ruby does only (2), and will refuse to report an object as root if the header is cleared.

But having (1) allows a stack word to be filtered faster when calling is_mmtk_object, so the stack scanner doesn't have to look at the object header.

wks commented 2 years ago

The issue with (2) is that it is not sound w.r.t. the "valid object" semantics. Some random memory may return a false positive.

If the "valid object" semantics means invalid objects cannot be reachable, then it may be impossible to maintain this semantics if some finalizable objects form cycles. If object A and B form a cycle, and they are both finalizable, and they die at the same time, then both of them are subject to finalization. If we call obj_free on one of them and make it "invalid", the other object remain "valid" at this time, and can still reach the invalid object.

qinsoon commented 2 years ago

The issue with (2) is that it is not sound w.r.t. the "valid object" semantics. Some random memory may return a false positive.

The fact that (2) is still necessary could be an indication that our old semantics about 'alloc bit' is sufficient enough for Ruby, and letting MMTk know about invalid objects will not solve the problem for Ruby.

wks commented 2 years ago

@qinsoon You are right. The existing 'alloc bit' API (is_mmtk_object) is sufficient to support Ruby.

This proposed API change is an optimization so that we can rule out "invalid" objects from root set without looking at the header. This doesn't prevent "invalid" objects from being transitively reached, and it is why it is still necessary to clear the header, too. @steveblackburn may have a stronger opinion on this.

wks commented 2 years ago

From our discussion in the evening, the "cyclic dead objects" problem can be solved by requiring the GC to inspect the VO bit before attempting to scan an object.

Alternatively, the VM can skip an object in Scanning::scan_object by looking at the header and skip T_NONE objects. The VM or the GC has to look at some metadata to decide whether the object is valid or not, given that it is allocated.

Vanilla Ruby never sees invalid objects during tracing, because it calls finalizers during GC (in the sweeping phase), and does not postpone it to the mutator time. But that's too invasive for GC.

mmtk / mmtk-core

Add an API to clear the valid-object bit (a.k.a. alloc-bit) #648

Proposed API:

TODO list

Rationale: Object finalization interfering with object scanning when using conservative stack scanning

(UPDATED) The root cause of the problem

How does vanilla Ruby avoid this problem?

How do I do it in mmtk-ruby?

Add an API to clear the alloc bit

What if the object can be resurrected?

Alternative solution in Ruby