mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io

Supporting disjoint objects #656

Open wks opened 1 year ago

wks commented 1 year ago

Updates

Recent discussions changed my initial thoughts. I list the current status in this section.

Concepts

An implication of the retain and reclaim semantics is that

To summarise in simpler words:

Implementation

Allocation

Both objects and buffers are allocations. Buffers are allocated using the same Mutator::alloc function as objects, but for buffers we either do not need post_alloc, or we use a different post_alloc.
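
To make the shared allocation path concrete, here is a minimal toy model, not mmtk-core code. ToyHeap, post_alloc_object and is_object are hypothetical names, and the sketch assumes that the job of post_alloc is to initialise per-object metadata (modelled here as a side table of object starts), which a buffer would skip.

```rust
use std::collections::HashSet;

/// Toy heap (hypothetical, not an mmtk-core type): a bump pointer plus a
/// side table recording which addresses are starts of objects.
pub struct ToyHeap {
    cursor: usize,
    limit: usize,
    object_starts: HashSet<usize>,
}

impl ToyHeap {
    pub fn new(start: usize, size: usize) -> Self {
        ToyHeap { cursor: start, limit: start + size, object_starts: HashSet::new() }
    }

    /// Shared allocation path for both objects and buffers.
    /// `align` must be a power of two. Returns `None` when the heap is full.
    pub fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        let aligned = (self.cursor + align - 1) & !(align - 1);
        let end = aligned.checked_add(size)?;
        if end > self.limit {
            return None;
        }
        self.cursor = end;
        Some(aligned)
    }

    /// Object-only step: record that `addr` is the start of an object.
    /// Buffers never go through this step in this sketch.
    pub fn post_alloc_object(&mut self, addr: usize) {
        self.object_starts.insert(addr);
    }

    /// True only for addresses that went through `post_alloc_object`.
    pub fn is_object(&self, addr: usize) -> bool {
        self.object_starts.contains(&addr)
    }
}
```

In this model a buffer is simply an allocation for which post_alloc_object was never called, so the heap has no record of it as an object.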

Retaining and copying buffers

When scanning an object, the VM identifies pointers in the header object that point to buffers. mmtk-core provides an API for the VM to retain those buffers. The signature of the function looks like this:

/// `buffer`: The address of the buffer
/// `size`: The size of the buffer
/// `alignment`: The alignment of the buffer
/// Returns: The new address of the buffer.
fn retain_buffer(buffer: Address, size: usize, alignment: usize) -> Address;

The VM needs to provide the size and the alignment to MMTk core because MMTk core cannot get that information from the buffer alone (the buffer itself does not record it). In a copying GC, MMTk core will use that information to copy the buffer. The buffer is copied in the same way as an ordinary object, via ObjectModel::copy provided by the VM. The new address of the buffer is returned.

Question: Should we provide another ObjectModel::copy_buffer? ObjectModel::copy takes an ObjectReference, which assumes it is a reference to an object, not a buffer.

However, if the VM doesn't want to retain the buffer, it simply ignores the buffer, and MMTk will treat the buffer as dead.
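
The retain-or-ignore behaviour above can be sketched with a toy semispace. This is an illustration only: ToyCopyingGc is a hypothetical type, addresses are plain usize offsets rather than Address, and retain_buffer is written as a method with a receiver, unlike the proposed signature. It copies bytes directly instead of going through ObjectModel::copy, and it panics if to-space is too small.

```rust
/// Toy semispace (hypothetical): addresses are byte offsets into
/// `from` (old space) and `to` (new space).
pub struct ToyCopyingGc {
    from: Vec<u8>,
    to: Vec<u8>,
    to_cursor: usize,
}

impl ToyCopyingGc {
    /// Creates a to-space with the same capacity as `from`.
    pub fn new(from: Vec<u8>) -> Self {
        let cap = from.len();
        ToyCopyingGc { from, to: vec![0; cap], to_cursor: 0 }
    }

    /// Copies `size` bytes starting at `buffer` into to-space, at an offset
    /// aligned to `alignment` (a power of two), and returns the new offset.
    /// The caller supplies size and alignment because the buffer itself
    /// carries no such information.
    pub fn retain_buffer(&mut self, buffer: usize, size: usize, alignment: usize) -> usize {
        let new_addr = (self.to_cursor + alignment - 1) & !(alignment - 1);
        self.to[new_addr..new_addr + size]
            .copy_from_slice(&self.from[buffer..buffer + size]);
        self.to_cursor = new_addr + size;
        new_addr
    }

    /// Read-only view of to-space, for inspection.
    pub fn to_space(&self) -> &[u8] {
        &self.to
    }

    /// Offset just past the last retained buffer in to-space.
    pub fn bytes_retained(&self) -> usize {
        self.to_cursor
    }
}
```

A buffer for which the VM never calls retain_buffer is never copied, so in this model it is dead by omission: nothing has to mark it as garbage explicitly.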

Original thoughts

The rest of this post is my initial thoughts.

"Naked" objects

Not all objects are created equal. In some virtual machines, some objects are wholly owned by other objects. For example:

Given that more than one VM has such objects, it may be worth adding support for such "naked" objects in mmtk-core.

Primary and subsidiary ("naked") objects

An object can be either primary or subsidiary. Primary objects are the ordinary objects we already know. Subsidiary objects are what we called "naked" objects above.

Both kinds of objects are allocated in the GC heap.

Their differences are,

Difference from vanilla Ruby's buffers

A difference between the "subsidiary" objects defined here and the buffers of Array and String in vanilla Ruby is that both the primary and the subsidiary objects defined here are managed by the MMTk GC, while Ruby's buffers are allocated by malloc, and are freed by finalizers (obj_free).

Object graph and subsidiary objects

An object graph contains nodes and edges.

Without subsidiary objects, all objects are primary. A node is a (primary) object. An object contains many reference fields, each of which represents an edge to another object.

With primary and subsidiary objects, a node is an object group, i.e. one primary object plus all subsidiary objects it owns. Reference fields in both the primary and its subsidiary objects are the edges of the node. Edges only point to primary objects. The pointer from a parent to a subsidiary is not considered an edge in the object graph -- it's internal to a node.
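
The node-as-group view can be sketched as a small reachability computation. Group, edges and reachable are hypothetical names used only for this illustration, and groups are identified by their index in a slice rather than by address.

```rust
use std::collections::HashSet;

/// One node of the object graph: a primary object plus the subsidiary
/// objects it owns. Each `usize` is the index of a target group, i.e. an
/// edge to that group's primary object.
pub struct Group {
    /// Reference fields stored in the primary object itself.
    pub primary_fields: Vec<usize>,
    /// Reference fields stored in each owned subsidiary object.
    pub subsidiaries: Vec<Vec<usize>>,
}

impl Group {
    /// All outgoing edges of the node, regardless of which piece holds them.
    /// The parent-to-subsidiary pointers are not edges; they are internal.
    pub fn edges(&self) -> impl Iterator<Item = usize> + '_ {
        self.primary_fields
            .iter()
            .copied()
            .chain(self.subsidiaries.iter().flatten().copied())
    }
}

/// Returns the indices of all groups reachable from `roots`.
pub fn reachable(groups: &[Group], roots: &[usize]) -> HashSet<usize> {
    let mut seen = HashSet::new();
    let mut stack: Vec<usize> = roots.to_vec();
    while let Some(id) = stack.pop() {
        // `insert` returns true only the first time a group is visited.
        if seen.insert(id) {
            stack.extend(groups[id].edges());
        }
    }
    seen
}
```

A group whose only outgoing reference sits in a subsidiary still keeps its target alive, because the traversal treats fields in the primary and in its subsidiaries uniformly.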

Opportunity of object merging/splitting during copying

During copying, the GC has the opportunity to resize objects, and the opportunity to merge or split objects in a group.

For example,

As in the current MMTk interface, the VM is responsible for copying objects during copying GC. Some VMs are already using this opportunity to implement address-based hashing. We can extend this mechanism and let the VM decide whether to resize, split or merge objects.

However, in concurrent copying GC, it is the VM's responsibility to handle the synchronization between the mutator and the GC.
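
As a rough illustration of a VM deciding to merge, split or resize during copying, the sketch below models an array whose elements are either inline in the primary object or in a separate buffer. Elements, INLINE_LIMIT and copy_array are hypothetical, the threshold is arbitrary, and the sketch ignores the concurrent case entirely.

```rust
/// How a toy array object stores its elements (hypothetical layout).
#[derive(Debug, Clone, PartialEq)]
pub enum Elements {
    /// Elements live inside the primary object.
    Inline(Vec<u64>),
    /// Elements live in a separate buffer that may have spare capacity.
    Disjoint { data: Vec<u64>, capacity: usize },
}

/// Arbitrary threshold chosen for this sketch: arrays with at most this
/// many elements are stored inline after copying.
pub const INLINE_LIMIT: usize = 4;

/// VM-side copy hook: decides the layout of the new copy.
pub fn copy_array(old: &Elements) -> Elements {
    let data = match old {
        Elements::Inline(d) => d,
        Elements::Disjoint { data, .. } => data,
    };
    if data.len() <= INLINE_LIMIT {
        // Merge: a small array is folded into its primary object.
        Elements::Inline(data.clone())
    } else {
        // Split or keep disjoint, and resize: drop unused capacity.
        Elements::Disjoint { data: data.clone(), capacity: data.len() }
    }
}
```

For example, a two-element array held in a 16-slot buffer comes out of copy_array as an inline array, while a ten-element inline array comes out with a separate buffer of exactly ten slots.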

steveblackburn commented 1 year ago

A few comments about the abstractions...

Summarizing the terminology:

It is not our goal to implement Ruby's disjoint objects---one of the primary reasons for moving to MMTk is to avoid the limitation in Ruby that requires disjoint objects.

Languages will likely use disjoint objects to implement arrays that may be resized.

wks commented 1 year ago

@steveblackburn What name should we give to the thing returned by alloc? It is "a region of memory that the VM is allowed to write to, and that is managed by the garbage collector". Both "parent objects" and "buffers" are allocated this way. (Or should they be?) If they are, there should be a concept that is the union of "(parent) objects" and "buffers".

qinsoon commented 1 year ago

I suggested using a special policy for buffers. I think it does not conflict with Steve's description of disjoint objects. It is indeed one way to implement disjoint objects. Instead of having each policy deal with the special buffer object, we could make it known only to the buffer policy. The following is a comparison of implementing buffers in a special policy and allowing buffers in any policy.

| | Buffers only in a special policy | Buffers in any policy |
| --- | --- | --- |
| Header metadata | The policy knows it cannot use object header metadata. | Each policy needs to deal with a buffer's metadata specially, as it cannot use header metadata. |
| Tracing objects | The policy knows it should not trace objects in the policy. | A policy does not know if the object is a buffer or not, unless it stores some metadata to identify buffers. |
| Keeping buffers alive | The policy provides a method. We can implement the policy to make this easy. | Each policy needs to provide a method for this. |
| Querying objects (is_mmtk_object, is_live, is_movable, base pointer) | The policy knows buffers should not be introspected. | A policy does not know if the object is a buffer or not, and may return incorrect results, unless it stores some metadata to identify buffers. |
| Merging objects when copying | Possible | Possible |

qinsoon commented 1 year ago

With disjoint objects, in some places where we refer to objects, we need to differentiate 'object' (parent + buffer) from just 'parent'. For example, when we ask a binding for object sizes:

qinsoon commented 1 year ago

Some conclusions from our discussion (based on the table above):

A few other things that we discussed:

caizixian commented 1 year ago

Just to record my notes for the above discussion:

  1. identification: we want to find reachable nodes on the object graph. A node usually has one contiguous piece of memory, but may have disjoint pieces of memory (in the case of Ruby, the main object and another buffer only referred to by the object using a raw pointer)
  2. reclamation: we cannot reclaim memory used by live objects (each such object may have disjoint pieces of memory)