mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io

Require ObjectReference to be inside an object #1170

Closed wks closed 2 months ago

wks commented 4 months ago

Status quo

Currently, MMTk defines several addresses of an object.

| Name | Definition | Must be inside object |
| --- | --- | --- |
| starting address | the return value of memory_manager::alloc | Yes |
| ObjectReference | an address that refers to an object | No |
| in-object address | at a constant offset from ObjectReference, used to access SFT, side metadata, etc. | Yes |
| header address | address used to access in-object metadata | Yes |

The definition of ObjectReference is VM-specific. We currently allow ObjectReference to be outside an object because some VMs do so. For example, in JikesRVM, an ObjectReference is defined as the address of the array payload of an object if the object is an array. That saves one offset computation for array element access, but when accessing scalar object fields or object headers, the VM has to use a negative offset from the ObjectReference. When we ported MMTk from JikesRVM to Rust, we inherited this type. ObjectReference is now the standard way for mmtk-core to refer to an object. We still allow ObjectReference to be outside an object so that, when loading from a field in JikesRVM, we can directly use the word stored in the field as the ObjectReference.

However, because we only map side metadata memory for pages within spaces, addresses outside any space (or in unmapped pages) may not have mapped metadata. The same is true for SFT entries, which are allocated per chunk. If we attempt to access metadata or the SFT using an address outside the object, the access may cause a segmentation fault. To solve this problem, we require the VM binding to implement ObjectModel::ref_to_address, which computes the "in-object address" of an object; that address must be inside the object. (https://github.com/mmtk/mmtk-core/pull/699)

Meanwhile, VMs that use conservative stack scanning need to read a word from the stack, compute the "in-object address" from it, and see if the VO bit is set at the "in-object address". Because we don't know whether a word on the stack is an actual ObjectReference or not, the offset from the ObjectReference to the "in-object address" must be a constant (i.e. it can be computed without reading any data from the object body). (Also in https://github.com/mmtk/mmtk-core/pull/699)
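As a rough illustration of why the offset has to be a constant, here is a minimal sketch of a conservative stack scan under the current design. Everything in it is a toy stand-in and not an mmtk-core API: `VoBits` models the VO-bit side metadata as a plain set of addresses, and `IN_OBJECT_OFFSET` and `filter_stack_words` are hypothetical names with an arbitrary offset value.

```rust
use std::collections::HashSet;

const WORD: usize = std::mem::size_of::<usize>();

/// Hypothetical constant offset from an ObjectReference to its
/// "in-object address". The value 8 is arbitrary.
const IN_OBJECT_OFFSET: usize = 8;

/// Toy stand-in for the VO-bit side metadata: the set of in-object
/// addresses at which a valid-object bit is set.
struct VoBits(HashSet<usize>);

impl VoBits {
    fn is_set(&self, in_object_addr: usize) -> bool {
        self.0.contains(&in_object_addr)
    }
}

/// Return the stack words that look like valid object references.
/// No candidate is ever dereferenced: the in-object address is computed
/// from the word alone, which only works because the offset is constant.
fn filter_stack_words(stack: &[usize], vo_bits: &VoBits) -> Vec<usize> {
    stack
        .iter()
        .copied()
        // An ObjectReference must be word-aligned.
        .filter(|word| word % WORD == 0)
        // The VO bit is checked at the in-object address, not at the word itself.
        .filter(|word| vo_bits.is_set(word.wrapping_add(IN_OBJECT_OFFSET)))
        .collect()
}
```

If the offset depended on the object's contents (say, on a type word in the header), the scanner would have to read through a word that might not be a reference at all before it could decide whether the word is a reference.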

Meanwhile, not all VMs can use "the word stored in the field" as ObjectReference. In some VMs, the thing in a field may be a compressed pointer (OpenJDK), a tagged pointer (V8), an offset pointer (Julia), or an indirect handle (Guile or some old versions of the HotSpot JVM). We solve this problem by letting the VM binding implement the Slot trait and customize the load and store methods so that we always present a word-sized pointer-based ObjectReference to mmtk-core. (https://github.com/mmtk/mmtk-core/pull/606)
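To make the idea concrete, here is a minimal sketch of a slot for a VM that keeps a tag in the low bits of each field. It is an assumption-laden toy: `ToySlot` is a simplified, hypothetical interface rather than the real Slot trait, `ObjRef` and `TaggedSlot` are made-up types, the two-bit `TAG_MASK` is arbitrary, and the field is modelled as a `Cell<usize>` instead of raw memory.

```rust
use std::cell::Cell;

/// Toy model of a word-sized, word-aligned object reference.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ObjRef(usize);

/// Simplified, hypothetical slot interface (not the real mmtk-core trait).
trait ToySlot {
    fn load(&self) -> Option<ObjRef>;
    fn store(&self, object: ObjRef);
}

/// Hypothetical tag layout: the two lowest bits of a field are a VM tag.
const TAG_MASK: usize = 0b11;

/// A field whose raw value is `address | tag`.
struct TaggedSlot {
    raw: Cell<usize>,
}

impl ToySlot for TaggedSlot {
    /// Strip the tag so the collector only ever sees an aligned reference.
    fn load(&self) -> Option<ObjRef> {
        let untagged = self.raw.get() & !TAG_MASK;
        if untagged == 0 {
            None
        } else {
            Some(ObjRef(untagged))
        }
    }

    /// Write the (possibly moved) reference back, preserving the existing tag.
    fn store(&self, object: ObjRef) {
        let tag = self.raw.get() & TAG_MASK;
        self.raw.set(object.0 | tag);
    }
}
```

The collector side works purely with `ObjRef`; the tag never leaves the binding.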

Then we implemented an algorithm for finding the last VO bit from an interior pointer. If neither the ObjectReference nor the "in-object address" is required to be word-aligned, the algorithm will not be able to return an exact ObjectReference, but only an address range where one of the addresses is a valid ObjectReference. That's confusing and inefficient. Now we require that ObjectReference must be word-aligned, while the "in-object address" has no alignment requirements. This makes ObjectReference more likely not to be what's held in an object field because the VM may use the low bits as tags (V8), making the value misaligned. But this is not a problem because the VM binding can fix the alignment in Slot::load and Slot::store. (https://github.com/mmtk/mmtk-core/pull/1159)
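The arithmetic behind this can be sketched as follows, assuming one VO bit per word-sized region (an assumption of this example, as are all the function names). A set VO bit only tells us that the "in-object address" lies somewhere in a word-sized region, so the candidate ObjectReference values span one word; the alignment requirement then singles out exactly one of them.

```rust
const WORD: usize = std::mem::size_of::<usize>();

/// The "in-object address" for a reference, given a constant (possibly
/// unaligned, possibly negative) offset.
fn in_object_address(objref: usize, offset: isize) -> usize {
    objref.wrapping_add_signed(offset)
}

/// Start of the word-sized region covered by the VO bit for `addr`
/// (assumes one VO bit per word).
fn vo_region_start(addr: usize) -> usize {
    addr & !(WORD - 1)
}

/// Recover the unique word-aligned reference whose in-object address
/// falls in the region starting at `region_start`.
///
/// The in-object address lies in [region_start, region_start + WORD), so
/// the reference lies in [region_start - offset, region_start - offset + WORD).
/// Exactly one address in that range is word-aligned: the lower bound
/// rounded up to a word boundary.
fn objref_from_vo_region(region_start: usize, offset: isize) -> usize {
    let lowest_candidate = region_start.wrapping_add_signed(offset.wrapping_neg());
    lowest_candidate.wrapping_add(WORD - 1) & !(WORD - 1)
}
```

Without the alignment requirement on ObjectReference, the last step has nothing to round to, and the best the algorithm can return is the whole candidate range.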

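In conclusion, an ObjectReference as required by the current mmtk-core is a word-sized, word-aligned address that is at a constant offset from the "in-object address", may lie outside the object, and is not necessarily the value stored in a field.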

p.s. See https://github.com/mmtk/mmtk-core/issues/1044 for the discussion about VMs that store handles instead of object addresses in fields.

The problem

mmtk-core doesn't use the raw address of ObjectReference except for debug purposes. Almost all operations are done w.r.t. the "in-object address", including trace_object, is_reachable (via SFT), marking, checking VO bit (via side metadata), checking if an object is within a chunk/block, etc.

Meanwhile, ObjectReference is not always what's in a field, either. It is something defined by the VM binding and passed around in mmtk-core, but it has no useful properties except being at a constant offset from an "in-object address". The only reason for a VM binding to use an address outside an object as ObjectReference is "it is what's in a field, and we don't want to waste one subtraction for every field load". But that reason may not hold either, because if we don't do the subtraction when loading, we need one subtraction at every subsequent ObjectReference::to_address().

Proposal: Require ObjectReference to be inside an object

We can add one more requirement in addition to the alignment requirement: ObjectReference must be an address inside an object.

That merges the "in-object address" and ObjectReference.

The benefits are obvious:

Concretely, we remove ObjectReference::to_address, keeping the to_raw_address, to_header and to_object_start methods. When accessing SFT or side metadata, we simply use ObjectReference::to_raw_address because it will be guaranteed to be inside the object.

We remove the constant IN_OBJECT_ADDRESS_OFFSET and the methods ObjectReference::to_address and ObjectReference::from_address. Note that IN_OBJECT_ADDRESS_OFFSET is not required to be a multiple of word size. Currently, when we set a VO bit from ObjectReference, we may be setting VO bit at an unaligned address, and we need to use the alignment requirement of ObjectReference to infer the only possible raw address of ObjectReference given a VO bit. After removing IN_OBJECT_ADDRESS_OFFSET, we set VO bit exactly at ObjectReference::to_raw_address. It will be both inside the object and aligned. There will be no need to mess with the alignment requirements. If VO bit is set at address X, then ObjectReference::from_raw_address_unchecked(X) will be guaranteed to be a valid ObjectReference.

Potential risks

Performance

By unifying ObjectReference and the "in-object address", mmtk-core no longer needs to call ObjectReference::to_address, which saves the offset computation in bindings where the raw address and the "in-object address" differ. This could improve performance. However, we then require one subtraction at every Slot::load and one addition at every Slot::store. In this sense, we merely move the overhead from to_address to load and store. We need a performance evaluation to see whether the cost increases or decreases after this change. Currently the only VM binding that has a different ObjectReference and "in-object address" is JikesRVM. We'll need some test results from JikesRVM.
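A minimal sketch of where the arithmetic would end up, assuming a VM whose fields hold a pointer at a fixed distance past the in-object ObjectReference. `OffsetSlot` and the value of `FIELD_VALUE_OFFSET` are hypothetical, and the field is modelled as a `Cell<usize>` that holds either null (0) or a valid VM-level reference.

```rust
use std::cell::Cell;

/// Hypothetical distance from the in-object ObjectReference to the
/// VM-level reference value stored in fields. The value 16 is arbitrary.
const FIELD_VALUE_OFFSET: usize = 16;

/// A field holding a VM-level reference that may point outside the object.
struct OffsetSlot {
    raw: Cell<usize>,
}

impl OffsetSlot {
    /// One subtraction per load: convert the field value into the
    /// in-object reference that the collector works with.
    fn load(&self) -> Option<usize> {
        match self.raw.get() {
            0 => None,
            value => Some(value.wrapping_sub(FIELD_VALUE_OFFSET)),
        }
    }

    /// One addition per store: convert back to the VM-level value.
    fn store(&self, objref: usize) {
        self.raw.set(objref.wrapping_add(FIELD_VALUE_OFFSET));
    }
}
```

Whether this is cheaper overall depends on how many times the collector would otherwise convert each loaded reference, which is what the JikesRVM measurements would need to show.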

Engineering

By unifying ObjectReference and the "in-object address", mmtk-core will have an easier time mapping a VO bit to its corresponding ObjectReference. But if the VM-level reference value is a pointer outside the object, and such a value can be held on the stack, the conservative stack scanner implemented by the VM will have to compute the "candidate ObjectReference" by subtracting a constant from the value on the stack before passing the candidate to memory_manager::is_mmtk_object. That means that if the VM binding doesn't implement the subtraction in ObjectModel::ref_to_address, it must implement it in the conservative stack scanner. That also shifts the complexity from one place to another. Fortunately, JikesRVM doesn't use conservative stack scanning. If V8 uses conservative stack scanning, it will always have to mask the stack word for alignment due to https://github.com/mmtk/mmtk-core/pull/1159, regardless of this change.