Status quo
Currently, MMTk defines several addresses of an object.

| Name | Definition | Must be inside object |
| --- | --- | --- |
| starting address | the return value of memory_manager::alloc | Yes |
| ObjectReference | an address that refers to an object | No |
| in-object address | at a constant offset from ObjectReference, used to access SFT, side metadata, etc. | Yes |
| header address | address used to access in-object metadata | Yes |

The definition of ObjectReference is VM-specific. We currently allow ObjectReference to be outside an object because some VMs do so. For example, in JikesRVM, an ObjectReference is defined as the address of the array payload of an object if the object is an array. That saves one offset computation for array element access, but when accessing scalar object fields or object headers, the VM has to use a negative offset from the ObjectReference. When we ported MMTk from JikesRVM to Rust, we inherited this type. ObjectReference is now the standard way for mmtk-core to refer to an object. We still allow ObjectReference to be outside an object so that when loading from a field in JikesRVM, we can directly use the word stored in the field as the ObjectReference.
However, because we only map side metadata memory for pages within spaces, addresses outside any space (or in unmapped pages) may not have mapped metadata. The same is true for SFT entries, which are allocated by chunk. If we attempt to access metadata or SFT using an address outside the object, the access may cause a segmentation fault. To solve this problem, we require the VM binding to implement ObjectModel::ref_to_address, which computes the "in-object address" of an object, an address that must be inside the object. (https://github.com/mmtk/mmtk-core/pull/699)
In addition, VMs that use conservative stack scanning need to read a word from the stack, compute the "in-object address" from it, and check whether the VO bit is set at that "in-object address". Because we don't know if a word on the stack is an actual ObjectReference or not, the offset from the ObjectReference to the "in-object address" must be a constant (i.e. it can be computed without reading any data from the object body). (Also in https://github.com/mmtk/mmtk-core/pull/699)
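The constant-offset requirement can be illustrated with a minimal Rust sketch. The type ObjRef, the constant IN_OBJECT_OFFSET (and its value of -8), and the HashSet standing in for the VO-bit side metadata are all hypothetical and are not the mmtk-core API. The point is only that the in-object address is derived from the candidate word alone, so a conservative scanner never dereferences a word that might not be a reference.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a VM-defined object reference (just a raw address).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjRef(usize);

/// Assumed constant offset from the reference to an address inside the object.
/// The value -8 is illustrative only.
pub const IN_OBJECT_OFFSET: isize = -8;

impl ObjRef {
    pub fn from_raw(addr: usize) -> Self {
        ObjRef(addr)
    }

    /// Pure arithmetic on the reference; no object memory is read.
    pub fn to_in_object_address(self) -> usize {
        self.0.wrapping_add_signed(IN_OBJECT_OFFSET)
    }
}

/// Conservative check of one stack word. `vo_bits` is a toy model of the
/// VO-bit side metadata: the set of in-object addresses of live objects.
pub fn may_be_object(vo_bits: &HashSet<usize>, stack_word: usize) -> bool {
    let candidate = ObjRef::from_raw(stack_word);
    vo_bits.contains(&candidate.to_in_object_address())
}
```

If the offset depended on the object's contents (for example, on a type word in the header), may_be_object would have to read memory at an address that may not belong to any object.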
Meanwhile, not all VMs can use "the word stored in the field" as ObjectReference. In some VMs, the thing in a field may be a compressed pointer (OpenJDK), a tagged pointer (V8), an offset pointer (Julia), or an indirect handle (Guile or some old versions of the HotSpot JVM). We solve this problem by letting the VM binding implement the Slot trait and customize the load and store methods so that mmtk-core is always presented with a word-sized, pointer-based ObjectReference. (https://github.com/mmtk/mmtk-core/pull/606)
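As a rough illustration of how a binding can hide its field encoding behind load and store, here is a self-contained sketch for a tagged-pointer VM. The trait SlotSketch is a simplified, hypothetical stand-in for the real Slot trait, and the tag layout (low two bits, with 0b01 marking a heap pointer) is an assumption made up for this example rather than V8's actual scheme.

```rust
use std::cell::Cell;

/// Simplified, hypothetical stand-in for the Slot trait.
pub trait SlotSketch {
    /// Returns the untagged, word-aligned object address, or None if the
    /// slot does not hold a heap reference.
    fn load(&self) -> Option<usize>;
    /// Stores an untagged object address, re-applying the VM's encoding.
    fn store(&self, object: usize);
}

/// Assumed tag layout for illustration: the low two bits carry a tag.
const TAG_MASK: usize = 0b11;
const HEAP_TAG: usize = 0b01;

/// A field that holds a tagged word.
pub struct TaggedSlot {
    word: Cell<usize>,
}

impl TaggedSlot {
    pub fn new(raw: usize) -> Self {
        TaggedSlot { word: Cell::new(raw) }
    }

    /// The raw word as the VM sees it, tag included.
    pub fn raw(&self) -> usize {
        self.word.get()
    }
}

impl SlotSketch for TaggedSlot {
    fn load(&self) -> Option<usize> {
        let raw = self.word.get();
        if raw & TAG_MASK == HEAP_TAG {
            Some(raw & !TAG_MASK) // strip the tag: the result is aligned
        } else {
            None // e.g. a small integer, not a reference
        }
    }

    fn store(&self, object: usize) {
        debug_assert_eq!(object & TAG_MASK, 0, "expected an untagged address");
        self.word.set(object | HEAP_TAG);
    }
}
```

With this shape, the collector only ever sees the untagged address, while the VM keeps reading and writing its own encoding.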
Then we implemented an algorithm for finding the last VO bit from an interior pointer. If neither the ObjectReference nor the "in-object address" is required to be word-aligned, the algorithm will not be able to return an exact ObjectReference, but only an address range where one of the addresses is a valid ObjectReference. That's confusing and inefficient. Now we require that ObjectReference must be word-aligned, while the "in-object address" has no alignment requirements. This makes ObjectReference more likely not to be what's held in an object field because the VM may use the low bits as tags (V8), making the value misaligned. But this is not a problem because the VM binding can fix the alignment in Slot::load and Slot::store. (https://github.com/mmtk/mmtk-core/pull/1159)
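To see why word alignment makes the result exact, consider the following toy model. It is not mmtk-core's algorithm: the VoBits type, the one-bit-per-8-byte-word granularity, and the unaligned IN_OBJECT_OFFSET of 3 are assumptions chosen for illustration, and the search does not check that the interior pointer actually falls within the object's size.

```rust
/// Assumed word size and VO-bit granularity (one bit per word).
pub const WORD: usize = 8;
/// Hypothetical, deliberately unaligned offset: in-object = reference + 3.
pub const IN_OBJECT_OFFSET: usize = 3;

/// Toy VO-bit table covering the addresses [base, base + bytes).
pub struct VoBits {
    base: usize,
    bits: Vec<bool>,
}

impl VoBits {
    pub fn new(base: usize, bytes: usize) -> Self {
        VoBits { base, bits: vec![false; bytes / WORD] }
    }

    fn index(&self, addr: usize) -> usize {
        (addr - self.base) / WORD
    }

    /// Sets the VO bit that covers the in-object address of `objref`.
    pub fn set_for_object(&mut self, objref: usize) {
        assert_eq!(objref % WORD, 0, "references must be word-aligned");
        let i = self.index(objref + IN_OBJECT_OFFSET);
        self.bits[i] = true;
    }

    /// Finds the last VO bit at or before `interior` and recovers the
    /// reference. Within the word covered by that bit, exactly one address
    /// `a` makes `a - IN_OBJECT_OFFSET` word-aligned, so the answer is unique.
    pub fn find_object(&self, interior: usize) -> Option<usize> {
        let start = self.index(interior);
        (0..=start).rev().find(|&i| self.bits[i]).map(|i| {
            let region = self.base + i * WORD;
            region + IN_OBJECT_OFFSET % WORD - IN_OBJECT_OFFSET
        })
    }
}
```

Without the alignment requirement, any of the WORD addresses covered by the bit could be the in-object address, which is the "address range" problem described above.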
In conclusion, an ObjectReference as required by the current mmtk-core
- is an address, and
- must be a constant offset from the "in-object address" because of conservative stack scanning, and
- must be word-aligned to support searching for ObjectReference from an interior pointer.
The problem
mmtk-core doesn't use the raw address of ObjectReference except for debug purposes. Almost all operations are done w.r.t. the "in-object address", including trace_object, is_reachable (via SFT), marking, checking the VO bit (via side metadata), checking if an object is within a chunk/block, etc.
Meanwhile, ObjectReference is not always what's in a field, either. It is something defined by the VM binding and passed around in mmtk-core, but it has no useful properties except being at a constant offset from an "in-object address". The only reason for a VM binding to use an address outside an object as ObjectReference is "it is what's in a field, and we don't want to waste one subtraction for every field load". But that reason may not hold either, because if we don't do the subtraction when loading, we need one subtraction at every subsequent ObjectReference::to_address().
Proposal: Require ObjectReference to be inside an object
We can add one more requirement in addition to the alignment requirement: ObjectReference must be an address inside an object.
That merges the "in-object address" and ObjectReference.
The benefits are obvious:
- We can directly use the raw address of ObjectReference to access SFT and side metadata, since it's guaranteed to be inside an object.
- If a VO bit is set for an address, that address is the exact address of the ObjectReference. There is no confusion about the offset or alignment.
- We can remove a few constants and methods in ObjectModel and ObjectReference. The API will be much simpler.
- We remove the cost of address computation at every ObjectReference::to_address.
Concretely, we remove ObjectReference::to_address, keeping the to_raw_address, to_header and to_object_start methods. When accessing SFT or side metadata, we simply use ObjectReference::to_raw_address because it will be guaranteed to be inside the object.
We remove the constant IN_OBJECT_ADDRESS_OFFSET and the methods ObjectReference::to_address and ObjectReference::from_address. Note that IN_OBJECT_ADDRESS_OFFSET is not required to be a multiple of the word size. Currently, when we set a VO bit from an ObjectReference, we may be setting the VO bit at an unaligned address, and we need to use the alignment requirement of ObjectReference to infer the only possible raw address of the ObjectReference given a VO bit. After removing IN_OBJECT_ADDRESS_OFFSET, we set the VO bit exactly at ObjectReference::to_raw_address. That address will be both inside the object and aligned, so there will be no need to reason about alignment requirements. If the VO bit is set at address X, then ObjectReference::from_raw_address_unchecked(X) will be guaranteed to be a valid ObjectReference.
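A toy model of the proposed scheme shows how the mapping between VO bits and references becomes the identity. The VoBits type and the one-bit-per-8-byte-word granularity are assumptions for illustration only and do not reflect mmtk-core's actual side-metadata layout.

```rust
/// Assumed word size and VO-bit granularity (one bit per word).
pub const WORD: usize = 8;

/// Toy VO-bit table covering the addresses [base, base + bytes).
pub struct VoBits {
    base: usize,
    bits: Vec<bool>,
}

impl VoBits {
    pub fn new(base: usize, bytes: usize) -> Self {
        VoBits { base, bits: vec![false; bytes / WORD] }
    }

    /// The VO bit is set exactly at the (aligned, in-object) raw address.
    pub fn set_for_object(&mut self, objref: usize) {
        assert_eq!(objref % WORD, 0, "references must be word-aligned");
        let i = (objref - self.base) / WORD;
        self.bits[i] = true;
    }

    /// An address is a valid reference exactly when it is aligned and its
    /// VO bit is set; no offset arithmetic is involved.
    pub fn is_object(&self, addr: usize) -> bool {
        addr % WORD == 0 && self.bits[(addr - self.base) / WORD]
    }

    /// The start of the word covered by the last set bit at or before
    /// `interior` is itself the reference.
    pub fn find_object(&self, interior: usize) -> Option<usize> {
        let start = (interior - self.base) / WORD;
        (0..=start)
            .rev()
            .find(|&i| self.bits[i])
            .map(|i| self.base + i * WORD)
    }
}
```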
Potential risks
Performance
By unifying ObjectReference and "in-object address", mmtk-core will no longer call ObjectReference::to_address, which costs an addition or subtraction whenever there is an offset between the raw address and the "in-object address". This may improve performance. However, a VM whose fields hold addresses outside the object will then require one subtraction at every Slot::load and one addition at every Slot::store. In this sense, we merely move the overhead from to_address to load and store. We need a performance evaluation to see whether the cost increases or decreases after this change. Currently the only VM binding that has different ObjectReference and "in-object address" is JikesRVM, so we'll need some test results from JikesRVM.
Engineering
By unifying ObjectReference and "in-object address", mmtk-core will have an easier time mapping a VO bit to its corresponding ObjectReference. But if the VM-level reference value is a pointer outside the object, and such a value can be held on the stack, the conservative stack scanner implemented by the VM will have to compute the "candidate ObjectReference" by subtracting an offset from the value on the stack before passing the "candidate" to memory_manager::is_mmtk_object. That means that if the VM binding doesn't implement the subtraction in ObjectModel::ref_to_address, it must implement it in the conservative stack scanner. That also shifts the complexity from one place to another. Fortunately, JikesRVM doesn't use conservative stack scanning. If V8 uses conservative stack scanning, it will always have to mask the stack word for alignment due to https://github.com/mmtk/mmtk-core/pull/1159, regardless of this change.
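The extra work on the binding side might look like the sketch below. Everything here is hypothetical: VM_REF_OFFSET (and its value of 16) models a VM whose reference values point a fixed distance past the proposed in-object ObjectReference, and the is_object closure merely stands in for a query such as memory_manager::is_mmtk_object, whose real signature is not reproduced here.

```rust
/// Assumed word size.
pub const WORD: usize = 8;
/// Hypothetical distance between the VM-level reference value and the
/// in-object ObjectReference (illustrative value).
pub const VM_REF_OFFSET: usize = 16;

/// Turns one stack word into a candidate reference, or rejects it early.
pub fn candidate_from_stack_word(word: usize) -> Option<usize> {
    // Undo the VM's offset; tiny values that would underflow are rejected.
    let candidate = word.checked_sub(VM_REF_OFFSET)?;
    // Only non-null, word-aligned addresses can be references.
    if candidate != 0 && candidate % WORD == 0 {
        Some(candidate)
    } else {
        None
    }
}

/// Scans stack words conservatively. `is_object` stands in for the
/// collector-side validity query.
pub fn scan_stack(words: &[usize], is_object: impl Fn(usize) -> bool) -> Vec<usize> {
    words
        .iter()
        .filter_map(|&w| candidate_from_stack_word(w))
        .filter(|&c| is_object(c))
        .collect()
}
```

A VM that tags its pointers would mask the tag bits in candidate_from_stack_word in the same place where this sketch subtracts the offset.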
p.s. See https://github.com/mmtk/mmtk-core/issues/1044 for the discussion about VMs that store handles instead of object addresses in fields.