TL;DR: Stop using Address::ZERO to indicate errors. Use a more idiomatic Rust style, such as Option<NonZeroAddress> or Result<NonZeroAddress, ErrorType>.
Unlike ObjectReference, where 0 and misaligned addresses can't refer to any valid object, Address has no such constraint. The zero address is sometimes meaningful. In practice, however, we often use Address::ZERO to represent special cases where no meaningful address exists. It is most often used on the allocation path, where a failed allocation returns Address::ZERO. That doesn't mean the object is allocated at address 0x00000000; it means the object is not allocated anywhere.
While this works, it is not idiomatic in Rust. https://github.com/mmtk/mmtk-core/issues/1043 described several problems with nullable ObjectReference, and many of them apply to Address, too. Notable ones include:
Since we don't know if an Address can be zero or not, we sometimes do redundant checks, and sometimes miss necessary checks.
Some functions guarantee that a returned Address is never zero, but the caller can't take advantage of it. Notably, we can safely convert a NonZeroAddress into an ObjectReference, and an Address into an Option<ObjectReference>, but converting an Address into an ObjectReference requires either Option<ObjectReference>::unwrap() or an unsafe operation, even if we statically know that the Address cannot be zero.
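To make the last point concrete, here is a minimal sketch of the two conversions. Address, NonZeroAddress and ObjectReference below are simplified stand-ins (the real mmtk-core types have more methods, and ObjectReference may also impose alignment requirements that this sketch ignores); the method name to_object_reference is hypothetical.

```rust
use std::num::NonZeroUsize;

/// Simplified stand-in for mmtk-core's Address; zero is a valid value.
#[repr(transparent)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Address(usize);

/// Hypothetical address type that can never be zero.
#[repr(transparent)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct NonZeroAddress(NonZeroUsize);

/// Simplified stand-in for ObjectReference, backed by NonZeroUsize.
#[repr(transparent)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

impl Address {
    /// A possibly-zero address can only be converted fallibly.
    pub fn to_object_reference(self) -> Option<ObjectReference> {
        NonZeroUsize::new(self.0).map(ObjectReference)
    }
}

impl From<NonZeroAddress> for ObjectReference {
    /// A non-zero address converts infallibly: no unwrap(), no unsafe.
    fn from(addr: NonZeroAddress) -> Self {
        ObjectReference(addr.0)
    }
}
```

The non-zero guarantee travels with the type, so the caller never has to re-check it.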
Similar to ObjectReference, we can define NonZeroAddress, backed by NonZeroUsize. With #[repr(transparent)], NonZeroAddress has the same layout as NonZeroUsize, so Option<NonZeroAddress> has the same layout as Address, with None represented as 0.
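A minimal sketch of what such a type could look like, assuming a simplified stand-in for Address. NonZeroAddress, its methods new and get, and the overflow-panicking Add<usize> impl are illustrative choices, not an existing mmtk-core API. The compile-time assertion checks the layout claim.

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;
use std::ops::Add;

/// Simplified stand-in for mmtk-core's Address.
#[repr(transparent)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Address(usize);

/// Hypothetical non-zero address; repr(transparent) gives it NonZeroUsize's layout.
#[repr(transparent)]
#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct NonZeroAddress(NonZeroUsize);

impl NonZeroAddress {
    /// Returns None for the zero address, mirroring NonZeroUsize::new.
    pub fn new(addr: Address) -> Option<Self> {
        NonZeroUsize::new(addr.0).map(NonZeroAddress)
    }

    /// Widening back to a plain Address is always allowed.
    pub fn get(self) -> Address {
        Address(self.0.get())
    }
}

impl Add<usize> for NonZeroAddress {
    type Output = NonZeroAddress;

    /// Adds a byte offset; panics on overflow instead of wrapping to zero.
    fn add(self, offset: usize) -> Self::Output {
        NonZeroAddress(self.0.checked_add(offset).expect("address overflow"))
    }
}

// None uses 0 as its niche, so Option<NonZeroAddress> is no larger than Address.
const _: () = assert!(size_of::<Option<NonZeroAddress>>() == size_of::<Address>());
```

Panicking on overflow in add is only one possible design; the important property is that arithmetic on a NonZeroAddress can never silently produce zero.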
Using Option<NonZeroAddress> also allows us to use Rust's ? operator, which returns None from the enclosing function immediately if the value is None:
```rust
fn bar() -> Option<NonZeroAddress> { /* ... */ }

fn foo() -> Option<NonZeroAddress> {
    let x: NonZeroAddress = bar()?; // Returns None from foo() if bar() returns None.
    do_something_with(x);
    Some(x)
}
```
One typical use case is allocation:
```rust
fn acquire_block(&mut self, size: usize) -> Option<NonZeroAddress> {
    let block_size = (size + BLOCK_MASK) & !BLOCK_MASK; // Round up to the block size.
    // Returns None if the space cannot provide the pages.
    let acquired_start: NonZeroAddress = self.space.acquire(bytes_to_pages_up(block_size))?;
    self.set_limit(acquired_start, acquired_start + block_size);
    self.alloc(size) // This may still return None if allocation fails for other reasons.
}
```
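The acquire_block example can be fleshed out into a small self-contained program. This is a toy, not mmtk-core's allocator: Space, BumpAllocator, BLOCK_SIZE and the from_usize/as_usize helpers are hypothetical, and the space simply hands out memory from a fixed, non-zero address range. It shows ? propagating None out of the slow path.

```rust
use std::num::NonZeroUsize;

/// Stand-in for the proposed non-zero address type.
#[repr(transparent)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct NonZeroAddress(NonZeroUsize);

impl NonZeroAddress {
    fn from_usize(v: usize) -> Option<Self> {
        NonZeroUsize::new(v).map(NonZeroAddress)
    }
    fn as_usize(self) -> usize {
        self.0.get()
    }
}

/// Hypothetical space handing out memory from [next, end).
pub struct Space {
    next: usize,
    end: usize,
}

impl Space {
    /// The start is a NonZeroUsize, so no acquired chunk can begin at 0.
    pub fn new(start: NonZeroUsize, end: usize) -> Self {
        Space { next: start.get(), end }
    }

    /// Returns None when the space is exhausted.
    pub fn acquire(&mut self, bytes: usize) -> Option<NonZeroAddress> {
        let new_next = self.next.checked_add(bytes)?;
        if new_next > self.end {
            return None;
        }
        let start = NonZeroAddress::from_usize(self.next)?;
        self.next = new_next;
        Some(start)
    }
}

const BLOCK_SIZE: usize = 4096; // Hypothetical block granularity.

/// Hypothetical bump allocator; cursor and limit stay plain usize fields.
pub struct BumpAllocator {
    space: Space,
    cursor: usize,
    limit: usize,
}

impl BumpAllocator {
    pub fn new(space: Space) -> Self {
        BumpAllocator { space, cursor: 0, limit: 0 }
    }

    /// Fast path: bump the cursor if the current block has room.
    pub fn alloc(&mut self, size: usize) -> Option<NonZeroAddress> {
        let new_cursor = self.cursor.checked_add(size)?;
        // A zero cursor means no block has been acquired yet.
        match NonZeroAddress::from_usize(self.cursor) {
            Some(result) if new_cursor <= self.limit => {
                self.cursor = new_cursor;
                Some(result)
            }
            _ => self.acquire_block(size),
        }
    }

    /// Slow path: each ? returns None early on failure.
    fn acquire_block(&mut self, size: usize) -> Option<NonZeroAddress> {
        let block_size = size.checked_next_multiple_of(BLOCK_SIZE)?;
        let start = self.space.acquire(block_size)?;
        self.cursor = start.as_usize();
        self.limit = start.as_usize() + block_size;
        self.alloc(size)
    }
}
```

Only the values that cross function boundaries carry the non-zero guarantee; the internal cursor and limit remain ordinary integers.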
We can use Result<NonZeroAddress, ErrorType> instead if we want to report more specific errors.
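A sketch of the Result variant, assuming a hypothetical AllocError enum in place of ErrorType. The variant names and the acquire_pages/try_acquire functions are illustrative only.

```rust
use std::num::NonZeroUsize;

/// Stand-in for the proposed non-zero address type.
#[repr(transparent)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct NonZeroAddress(NonZeroUsize);

/// Hypothetical error type; the variants are illustrative.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum AllocError {
    /// The heap has no room for the request.
    OutOfMemory { requested: usize },
    /// The request size overflowed address arithmetic.
    SizeOverflow,
}

/// Hypothetical acquisition from [*next, end) that reports why it failed.
/// Precondition: *next must not be 0.
pub fn acquire_pages(next: &mut usize, end: usize, bytes: usize) -> Result<NonZeroAddress, AllocError> {
    let new_next = next.checked_add(bytes).ok_or(AllocError::SizeOverflow)?;
    if new_next > end {
        return Err(AllocError::OutOfMemory { requested: bytes });
    }
    let start = NonZeroUsize::new(*next).expect("heap must not start at address 0");
    *next = new_next;
    Ok(NonZeroAddress(start))
}

/// Callers that only care about success can discard the detail with .ok().
pub fn try_acquire(next: &mut usize, end: usize, bytes: usize) -> Option<NonZeroAddress> {
    acquire_pages(next, end, bytes).ok()
}
```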
When to use Address and when to use Option<NonZeroAddress>?
Use Address if the address 0x00000000 makes sense. We may sometimes use 0x00000000 as the base address of some memory region (like the entire memory) and apply a positive offset to get a non-zero address.
Use Option<NonZeroAddress> where we previously used the address 0x00000000 as an indicator of error. For example, an allocated object's address can never be zero, so 0x00000000 can only mean that allocation failed.
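The two rules can be contrasted in a short sketch. Both functions and the stand-in types are hypothetical: address_of_chunk legitimately works from a zero base, so it keeps Address, while allocate_in can only fail by producing "no address", so it returns Option<NonZeroAddress>.

```rust
use std::num::NonZeroUsize;

/// Plain address: zero is a legitimate value (e.g. the base of all memory).
#[repr(transparent)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Address(usize);

/// Stand-in for the proposed non-zero address type.
#[repr(transparent)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct NonZeroAddress(NonZeroUsize);

impl Address {
    pub const ZERO: Address = Address(0);

    pub fn offset(self, bytes: usize) -> Address {
        Address(self.0 + bytes)
    }
}

/// Keep Address: a base of 0x00000000 is meaningful here.
pub fn address_of_chunk(region_base: Address, chunk_index: usize, chunk_bytes: usize) -> Address {
    region_base.offset(chunk_index * chunk_bytes)
}

/// Use Option<NonZeroAddress>: a zero result could only ever have meant failure,
/// because no object can live at address 0.
pub fn allocate_in(free_start: Address, free_end: Address, size: usize) -> Option<NonZeroAddress> {
    let end = free_start.0.checked_add(size)?;
    if end > free_end.0 {
        return None;
    }
    NonZeroUsize::new(free_start.0).map(NonZeroAddress)
}
```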
Performance impact
The paper https://www.steveblackburn.org/pubs/papers/rust-ismm-2016.pdf, published in 2016, mentioned that:
"A safer alternative in Rust is to use Option<Address> initialized as None to indicate that there is no valid value. However, this adds an additional conditional and a few run-time checks to extract the actual address value in the performance-critical path of allocation, which adds around 4% performance overhead. We deem this tradeoff not to be worthwhile given the paramount importance of the allocation fast path and the infrequency with which this idiom arises within the GC implementation. Thus we choose to allow Address::zero() but mark it as unsafe so that implementers are explicitly tasked with the burden of ensuring safety."
Eight years have passed, and Rust now has NonZeroUsize, #[repr(transparent)], and the null pointer optimization, which is designed to eliminate the cost of Option<NonZeroUsize> compared to usize. We should re-evaluate the cost of using Option<NonZeroAddress> and use it where it makes sense.
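The layout side of this claim can be checked at compile time. The sketch below (with a stand-in NonZeroAddress and hypothetical to_raw/from_raw helpers) asserts that Option<NonZeroAddress> has the size and alignment of usize, and shows how it maps to and from a raw word in which 0 still means failure, for example at a C-facing boundary. These checks say nothing about speed: whether the allocation fast path is as fast as before still has to be measured.

```rust
use std::mem::{align_of, size_of};
use std::num::NonZeroUsize;

/// Stand-in for the proposed non-zero address type.
#[repr(transparent)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct NonZeroAddress(NonZeroUsize);

// Null pointer (niche) optimization: None is encoded as 0, so no extra tag is stored.
const _: () = assert!(size_of::<Option<NonZeroAddress>>() == size_of::<usize>());
const _: () = assert!(align_of::<Option<NonZeroAddress>>() == align_of::<usize>());

/// Encode as a raw word, where 0 means "no address".
pub fn to_raw(addr: Option<NonZeroAddress>) -> usize {
    addr.map_or(0, |a| a.0.get())
}

/// Decode a raw word; 0 becomes None.
pub fn from_raw(raw: usize) -> Option<NonZeroAddress> {
    NonZeroUsize::new(raw).map(NonZeroAddress)
}
```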
Engineering
We can start with internal functions before applying the change to public APIs, so that we know it is internally consistent before forcing our users to make changes.
We can start with the most performance-critical parts of mmtk-core and evaluate performance, so that any performance impact, good or bad, is crystal clear.
We don't have to change all uses of Address at once. Unlike the ObjectReference change, we don't change the definition of Address; we only add a new type, NonZeroAddress. We can change the code gradually, and I anticipate that many uses of Address will remain throughout the code base.
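One way to migrate gradually, sketched under assumptions: an internal function returns Option<NonZeroAddress>, while the public function keeps its current signature and maps None back to Address::ZERO, so existing callers are unaffected until the public API is migrated too. Heap, alloc_internal and alloc are hypothetical names, and the types are simplified stand-ins.

```rust
use std::num::NonZeroUsize;

/// Simplified stand-in for mmtk-core's Address.
#[repr(transparent)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Address(usize);

impl Address {
    pub const ZERO: Address = Address(0);

    pub fn is_zero(self) -> bool {
        self.0 == 0
    }
}

/// Stand-in for the proposed non-zero address type.
#[repr(transparent)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct NonZeroAddress(NonZeroUsize);

impl From<NonZeroAddress> for Address {
    fn from(a: NonZeroAddress) -> Address {
        Address(a.0.get())
    }
}

/// Hypothetical allocator whose internals have already been migrated.
pub struct Heap {
    cursor: usize,
    limit: usize,
}

impl Heap {
    pub fn new(start: NonZeroUsize, limit: usize) -> Self {
        Heap { cursor: start.get(), limit }
    }

    /// Internal, migrated: "no address" is None, not Address::ZERO.
    fn alloc_internal(&mut self, size: usize) -> Option<NonZeroAddress> {
        let new_cursor = self.cursor.checked_add(size)?;
        if new_cursor > self.limit {
            return None;
        }
        let result = NonZeroUsize::new(self.cursor).map(NonZeroAddress);
        self.cursor = new_cursor;
        result
    }

    /// Public, not yet migrated: still returns Address::ZERO on failure.
    pub fn alloc(&mut self, size: usize) -> Address {
        self.alloc_internal(size).map_or(Address::ZERO, Address::from)
    }
}
```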