wks commented 2 weeks ago

TL;DR: Stop using Address::ZERO for erroneous cases. Use more idiomatic Rust style, such as Option<NonZeroAddress> or Result<NonZeroAddress, ErrorType>.

Unlike ObjectReference where 0 and misaligned addresses can't refer to any valid object, there is no such constraints for Address. The zero address is actually meaningful sometimes. But in practice, we often use Address::ZERO for representing special cases where a meaningful address doesn't exist. It is most often used in the code path of allocation, where a failed allocation returns Address::ZERO. It doesn't mean the object is allocated at address 0x00000000, but it is not allocated anywhere.

While this works, it is not idiomatic in Rust. https://github.com/mmtk/mmtk-core/issues/1043 described several problems with nullable ObjectReference, and many of them apply to Address, too. Notable ones include:

Since we don't know if an Address can be zero or not, we sometimes do redundant checks, and sometimes miss necessary checks.
Some functions guarantee a returned Address is never zero, but the caller can't take advantage of it. Notably, we can turn a NonZeroAddress safely to an ObjectReference and turn an Address safely to an Option<ObjectReference>, but converting Address to ObjectReference requires either an Option<ObjectReference>::unwrap() or an unsafe operation, even if we statically know the Address cannot be zero.

Similar to ObjectReference, we can define NonZeroAddress, backed by NonZeroUsize. Using #[repr(transparent)], we can make Option<NonZeroAddress> have the same layout as Address.

Using Option<NonZeroAddress> will also allow us to use Rust's ? operator which returns immediately if it is None.

fn bar() -> Option<NonZeroAddress> {...}

fn foo() -> Option<NonZeroAddress> {
    let x: NonZeroAddress = bar()?;  // Returns None if it is None.
    do_something_with(x);
    x
}

And one typical use case is allocation

fn acquire_block(&self, size: usize) -> Option<NonZeroAddress> {
    let acquired_start: NonZeroAddress = space.acquire(bytes_to_pages_up(size))?; // Return None if allocation failed.

    self.set_limit(acquired_start, acquired_start + block_size);
    self.alloc(size) // This may still return None if allocation failed for other reasons.
}

And we can use Result<NonZeroAddress, ErrorType> if we want to report more concrete errors.

When to use `Address` and when to use `Option<NonZeroAddress>`?

Use Address if the address 0x00000000 makes sense. We may sometimes use 0x00000000 as the base address of some memory region (like the entire memory) and apply a positive offset to get a non-zero address.

Use Option<NonZeroAddress> if we were previously using the address 0x00000000 as a indicator of error. For example, when allocating an object, the object address should never be zero. 0x00000000 only means allocation failed.

Performance impact

The paper https://www.steveblackburn.org/pubs/papers/rust-ismm-2016.pdf published in 2016 mentioned that

A safer alternative in Rust is to use Option<Address> initialized as None to indicate that there is no valid value. However, this adds an additional conditional and a few run-time checks to extract the actual address value in the performance-critical path of allocation, which adds around 4% performance overhead. We deem this tradeoff not to be worthwhile given the paramount importance of the allocation fast path and the infrequency with which this idiom arises within the GC implementation. Thus we choose to allow Address::zero() but mark it as unsafe so that implementers are explicitly tasked with the burden of ensuring safety.

Eight years have passed, and the Rust language now has NonZeroUsize, #[repr(transparent)] and null pointer optimization which is designed to remove the cost of Option<NonZeroUsize> over usize. We should re-evaluate the cost of using Option<NonZeroAddress> and use it where it makes sense.

Engineering

We can start from internal functions before applying it to public APIs, so we know it is internally consistent before forcing our users to make changes.

We can start from the most performance-critical parts of mmtk-core and do performance evaluation so that if it has any performance impact, it is crystal clear whether it is good or bad.

We don't have to change all uses of Address at a time. Unlike that of ObjectReference, we don't change the definition of Address and we only make an addition NonZeroAddress. We can change the code gradually, and I anticipate that we will still have quite many uses of Address throughout the code base.

qinsoon commented 2 weeks ago

I believe the benefits of this change are marginal, and the effort, potential performance impact, and added complexity do not justify the gains.

k-sareen commented 2 weeks ago

This needs to be an MEP in the first place

mmtk / mmtk-core

Introduce NonZeroAddress and write more idiomatic Rust code #1223

When to use `Address` and when to use `Option<NonZeroAddress>`?

Performance impact

Engineering

mmtk / mmtk-core

Introduce NonZeroAddress and write more idiomatic Rust code #1223

When to use Address and when to use Option<NonZeroAddress>?

Performance impact

Engineering

When to use `Address` and when to use `Option<NonZeroAddress>`?