Open wks opened 2 weeks ago
The intent was that an Address should originate only from mmap, malloc, or similar sources, or to be derived from existing Addresses. Any attempt to create an arbitrary Address was considered "unsafe." However, this approach hasn’t worked out very well. We can remove the "unsafe".
We discussed this topic today.
The "validity" of an Address is not well-defined in general, but there are definitions of "validity" in different contexts. For example, a block must be 32 KB in size and aligned to 32 KB, and an ObjectReference (currently defined as an address) must be within an object and must be word-aligned. In those cases, deriving addresses in one way may be safer than others. For example, we can derive the address of blocks in a chunk, given the chunk address. And we can derive chunk addresses from a memory region from mmap
.
Currently, we have several types that wrap Address
, usize
or NonZeroUsize
, including Chunk
, Block
(from both MarkSweepSpace and ImmixSpace), Line
, ObjectReference
, etc. We think it is better that whenever the code needs an address that satisfies certain definitions of validity, we use one of those wrapper types, or create one if necessary. We still use Address
if there are no applicable definitions of validity.
If I use the pagemap
crate to parse the special file /proc/self/maps
of an MMTk instance, it will return many MapsEntry
structs. We can get the range of the entry by calling entry.memory_region()
which returns a MemoryRegion which contains a start_address
and last_address
, both represented as u64
. (I think it may allow a 32-bit program (-m32
) to inspect the pages of a 64-bit process.)
In this scenario, I think it is very "valid" to convert the start_address
and the last_address
to MMTk's Address
type. After that, I can compare the resulting Address
against the memory ranges of spaces (i.e. CommonSpace::start
which is an Address
) in the MMTK
instance and see whether the mmap range lies within a space of MMTk's. Currently Address::from_usize
is labelled as unsafe
, but there is really nothing unsafe in this scenario. Constructing Address
is not unsafe (it can't crash your program), and comparing Address
values is not unsafe.
Any address can be presented as usize
or u64
. We can declare fn malloc() -> usize
, but we can also declare fn malloc() -> Address
. Clearly the latter is more favored. In your example, you can declare fn get_memory_region() -> (Address, Address)
and only do the conversion in the function, rather than doing the conversion everywhere in the code base.
I agree some 'unsafe' is unnecessary for Address
. But we haven't seen a good example yet.
On the contrary, the following is an example that unsafe is useful. https://github.com/mmtk/mmtk-core/blob/8640ab89f43f33af99e61b9ec8ff2164ed2cfab7/src/util/alloc/bumpallocator.rs#L220
TL;DR: Currently,
Address
is unsafe to create from arbitrary values. But should it be? This paper says "Address creation from arbitrary integers is forbidden", but we should rethink about that.The
Address
type has many unsafe methods. Among them, there are methods forAddress
instances:zero
,max
,from_usize
load
,store
, ...as_ref
,as_mut_ref
Understandably, memory accesses are unsafe because of potential races and unaligned/illegal memory operations, and conversions to Rust reference types should be unsafe because only the programmer can guarantee such conversions don't violate Rust's ownership and borrowing rules.
However, it's worth discussing the safety of creating
Address
instances. The following methods createAddress
instances, and some of them are marked as unsafe.ObjectReference::to_raw_address(self)
Address::from_ptr(ptr)
Address::from_mut_ptr(ptr)
Address::from_ref(r)
add
,sub
,and
,or
, ...Address::zero()
Address::max()
Address::from_usize(raw)
According to the 2016 paper Rust as a Language for High Performance GC Implementation by @qinsoon et al.,
and
The paper did not explain why "Address creation from arbitrary integers is forbidden", and it apparently contradicts with "an address represents an arbitrary location in the memory space managed by the GC". In the current Rust MMTk,
Address::zero()
,Address::max()
andAddress::from_usize
are all marked as unsafe. Their doc comments say:It is unclear what "invalid address" is. Since creating
Address
from pointers andObjectReference
is considered safe, it seems to imply thatAddress
is supposed to point to somewhere "safe", such as inside an object (But by that time the raw address ofObjectReference
was not guaranteed to be inside an object, until https://github.com/mmtk/mmtk-core/pull/1195.) or inside a memory region obtained bymmap
ormalloc
.And since address arithmetics are considered safe, we can get an
Address
safely from anyObjectReference
, and calladdr.and(0).add(0x12345678)
to "safely" create an address from an arbitrary address 0x12345678. That bypasses the unsafe annotations on the creation methods.In fact, in the current Rust MMTk code base, we use
Address
to point to quite many things that are not inside objects. Things likeChunk
,Block
andLine
are wrappers ofAddress
. We can also derive sub-regions from their parents, such as iterating through allBlock
s from aChunk
, or allLines
in aBlock
, and they are both considered safe. We also have linear scanning algorithms that go through every byte, and that is considered safe, too.So I think it is pointless to mark
zero()
,max()
andfrom_usize()
as unsafe.Address
is just what it is: an address, an arbitrary address. There is no validity guarantee of anAddress
anyway. It can be zero, be word-aligned or not, be inside the heap or not, be addressable by a 64-bit Intel CPU or not. There should be no restriction on creatingAddress
. Only memory accesses and methods that create Rust references should be unsafe. And it should be unsafe to convertAddress
toObjectReference
, too, becauseObjectReference
does have the concept of validity (which can be checked by the valid-object (VO) bit).