mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io

Only expose essential fields of Mutator as repr(C) #862

Closed wks closed 1 year ago

wks commented 1 year ago

TL;DR: The struct Mutator is too complicated, but VM bindings only care about one or two fields. Only expose those fields as #[repr(C)] and conceal others.

Problem

The whole struct Mutator is exposed to the VM binding as #[repr(C)] because some of its fields need to be used by C code or JIT-compiled machine code to implement fast paths.

However, struct Mutator is too complicated. It has many Allocators organised as arrays, and many fields that are not consumable by C code (such as &dyn references).

Note: The Rust Reference discourages users from depending on the layout of &dyn. This section contains the following paragraph:

Note: Though you should not rely on this, all pointers to DSTs are currently twice the size of the size of usize and have the same alignment.

Bindings only need a few fields in the Mutator struct.

However, due to the complexity of the Mutator struct, bindings have to duplicate the structure definition in C in order to access fields deep inside Mutator. See:

And may still need to compute the offsets manually. (I wonder why the offsetof macro in the standard library doesn't work.) See:

And careful developers assert that the sizes of the structures are still the same as they were when the structs were copied to C. See:

Proposal

Instead of making the entire Mutator struct #[repr(C)], we only make #[repr(C)] structs for important fields. For example,

#[repr(C)]
struct BumpPointerFast {
    cursor: Address,
    limit: Address,
}

Then we can transform Allocator types so that they contain such structs.

pub struct BumpAllocator<VM: VMBinding> {
    pub tls: VMThread,
    pub fast: BumpPointerFast,
    // more fields...
}

pub struct ImmixAllocator<VM: VMBinding> {
    pub tls: VMThread,
    pub fast: BumpPointerFast,
    // more fields...
}

Then we can still get those fast fields in Rust: mutator.allocators.bump_pointer[2].fast
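A minimal, self-contained sketch of the idea follows. It is not mmtk-core code: `Address` is a stand-in newtype, the VM type parameter is dropped, and the `name` field and `new_mutator` helper are hypothetical. Only `BumpPointerFast` is `#[repr(C)]`; the enclosing structs keep the default Rust layout, and compile-time assertions pin down the one layout that C is allowed to rely on.

```rust
use std::mem::{align_of, size_of};

/// Stand-in for MMTk's `Address` (assumption: a transparent wrapper around `usize`).
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Address(pub usize);

/// The only struct whose layout is promised to C or JIT-compiled code.
#[repr(C)]
#[derive(Debug)]
pub struct BumpPointerFast {
    pub cursor: Address,
    pub limit: Address,
}

// Compile-time checks: two pointer-sized fields, pointer alignment.
const _: () = assert!(size_of::<BumpPointerFast>() == 2 * size_of::<usize>());
const _: () = assert!(align_of::<BumpPointerFast>() == align_of::<usize>());

/// Hypothetical allocator with default (unspecified) Rust layout.
pub struct BumpAllocator {
    pub tls: usize,
    pub fast: BumpPointerFast,
    /// Stands in for fields that C cannot consume (fat pointers etc.).
    pub name: &'static str,
}

pub struct Allocators {
    pub bump_pointer: [BumpAllocator; 2],
}

pub struct Mutator {
    pub allocators: Allocators,
}

/// Hypothetical constructor used only for this illustration.
pub fn new_mutator() -> Mutator {
    let make = |name| BumpAllocator {
        tls: 0,
        fast: BumpPointerFast { cursor: Address(0), limit: Address(0) },
        name,
    };
    Mutator {
        allocators: Allocators { bump_pointer: [make("default"), make("immortal")] },
    }
}
```

Rust code keeps using ordinary field access, for example `m.allocators.bump_pointer[1].fast.cursor`, while C only ever sees a `*mut BumpPointerFast`.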

To get pointers to those fast fields in C, VM bindings can provide wrapper functions. (Note that the fast fields are pub now.)

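#[no_mangle]
pub extern "C" fn mmtk_get_bump_allocator_fast(mutator: *mut Mutator, which: usize) -> *mut BumpPointerFast {
    // Safety: the caller must pass a valid pointer to a live Mutator.
    unsafe { &mut (*mutator).allocators.bump_pointer[which].fast as *mut BumpPointerFast }
}

#[no_mangle]
pub extern "C" fn mmtk_get_immix_allocator_fast(mutator: *mut Mutator, which: usize) -> *mut BumpPointerFast {
    // Safety: the caller must pass a valid pointer to a live Mutator.
    unsafe { &mut (*mutator).allocators.immix[which].fast as *mut BumpPointerFast }
}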

To compute the offset of the fast fields, the VM binding can either

  1. compute (char*)fast - (char*)mutator, or
  2. use the memoffset or the field_offset crate.
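Option 1 can be sketched in Rust roughly as follows. The types are simplified stand-ins (plain `usize` instead of `Address`, a flattened `Mutator`), and `bump_fast_offset` and `read_cursor_via_offset` are hypothetical helper names. Because `Mutator` keeps the default Rust layout, the offset should be computed at start-up by the binding rather than hard-coded, since it may differ between builds.

```rust
/// The only struct with a layout C may depend on.
#[repr(C)]
pub struct BumpPointerFast {
    pub cursor: usize,
    pub limit: usize,
}

/// Simplified stand-ins with default Rust layout.
pub struct BumpAllocator {
    pub tls: usize,
    pub fast: BumpPointerFast,
    pub label: &'static str,
}

pub struct Mutator {
    pub id: u32,
    pub bump_pointer: [BumpAllocator; 2],
}

/// Byte offset of `bump_pointer[which].fast` from the start of `mutator`,
/// i.e. the Rust equivalent of `(char*)fast - (char*)mutator`.
pub fn bump_fast_offset(mutator: &Mutator, which: usize) -> usize {
    let base = mutator as *const Mutator as usize;
    let fast = &mutator.bump_pointer[which].fast as *const BumpPointerFast as usize;
    fast - base
}

/// Reads the cursor the way generated code would: mutator pointer plus offset.
pub fn read_cursor_via_offset(mutator: &Mutator, offset: usize) -> usize {
    let base = mutator as *const Mutator as *const u8;
    // Safety: `offset` must come from `bump_fast_offset` for the same type,
    // so `base + offset` points at a `BumpPointerFast` inside `*mutator`.
    unsafe { (*(base.add(offset) as *const BumpPointerFast)).cursor }
}
```

The binding would compute the offset once per allocator kind and index, then bake it into its fast-path code as a constant displacement from the mutator pointer.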

Alternatives

Splitting the allocator

Split the allocator into two parts, one specifically for fast-path allocation, and the other for everything else. For ImmixAllocator, we define it as

struct ImmixAllocator {
    tls: ...,
    fast: *mut ImmixAllocatorFast,
    // other fields go here
}

#[repr(C)]
struct ImmixAllocatorFast {
    cursor: Address,
    limit: Address,
}
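One way this split could look in practice is sketched below. The issue does not say who owns the fast part, so this sketch assumes the Rust side heap-allocates it with `Box` and frees it on drop; `fast_ptr`, `set_buffer`, and `cursor` are hypothetical helper names, and `usize` stands in for `Address`.

```rust
/// Fast-path state with a C-compatible layout.
#[repr(C)]
pub struct ImmixAllocatorFast {
    pub cursor: usize,
    pub limit: usize,
}

/// Everything else; default Rust layout, never seen by C.
pub struct ImmixAllocator {
    pub tls: usize,
    /// Assumption: owned by this allocator and allocated via `Box`.
    fast: *mut ImmixAllocatorFast,
}

impl ImmixAllocator {
    pub fn new(tls: usize) -> Self {
        let fast = Box::into_raw(Box::new(ImmixAllocatorFast { cursor: 0, limit: 0 }));
        ImmixAllocator { tls, fast }
    }

    /// Pointer the VM binding can hand to C or JIT-compiled code.
    pub fn fast_ptr(&self) -> *mut ImmixAllocatorFast {
        self.fast
    }

    /// What a slow path would do after acquiring a new buffer.
    pub fn set_buffer(&mut self, start: usize, end: usize) {
        // Safety: `fast` is valid for the whole lifetime of `self`.
        unsafe {
            (*self.fast).cursor = start;
            (*self.fast).limit = end;
        }
    }

    pub fn cursor(&self) -> usize {
        // Safety: `fast` is valid for the whole lifetime of `self`.
        unsafe { (*self.fast).cursor }
    }
}

impl Drop for ImmixAllocator {
    fn drop(&mut self) {
        // Safety: `fast` came from `Box::into_raw` in `new` and is freed once.
        unsafe { drop(Box::from_raw(self.fast)) };
    }
}
```

The cost of this design is an extra pointer indirection whenever Rust code touches the cursor or limit, which the embedded-struct proposal avoids.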

Let the VM binding maintain its own fast-path data structure

MMTk core doesn't need to mark any fields of Mutator as #[repr(C)]. But the VM binding lets each thread store the cursor and the limit anywhere in its thread-local storage.

thread_local uintptr_t cursor, limit;

void* vm_alloc(size_t size) {
    if (cursor + size >= limit) {
        return vm_alloc_slow_path(size, cursor, limit, ...);
    }
    uintptr_t result = cursor;
    cursor += size;
    return (void*)result;
}

But right before entering the slow path, the Rust side of the binding synchronizes the cursor and the limit into the Allocator struct.

fn vm_alloc_slow_path(size: usize, cursor: Address, limit: Address, ...) -> ObjectReference {
    MUTATOR.allocators.immix[0].cursor = cursor; // Sync C fields with Rust fields
    MUTATOR.allocators.immix[0].limit = limit;
    // Actually call into mmtk-core; no trailing semicolon, so the result is returned
    mmtk::memory_manager::alloc(size, ...)
}
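The whole round trip can be sketched in Rust alone, with thread-local cells playing the role of the C `thread_local` variables. `CoreAllocator`, `CHUNK`, and `alloc_slow` are hypothetical stand-ins for state and logic inside mmtk-core, addresses are plain numbers, and no real memory is touched. The copy back from the core allocator into the thread-local fields after the slow path is an assumption of this sketch; the description above only mentions the sync in the other direction.

```rust
use std::cell::Cell;

thread_local! {
    // Fast-path state owned by the VM binding, not by mmtk-core.
    static CURSOR: Cell<usize> = Cell::new(0);
    static LIMIT: Cell<usize> = Cell::new(0);
}

/// Hypothetical chunk size handed out by the slow path.
const CHUNK: usize = 4096;

/// Hypothetical stand-in for the allocator state kept in mmtk-core.
pub struct CoreAllocator {
    pub cursor: usize,
    pub limit: usize,
    next_chunk: usize,
}

impl CoreAllocator {
    pub fn new(heap_start: usize) -> Self {
        CoreAllocator { cursor: 0, limit: 0, next_chunk: heap_start }
    }

    /// Slow path: take a fresh chunk and bump-allocate from its start.
    fn alloc_slow(&mut self, size: usize) -> usize {
        assert!(size <= CHUNK);
        self.cursor = self.next_chunk;
        self.limit = self.cursor + CHUNK;
        self.next_chunk += CHUNK;
        let result = self.cursor;
        self.cursor += size;
        result
    }
}

/// Fast path: touches only the thread-local cursor and limit.
pub fn vm_alloc(core: &mut CoreAllocator, size: usize) -> usize {
    let cursor = CURSOR.with(|c| c.get());
    let limit = LIMIT.with(|l| l.get());
    if cursor + size >= limit {
        return vm_alloc_slow_path(core, size, cursor, limit);
    }
    CURSOR.with(|c| c.set(cursor + size));
    cursor
}

fn vm_alloc_slow_path(core: &mut CoreAllocator, size: usize, cursor: usize, limit: usize) -> usize {
    // Sync the binding's fields into the core allocator before calling it.
    core.cursor = cursor;
    core.limit = limit;
    let result = core.alloc_slow(size);
    // Assumption: copy the refreshed buffer back for the next fast path.
    CURSOR.with(|c| c.set(core.cursor));
    LIMIT.with(|l| l.set(core.limit));
    result
}
```

Between slow paths the core allocator's copy of the cursor is stale, so anything in the core that reads it (for example at GC time) would need the binding to sync first.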

caizixian commented 1 year ago

This will probably be partially addressed by #889, in that you will know that the few fields you care about are at certain offsets.

k-sareen commented 1 year ago

Closed with #889