mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io
Other
364 stars 67 forks source link

Atomic<Address> #121

Open qinsoon opened 4 years ago

qinsoon commented 4 years ago

The following code currently uses an AtomicUsize to store an address, and it needs to be accessed atomically. We should consider using Atomic<Address> from the crate (https://crates.io/crates/atomic).

https://github.com/mmtk/mmtk-core/blob/8539de7ed2bb8e89ada0e582ecdd1b6be672105b/src/policy/lockfreeimmortalspace.rs#L23

wks commented 1 year ago

There is one side effect, however. After the PR https://github.com/mmtk/mmtk-core/pull/843 is merged, the Rust compiler compiles the atomic increment into a compare-exchange loop.

I tested with the following code snippet: (update: I added #[no_mangle], otherwise rustc will specialise the function bodies for the call sites in my main function)

#[repr(transparent)]
#[derive(Clone, Copy, Debug)]
struct Address(usize);

#[no_mangle]
#[inline(never)]
fn load_atomic_address(addr: &Atomic<Address>) -> Address {
    addr.load(Ordering::SeqCst)
}

#[no_mangle]
#[inline(never)]
fn store_atomic_address(addr: &Atomic<Address>, new_val: Address) {
    addr.store(new_val, Ordering::SeqCst)
}

#[no_mangle]
#[inline(never)]
fn fetch_update_atomic_address(addr: &Atomic<Address>, increment: usize) -> Address {
    addr.fetch_update(Ordering::SeqCst, Ordering::Relaxed, |old| {
        let Address(old_addr) = old;
        Some(Address(old_addr + increment))
    })
    .unwrap()
}

#[no_mangle]
#[inline(never)]
fn fetch_update_atomic_usize(addr: &Atomic<usize>, increment: usize) -> usize {
    addr.fetch_update(Ordering::SeqCst, Ordering::Relaxed, |old| {
        Some(old + increment)
    })
    .unwrap()
}

#[no_mangle]
#[inline(never)]
fn fetch_add_atomic_usize(addr: &Atomic<usize>, increment: usize) -> usize {
    addr.fetch_add(increment, Ordering::SeqCst)
}

The generated code as seen from objdump: (rustc seems to have merged fetch_update_atomic_address and fetch_update_atomic_usize because they compile to the same instruction sequence)

00000000000085b0 <load_atomic_address>:
    85b0:       48 8b 07                mov    (%rdi),%rax
    85b3:       c3                      ret
    85b4:       66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
    85bb:       00 00 00 
    85be:       66 90                   xchg   %ax,%ax

00000000000085c0 <store_atomic_address>:
    85c0:       48 87 37                xchg   %rsi,(%rdi)
    85c3:       c3                      ret
    85c4:       66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
    85cb:       00 00 00 
    85ce:       66 90                   xchg   %ax,%ax

00000000000085d0 <fetch_update_atomic_address>:
    85d0:       48 8b 07                mov    (%rdi),%rax
    85d3:       66 66 66 66 2e 0f 1f    data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
    85da:       84 00 00 00 00 00 
    85e0:       48 8d 0c 30             lea    (%rax,%rsi,1),%rcx
    85e4:       f0 48 0f b1 0f          lock cmpxchg %rcx,(%rdi)
    85e9:       75 f5                   jne    85e0 <fetch_update_atomic_address+0x10>
    85eb:       c3                      ret
    85ec:       0f 1f 40 00             nopl   0x0(%rax)

00000000000085f0 <fetch_add_atomic_usize>:
    85f0:       48 89 f0                mov    %rsi,%rax
    85f3:       f0 48 0f c1 07          lock xadd %rax,(%rdi)
    85f8:       c3                      ret
    85f9:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

From the disassembly, we see that the compiler is smart enough to perform atomic operations on wrapper types (Address(usize)) with #[repr(transparent)], but is not smart enough to turn a fetch_update into a fetch_add.

I think this is unlikely to become a performance bottleneck because this happens in the slow path. But if we want to perform atomic operations on Address elsewhere, we should not be obliged to pay the extra overhead. There should be a way to use machine-level atomic instructions (such as lock add) on the underlying usize directly.

One solution is providing an AtomicAddress type which wraps an AtomicUsize instead of Usize.

The other solution is sticking to AtomicUsize if we want atomic operations.

It'll be better if we can provide a specialised implementation for Atomic<Address>, but Rust doesn't allow me to impl Atomic<Address> outside the atomic crate (i.e. "orphan rule").

k-sareen commented 1 year ago

AtomicAddress sounds like a good idea. Should be easy enough to add I think.

wks commented 1 year ago

AtomicAddress sounds like a good idea. Should be easy enough to add I think.

Yes. It should be easy because we only need to implement atomic operations (load, store, swap, cmpxchg, fetch_update, fetch_add, fetch_sub, fetch_and, fetch_or, fetch_xor, fetch_min, fetch_max, ...) but not address semantics such as (to_object_reference). We should be able to delegate most of them to the underlying AtomicUsize.