rust-lang / rustc_codegen_gcc

libgccjit AOT codegen for rustc
Apache License 2.0

arbitrary_self_types_pointers_and_wrappers fails on aarch64 due to stack corruption with libgcc atomics #544

Open liamnaddell opened 1 month ago

liamnaddell commented 1 month ago

On aarch64, during the initialization of std, a compare_exchange_weak is called as part of std::thread::ThreadId::new, specifically line sysroot_src/library/std/src/thread/mod.rs:1190.

If I'm not mistaken, this calls out to a libgcc-implemented intrinsic, __aarch64_cas8_relax.

These intrinsics are documented here: https://github.com/llvm/llvm-project/blob/main/llvm/docs/Atomics.rst#libcalls-atomic

What appears to be happening is that this intrinsic modifies $sp (presumably to return some argument to the caller?). However, the generated Rust code does not expect $sp to change, which results in a bogus return address. When the frame is popped, we branch to some random value on the stack. In GDB, this shows up as a branch to the "function" std::thread::ThreadId::new::COUNTER, whose "opcodes" are 0, which decodes to the udf instruction on arm, causing a segfault (or a bus error if we branch to 0x1, as happens in some other examples).

I've attached a reproducer that only depends on core to provide the intrinsic. The reproducer shows the following GDB log:

liam@gentoo ~/rustc_codegen_gcc $ gdb target/out/reproduce_core
GNU gdb (Gentoo 14.2 vanilla) 14.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from target/out/reproduce_core...
(gdb) b __aarch64_cas4_relax
Breakpoint 1 at 0xcf8
(gdb) run
Starting program: /home/liam/rustc_codegen_gcc/target/out/reproduce_core
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".

Breakpoint 1, 0x0000aaaaaaaa0cf8 in __aarch64_cas4_relax (param0=0, param1=1, param2=0xffffffffee64)
(gdb) disassemble
Dump of assembler code for function __aarch64_cas4_relax:
   0x0000aaaaaaaa0ce4 <+0>:     sub     sp, sp, #0x10
   0x0000aaaaaaaa0ce8 <+4>:     str     w0, [sp, #12]
   0x0000aaaaaaaa0cec <+8>:     str     w1, [sp, #8]
   0x0000aaaaaaaa0cf0 <+12>:    str     x2, [sp]
   0x0000aaaaaaaa0cf4 <+16>:    mov     w16, w0
=> 0x0000aaaaaaaa0cf8 <+20>:    ldxr    w0, [x2]
   0x0000aaaaaaaa0cfc <+24>:    cmp     w0, w16
   0x0000aaaaaaaa0d00 <+28>:    b.ne    0xaaaaaaaa0d0c <__aarch64_cas4_relax+40>  // b.any
   0x0000aaaaaaaa0d04 <+32>:    stxr    w17, w1, [x2]
   0x0000aaaaaaaa0d08 <+36>:    cbnz    w17, 0xaaaaaaaa0cf8 <__aarch64_cas4_relax+20>
   0x0000aaaaaaaa0d0c <+40>:    ret
End of assembler dump.
(gdb) bt
#0  0x0000aaaaaaaa0cf8 in __aarch64_cas4_relax (param0=0, param1=1, param2=0xffffffffee64)
#1  0x0000aaaaaaaa07cc in reproduce_core::perform_bad ()
#2  0x0000aaaaaaaa0888 in main ()
(gdb) si
0x0000aaaaaaaa0d08 in __aarch64_cas4_relax (param0=0, param1=1, param2=0xffffffffee64)
(gdb) bt
#0  0x0000aaaaaaaa0d08 in __aarch64_cas4_relax (param0=0, param1=1, param2=0xffffffffee64)
#1  0x0000aaaaaaaa07cc in reproduce_core::perform_bad ()
#2  0x0000aaaaaaaa0888 in main ()
(gdb) set disassemble-next-line on
(gdb) si
0x0000aaaaaaaa0d0c in __aarch64_cas4_relax (param0=0, param1=1, param2=0xffffffffee64)
=> 0x0000aaaaaaaa0d0c <__aarch64_cas4_relax+40>:        d65f03c0        ret
(gdb)
0x0000aaaaaaaa07cc in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07cc <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+56>:        6b14001f        cmp     w0, w20
(gdb) x/6xg $sp
0xffffffffee30: 0x0000ffffffffee64      0x0000000000000001
0xffffffffee40: 0x0000ffffffffee80      0x0000aaaaaaaa0888
0xffffffffee50: 0x0000fffffffff038      0x0000000000000001
(gdb) si
0x0000aaaaaaaa07d0 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07d0 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+60>:        2a0003e1        mov     w1, w0
(gdb)
0x0000aaaaaaaa07d4 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07d4 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+64>:        1a9f17e0        cset    w0, eq  // eq = none
(gdb)
0x0000aaaaaaaa07d8 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07d8 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+68>:        7100001f        cmp     w0, #0x0
(gdb)
0x0000aaaaaaaa07dc in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07dc <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+72>:        54000041        b.ne    0xaaaaaaaa07e4 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+80>  // b.any
(gdb)
0x0000aaaaaaaa07e4 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07e4 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+80>:        3900ffe0        strb    w0, [sp, #63]
(gdb)
0x0000aaaaaaaa07e8 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07e8 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+84>:        9100a3e0        add     x0, sp, #0x28
(gdb)
0x0000aaaaaaaa07ec in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07ec <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+88>:        b94033e1        ldr     w1, [sp, #48]
(gdb)
0x0000aaaaaaaa07f0 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07f0 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+92>:        b9000001        str     w1, [x0]
(gdb)
0x0000aaaaaaaa07f4 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07f4 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+96>:        9100a3e0        add     x0, sp, #0x28
(gdb)
0x0000aaaaaaaa07f8 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07f8 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+100>:       91001000        add     x0, x0, #0x4
(gdb)
0x0000aaaaaaaa07fc in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07fc <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+104>:       3940ffe1        ldrb    w1, [sp, #63]
(gdb)
0x0000aaaaaaaa0800 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa0800 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+108>:       39000001        strb    w1, [x0]
(gdb)
0x0000aaaaaaaa0804 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa0804 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+112>:       9100a3e0        add     x0, sp, #0x28
(gdb)
0x0000aaaaaaaa0808 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa0808 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+116>:       b9400000        ldr     w0, [x0]
(gdb)
0x0000aaaaaaaa080c in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa080c <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+120>:       b9003be0        str     w0, [sp, #56]
(gdb)
0x0000aaaaaaaa0810 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa0810 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+124>:       9100a3e0        add     x0, sp, #0x28
(gdb)
0x0000aaaaaaaa0814 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa0814 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+128>:       91001000        add     x0, x0, #0x4
(gdb)
0x0000aaaaaaaa0818 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa0818 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+132>:       39400000        ldrb    w0, [x0]
(gdb)
0x0000aaaaaaaa081c in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa081c <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+136>:       3900d3e0        strb    w0, [sp, #52]
(gdb)
0x0000aaaaaaaa0820 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa0820 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+140>:       d2800060        mov     x0, #0x3                        // #3
(gdb)
0x0000aaaaaaaa0824 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa0824 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+144>:       a94153f3        ldp     x19, x20, [sp, #16]
(gdb)
0x0000aaaaaaaa0828 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa0828 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+148>:       a8c47bfd        ldp     x29, x30, [sp], #64
(gdb) x/6xg $sp
0xffffffffee30: 0x0000ffffffffee64      0x0000000000000001
0xffffffffee40: 0x0000ffffffffee80      0x0000aaaaaaaa0888
0xffffffffee50: 0x0000fffffffff038      0x0000000100000000
(gdb) si
0x0000aaaaaaaa082c in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa082c <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+152>:       d65f03c0        ret
(gdb) p/x $x30
$2 = 0x1
(gdb) # uh oh
(gdb) si
0x0000000000000001 in ?? ()
=> 0x0000000000000001:
Cannot access memory at address 0x1
(gdb)
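
One way to read the log above: the disassembly of __aarch64_cas4_relax starts with `sub sp, sp, #0x10` and has no matching `add` before `ret`, so $sp would be 0x10 lower than perform_bad expects when it runs its epilogue. The sketch below is only a model of that arithmetic, not the actual codegen. The stack words are copied from the `x/6xg $sp` dump; the helper names (`stack_dump`, `ldp_post_index`) and the assumption that the correct $sp is the observed value plus 0x10 are mine.

```rust
use std::collections::BTreeMap;

/// Stack words copied from the `x/6xg $sp` output in the log
/// (address -> 64-bit value). Only the first four words are needed here.
fn stack_dump() -> BTreeMap<u64, u64> {
    BTreeMap::from([
        (0xffff_ffff_ee30, 0x0000_ffff_ffff_ee64),
        (0xffff_ffff_ee38, 0x0000_0000_0000_0001),
        (0xffff_ffff_ee40, 0x0000_ffff_ffff_ee80),
        (0xffff_ffff_ee48, 0x0000_aaaa_aaaa_0888),
    ])
}

/// Models `ldp x29, x30, [sp], #imm`: load two words starting at sp,
/// then add imm to sp. Returns (x29, x30, new_sp).
fn ldp_post_index(stack: &BTreeMap<u64, u64>, sp: u64, imm: u64) -> (u64, u64, u64) {
    (stack[&sp], stack[&(sp + 8)], sp + imm)
}

fn main() {
    let stack = stack_dump();

    // $sp as GDB reported it after returning from the intrinsic.
    let sp_observed: u64 = 0xffff_ffff_ee30;
    // Assumption: the value $sp would have if `sub sp, sp, #0x10` had been undone.
    let sp_assumed: u64 = sp_observed + 0x10;

    let (_, lr_observed, _) = ldp_post_index(&stack, sp_observed, 64);
    let (_, lr_assumed, _) = ldp_post_index(&stack, sp_assumed, 64);

    println!("x30 loaded with the observed sp: {lr_observed:#x}");
    println!("x30 loaded with sp + 0x10:       {lr_assumed:#x}");
}
```

With the observed $sp the model loads 0x1 into x30, which matches `p/x $x30` in the log. With $sp shifted by 0x10 it loads 0xaaaaaaaa0888, the return address into main shown in the backtrace.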

ENV INFO:

 $ cat config.toml
gcc-path = "/home/liam/rustc_gcc/gcc-install/lib"
#download-gccjit = true

liam@gentoo ~/rustc_gcc/gcc $ git remote get-url origin
https://github.com/antoyo/gcc
liam@gentoo ~/rustc_gcc/gcc $ git status
On branch master
Your branch is up to date with 'origin/master'.

Reproducer:

#![feature(
    core_intrinsics,unboxed_closures, start, lang_items, never_type, linkage,
    extern_types, thread_local
    )]
#![no_std]
#![allow(dead_code, internal_features, non_camel_case_types)]
#![feature(intrinsics)]
#![feature(rustc_attrs)]
#![no_main]

use core::panic::PanicInfo;

extern "rust-intrinsic" {
    #[rustc_nounwind]
    pub fn atomic_cxchgweak_relaxed_relaxed<T: Copy>(dst: *mut T, old: T, src: T) -> (T, bool);
}

fn perform_bad() -> usize {
    unsafe {
        let mut var = 0;
        let _result = atomic_cxchgweak_relaxed_relaxed(&mut var,0,1);
    }
    return 3;
}

#[panic_handler]
fn panic(_panic: &PanicInfo<'_>) -> ! {
    loop {}
}

#[no_mangle]
#[start]
fn main() -> usize {
    assert!(1 + 1 == 2);
    perform_bad();

    /*
   static CTR: AtomicU64 = AtomicU64::new(0);

   let mut last = CTR.load(Ordering::Relaxed);
   CTR.compare_exchange_weak(0, 1, Ordering::Relaxed, Ordering::Relaxed);
       */
    return 0;
}
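
For comparison, here is a minimal sketch of the same operation written against the stable `std::sync::atomic` API instead of the raw intrinsic. It is not part of the reproducer and needs std. The function name `perform_cas` is hypothetical, and whether the call is lowered to an outlined helper such as __aarch64_cas4_relax depends on the target and backend, so treat that as an assumption.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

/// Hypothetical stable-API counterpart of `perform_bad`:
/// a relaxed weak compare-exchange of 0 -> 1 on a 32-bit integer.
fn perform_cas() -> Result<i32, i32> {
    let var = AtomicI32::new(0);
    loop {
        match var.compare_exchange_weak(0, 1, Ordering::Relaxed, Ordering::Relaxed) {
            // Success: the previous value is returned.
            Ok(previous) => return Ok(previous),
            // A weak CAS may fail spuriously even though the value still matches; retry.
            Err(0) => continue,
            // Genuine failure: some other value was stored.
            Err(actual) => return Err(actual),
        }
    }
}

fn main() {
    println!("{:?}", perform_cas());
}
```

The retry loop is there because `compare_exchange_weak` is allowed to fail spuriously; the reproducer ignores the result, so it does not need one.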

antoyo commented 2 days ago

Sorry, I forgot about this.

Did you build the sysroot with --release-sysroot?

liamnaddell commented 1 day ago

I followed the exact steps on the README:

$ git clone https://github.com/antoyo/gcc
$ sudo apt install flex libmpfr-dev libgmp-dev libmpc3 libmpc-dev
$ mkdir gcc-build gcc-install
$ cd gcc-build
$ # --enable-checking=release enables extra checks which allow to find bugs
$ ../gcc/configure \
    --enable-host-shared \
    --enable-languages=jit \
    --enable-checking=release \
    --disable-bootstrap \
    --disable-multilib \
    --prefix=$(pwd)/../gcc-install
$ make -j4 # You can replace `4` with another number depending on how many cores you have.
$ cat config.toml
gcc-path="/home/liam/rustc_gcc/gcc-build/gcc"
$ ./y.sh prepare # download and patch sysroot src and install hyperfine for benchmarking
$ ./y.sh build --sysroot --release
$ ./y.sh test --release
...
[AOT] arbitrary_self_types_pointers_and_wrappers
Command failed to run: Command `target/out/arbitrary_self_types_pointers_and_wrappers` failed to run: "Process received signal 11"

liamnaddell commented 1 day ago

When I try with ./y.sh build --release-sysroot --release --sysroot, I get the same result.

antoyo commented 1 day ago

Ok, I was asking because there are known issues and compiling the sysroot in release mode is a workaround for at least some of them as you can see here.

This specific one looks very similar to the one I had in the above thread (it was an atomic intrinsic, and it was jumping to address 0). I can't find where those intrinsics are defined right now, but if the one you have a problem with is declared with #[naked], this could be the same issue.

This naked attribute isn't supported by GCC on Aarch64, but there's a PR in Rust that will change that so that it doesn't use the codegen naked attribute, so if this is indeed the issue we have here, that might solve the issue.

Perhaps it would be worth a try compiling your reproducer above in release mode (./y.sh run --release) to see if this changes anything.

I won't have time to look at this soon, but I can help you investigate after I recover from Covid-19.

liamnaddell commented 1 day ago

Take as much time as you need, I hope you feel better soon.

As for the reproducer, it's essentially a stripped-down version of arbitrary_self_types_pointers_and_wrappers, which is the first test that depends on loading std.

liamnaddell commented 1 day ago

I also don't think these intrinsics are declared with #[naked] but I could be wrong. I haven't looked at this in a while.

https://doc.rust-lang.org/1.80.1/src/core/intrinsics.rs.html

antoyo commented 1 day ago

I just tried on Asahi Linux on a Mac M1, and both the example you posted above and arbitrary_self_types_pointers_and_wrappers work well for me.

I do have the test arbitrary_self_types_pointers_and_wrappers failing if I compile the sysroot without --release-sysroot, though.

Which OS do you use? Is your CPU an M1? If not, which one is it?

bjorn3 commented 1 day ago

According to the GDB version string, OP uses Gentoo Linux. macOS has a slightly different ABI from the official calling convention specified by ARM (AAPCS) that is used by Linux.

liamnaddell commented 1 day ago

@bjorn3, Antoyo's setup seems very similar to mine. I run arm64 Linux, but my host OS is macOS, and I'm emulating Gentoo Linux on arm64 using QEMU. My CPU is an Apple M3 Pro. My host GCC is 13.2.1. I compiled GCC 14.0.1 from antoyo/gcc at sha fd3498bff0b939dda91d56960acc33d55f2f9cdf. I'd be very surprised if the difference between QEMU on an M3 Pro and Asahi on an M1 is a factor here.

liamnaddell commented 1 day ago

@antoyo, I misunderstood your original --release-sysroot comment.

./y.sh build --sysroot --release --release-sysroot
./y.sh test --release --release-sysroot

This combination results in arbitrary_self_types_pointers_and_wrappers passing.

I'd guess this is the same issue that the comment at https://github.com/rust-lang/rustc_codegen_gcc/issues/242#issuecomment-2077017285 points out.