rust-lang / rustc_codegen_gcc

libgccjit AOT codegen for rustc
Apache License 2.0
905 stars 60 forks source link

AVR support #406

Open YakoYakoYokuYoku opened 7 months ago

YakoYakoYokuYoku commented 7 months ago

This is mainly motivated due to size issues when using LLVM when AVR is targeted.

I have partial support for it in my forks (gcc, gccjit.rs, rustc_codegen_gcc, but there are a few features needed and some issues to be addressed.

Features needed:

Issues to be fixed:

YakoYakoYokuYoku commented 7 months ago

When I build the sysroot in release mode, dev is not affected, I get the following crash from libgccjit...

$ LIBRARY_PATH=$(< gcc_path) LD_LIBRARY_PATH=$(< gcc_path) ./y.sh build --target $PWD/avr-atmega328p-none.json --release-sysroot          
[BUILD] build system
    Finished release [optimized] target(s) in 0.00s
    Finished dev [optimized + debuginfo] target(s) in 0.02s
[BUILD] sysroot
    Updating crates.io index
warning: Patch `rustc-std-workspace-alloc v1.99.0 (/home/yakoyoku/Codes/Rust/Clones/rustc_codegen_gcc/build_sysroot/sysroot_src/library/rustc-std-workspace-alloc)` was not used in the crate graph.
Check that the patched package version and available features are compatible
with the dependency requirements. If the patch has a different version from
what is locked in the Cargo.lock file, run `cargo update` to use the new
version. This may also occur with an optional dependency that is not enabled.
   Compiling core v0.0.0 (/home/yakoyoku/Codes/Rust/Clones/rustc_codegen_gcc/build_sysroot/sysroot_src/library/core)
during RTL pass: combine
/home/yakoyoku/Codes/C/Clones/gcc-install/usr/lib/libgccjit.so: error: in add_clobbers, at config/avr/avr-dimode.md:2705
0x7f321f52319d add_clobbers(rtx_def*, int)
        ../../gcc/gcc/config/avr/avr-dimode.md:2705
0x7f32201c650d recog_for_combine_1
        ../../gcc/gcc/combine.cc:11422
0x7f32201c818e recog_for_combine
        ../../gcc/gcc/combine.cc:11622
0x7f32201dc6b2 try_combine
        ../../gcc/gcc/combine.cc:3543
0x7f32201df7c1 combine_instructions
        ../../gcc/gcc/combine.cc:1266
0x7f32201df7c1 rest_of_handle_combine
        ../../gcc/gcc/combine.cc:14978
0x7f32201df7c1 execute
        ../../gcc/gcc/combine.cc:15023
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
error: could not compile `core` (lib)
Command `["cargo", "build", "--release", "--target", "/home/yakoyoku/Codes/Rust/Clones/rustc_codegen_gcc/avr-atmega328p-none.json"]` failed
Command failed to run: Command `cargo build --release --target /home/yakoyoku/Codes/Rust/Clones/rustc_codegen_gcc/avr-atmega328p-none.json` (running in folder `build_sysroot`) exited with status Some(101)

The error happens at core::fmt::num::imp::fmt_u32, apart from that I have no idea on what's happening.

antoyo commented 7 months ago

Thanks for reporting this. I haven't compiled libgccjit to AVR yet, but I'll do so in the following weeks.

Do I need to use your for to reproduce this error?

YakoYakoYokuYoku commented 7 months ago

Do I need to use your for to reproduce this error?

I think so, the error appeared in an RTL combine pass and it mentions an AVR source. Also you might need the avr-atmega328p-none target definition.

{
  "arch": "avr",
  "atomic-cas": false,
  "cpu": "atmega328p",
  "data-layout": "e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8",
  "eh-frame-header": false,
  "exe-suffix": ".elf",
  "executables": true,
  "late-link-args": {
    "gcc": [
      "-lc",
      "-lgcc"
    ]
  },
  "linker": "avr-gcc",
  "linker-flavor": "gcc",
  "linker-is-gnu": true,
  "llvm-target": "avr-unknown-unknown",
  "max-atomic-width": 0,
  "no-default-libraries": false,
  "pre-link-args": {
    "gcc": [
      "-Os",
      "-mmcu=atmega328p",
      "-Wl,--as-needed"
    ]
  },
  "relocation-model": "static",
  "target-c-int-width": "16",
  "target-endian": "little",
  "target-pointer-width": "16"
}

And then compile with:

$ LIBRARY_PATH=$(< gcc_path) LD_LIBRARY_PATH=$(< gcc_path) ./y.sh build --target-triple avr --target $PWD/avr-atmega328p-none.json --release-sysroot

If we cannot find anything then we can create a GCC Bugzilla ticket.

antoyo commented 7 months ago

Thanks.

In the mean time, if you could find a small reproducer (small #[no_std] Rust code) for the bug, that could help.

YakoYakoYokuYoku commented 7 months ago

Here's a reproducer with an adapted core::fmt::num::imp::fmt_u32...

#![no_std]
#![no_main]

#![feature(maybe_uninit_slice)]

use core::{
    fmt,
    hint,
    mem::MaybeUninit,
    panic,
    ptr,
    slice,
    str,
};

static DEC_DIGITS_LUT: &[u8; 200] = b"0001020304050607080910111213141516171819\
      2021222324252627282930313233343536373839\
      4041424344454647484950515253545556575859\
      6061626364656667686970717273747576777879\
      8081828384858687888990919293949596979899";

pub fn fmt_u32(mut n: u32, is_nonnegative: bool, f: &mut fmt::Formatter<'_>) -> fmt::Result {
    // 2^128 is about 3*10^38, so 39 gives an extra byte of space
    let mut buf = [MaybeUninit::<u8>::uninit(); 39];
    let mut curr = buf.len();
    let buf_ptr = MaybeUninit::slice_as_mut_ptr(&mut buf);
    let lut_ptr = DEC_DIGITS_LUT.as_ptr();

    // SAFETY: Since `d1` and `d2` are always less than or equal to `198`, we
    // can copy from `lut_ptr[d1..d1 + 1]` and `lut_ptr[d2..d2 + 1]`. To show
    // that it's OK to copy into `buf_ptr`, notice that at the beginning
    // `curr == buf.len() == 39 > log(n)` since `n < 2^128 < 10^39`, and at
    // each step this is kept the same as `n` is divided. Since `n` is always
    // non-negative, this means that `curr > 0` so `buf_ptr[curr..curr + 1]`
    // is safe to access.
    unsafe {
        // need at least 16 bits for the 4-characters-at-a-time to work.
        assert!(core::mem::size_of::<u32>() >= 2);

        // eagerly decode 4 characters at a time
        while n >= 10000 {
            let rem = (n % 10000) as usize;
            n /= 10000;

            let d1 = (rem / 100) << 1;
            let d2 = (rem % 100) << 1;
            curr -= 4;

            // We are allowed to copy to `buf_ptr[curr..curr + 3]` here since
            // otherwise `curr < 0`. But then `n` was originally at least `10000^10`
            // which is `10^40 > 2^128 > n`.
            ptr::copy_nonoverlapping(lut_ptr.add(d1), buf_ptr.add(curr), 2);
            ptr::copy_nonoverlapping(lut_ptr.add(d2), buf_ptr.add(curr + 2), 2);
        }

        // if we reach here numbers are <= 9999, so at most 4 chars long
        let mut n = n as usize; // possibly reduce 64bit math

        // decode 2 more chars, if > 2 chars
        if n >= 100 {
            let d1 = (n % 100) << 1;
            n /= 100;
            curr -= 2;
            ptr::copy_nonoverlapping(lut_ptr.add(d1), buf_ptr.add(curr), 2);
        }

        // decode last 1 or 2 chars
        if n < 10 {
            curr -= 1;
            *buf_ptr.add(curr) = (n as u8) + b'0';
        } else {
            let d1 = n << 1;
            curr -= 2;
            ptr::copy_nonoverlapping(lut_ptr.add(d1), buf_ptr.add(curr), 2);
        }
    }

    // SAFETY: `curr` > 0 (since we made `buf` large enough), and all the chars are valid
    // UTF-8 since `DEC_DIGITS_LUT` is
    let buf_slice = unsafe {
        str::from_utf8_unchecked(
            slice::from_raw_parts(buf_ptr.add(curr), buf.len() - curr))
    };
    f.pad_integral(is_nonnegative, "", buf_slice)
}

struct Qux(pub u32);

impl fmt::Debug for Qux {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt_u32(self.0, true, f)
    }
}

#[export_name = "main"]
fn main() {
    panic!("{:?}", Qux(0));
}

#[panic_handler]
pub fn panic(_info: &panic::PanicInfo) -> ! {
    unsafe { hint::unreachable_unchecked() }
}
antoyo commented 7 months ago

Thanks! I'll look at that in the coming weeks.

YakoYakoYokuYoku commented 7 months ago

I've found the source of the errors, it happens at those ptr::copy_nonoverlapping calls, perhaps the copy_nonoverlapping is the culprit here.

YakoYakoYokuYoku commented 7 months ago

Ok, from what I've found the bug appears when copy_nonoverlapping is used like in core::fmt::num::imp::fmt_u32 while overflow-checks is disabled. If we look at the rustc sources we see that copy_nonoverlapping gets transformed into memcpy which in our case is implemented with Builder::memcpy.

https://github.com/rust-lang/rustc_codegen_gcc/blob/70586a23a7fbe5b78752438587db86678f53ee2f/src/builder.rs#L1079-L1088

We can see two things here, one is that the destination pointer gets casted into *i8 but I don't believe that this is a problem, though, the other is that aligns are not handled. Considering how copy_nonoverlapping is implemented in core I theorize two things.

https://github.com/rust-lang/rust/blob/6b6f2a5a28ddd9031f6ac2104c7d2853e65c480e/library/core/src/intrinsics.rs#L2696-L2716

Either this ICE is produced by the lack of align handling, otherwise done at compile and run time whe overflow-checks is enabled per the snippet, which might emit code that can confuse GCC, or that precondition assert adds something else in the mix. Whatever it is we have to deal with it. For now I'll ask to you @antoyo if you could take a look to implement align handling, because I don't know how yet, so then I could at least try to see if it'll fix the ICE, or at least, give me a tip for it.

antoyo commented 7 months ago

You could try casting the pointers like it is done for load().

Otherwise, I might be able to take a look in a couple of weeks.

antoyo commented 7 months ago

Also, maybe I could be able to help sooner if you post the lines around 2705 in the file gcc/gcc/config/avr/avr-dimode.md. The file I have only contains 692 lines: maybe this is the line number in the generated file, which could be in your build directory.

YakoYakoYokuYoku commented 7 months ago

The file in my clone has that length too, I haven't modified it, what happens is that it gets processed by an utility, gcc/genemit.cc, but it seems that the resulting file has the same debug name as gcc/config/avr/avr-dimode.md, the file in reality is called gcc/insn-emit.cc, which lives at the build directory. The add_clobbers function takes an instruction number and along a pattern it generates a clobber based on a switch case if the number was known, otherwise it goes to gcc_unreachable, only that and nothing else. Not so relevant to post here.

When a function that uses copy_nonoverlapping like fmt_u32 is codegened with overflow-checks disabled it passes to add_clobbers an instruction number of 403, which is unknown, therefore ending in an ICE. This means that this is produced from a malformation in the emitted code. I've studied the source further and it was that only the loop copy_nonoverlapping calls were producing the issue. By commenting some parts of the code I've noticed that the combination of using the result of a modulo plus a type cast (rem) with the nonoverlapping copy all inside that loop yielded the error.

YakoYakoYokuYoku commented 7 months ago

I've noticed that the type gets downcasted from u32 down to u16 (usize for AVR). But I think that I know what might be causing this issue. If I change fmt_u32 to be as follows.

pub fn fmt_u32(mut n: u32, is_nonnegative: bool, f: &mut fmt::Formatter<'_>) -> fmt::Result {
    // 2^128 is about 3*10^38, so 39 gives an extra byte of space
    let mut buf = [MaybeUninit::<u8>::uninit(); 39];
    let mut curr = buf.len();
    let buf_ptr = MaybeUninit::slice_as_mut_ptr(&mut buf);
    let lut_ptr = DEC_DIGITS_LUT.as_ptr();

    unsafe {
        while n >= 100 {
            let d1 = (n % 7) as usize;
            n /= 7;
            ptr::copy_nonoverlapping(lut_ptr.add(d1), buf_ptr.add(curr), 2);
        }
    }

    let buf_slice = unsafe {
        str::from_utf8_unchecked(
            slice::from_raw_parts(buf_ptr.add(curr), buf.len() - curr))
    };
    f.pad_integral(is_nonnegative, "", buf_slice)
}

The issue pops up with the loop.

while n >= 100 {
    let d1 = (n % 7) as usize;
    n /= 7;
    ptr::copy_nonoverlapping(lut_ptr.add(d1), buf_ptr.add(curr), 2);
}

Observe that the number that is used to divide n is the same in both versions. This produces an optimization in the compiler to do a div that returns both quotient and remainder. Because the remainder, d1 in this case, was casted into a type that is smaller in size it could've resulted into a corruption during passes in the output code. Overflow checks prevented this so that's why a --release-sysroot with overflow-checks enabled succeded.

In conclusion, this is likely to be a bug with GCC so we might have to workaround it.

YakoYakoYokuYoku commented 6 months ago

Found a fix to the issue, it was an upstream bug with how udivmod instructions are processed in AVR, to fix it here you'll have to update your GCC fork or at least cherry-pick 80348e6.

antoyo commented 6 months ago

I'm in the process of rebasing my GCC fork. Hopefully, this should finish soon.

antoyo commented 6 months ago

The rebase is done. Can you please test to see if the issue is fixed?

YakoYakoYokuYoku commented 5 months ago

It worked, thanks for the rebase.