Open YakoYakoYokuYoku opened 11 months ago
When I build the sysroot in release mode, dev is not affected, I get the following crash from libgccjit...
$ LIBRARY_PATH=$(< gcc_path) LD_LIBRARY_PATH=$(< gcc_path) ./y.sh build --target $PWD/avr-atmega328p-none.json --release-sysroot
[BUILD] build system
Finished release [optimized] target(s) in 0.00s
Finished dev [optimized + debuginfo] target(s) in 0.02s
[BUILD] sysroot
Updating crates.io index
warning: Patch `rustc-std-workspace-alloc v1.99.0 (/home/yakoyoku/Codes/Rust/Clones/rustc_codegen_gcc/build_sysroot/sysroot_src/library/rustc-std-workspace-alloc)` was not used in the crate graph.
Check that the patched package version and available features are compatible
with the dependency requirements. If the patch has a different version from
what is locked in the Cargo.lock file, run `cargo update` to use the new
version. This may also occur with an optional dependency that is not enabled.
Compiling core v0.0.0 (/home/yakoyoku/Codes/Rust/Clones/rustc_codegen_gcc/build_sysroot/sysroot_src/library/core)
during RTL pass: combine
/home/yakoyoku/Codes/C/Clones/gcc-install/usr/lib/libgccjit.so: error: in add_clobbers, at config/avr/avr-dimode.md:2705
0x7f321f52319d add_clobbers(rtx_def*, int)
../../gcc/gcc/config/avr/avr-dimode.md:2705
0x7f32201c650d recog_for_combine_1
../../gcc/gcc/combine.cc:11422
0x7f32201c818e recog_for_combine
../../gcc/gcc/combine.cc:11622
0x7f32201dc6b2 try_combine
../../gcc/gcc/combine.cc:3543
0x7f32201df7c1 combine_instructions
../../gcc/gcc/combine.cc:1266
0x7f32201df7c1 rest_of_handle_combine
../../gcc/gcc/combine.cc:14978
0x7f32201df7c1 execute
../../gcc/gcc/combine.cc:15023
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
error: could not compile `core` (lib)
Command `["cargo", "build", "--release", "--target", "/home/yakoyoku/Codes/Rust/Clones/rustc_codegen_gcc/avr-atmega328p-none.json"]` failed
Command failed to run: Command `cargo build --release --target /home/yakoyoku/Codes/Rust/Clones/rustc_codegen_gcc/avr-atmega328p-none.json` (running in folder `build_sysroot`) exited with status Some(101)
The error happens at core::fmt::num::imp::fmt_u32
, apart from that I have no idea on what's happening.
Thanks for reporting this. I haven't compiled libgccjit to AVR yet, but I'll do so in the following weeks.
Do I need to use your for to reproduce this error?
Do I need to use your for to reproduce this error?
I think so, the error appeared in an RTL combine pass and it mentions an AVR source. Also you might need the avr-atmega328p-none
target definition.
{
"arch": "avr",
"atomic-cas": false,
"cpu": "atmega328p",
"data-layout": "e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8",
"eh-frame-header": false,
"exe-suffix": ".elf",
"executables": true,
"late-link-args": {
"gcc": [
"-lc",
"-lgcc"
]
},
"linker": "avr-gcc",
"linker-flavor": "gcc",
"linker-is-gnu": true,
"llvm-target": "avr-unknown-unknown",
"max-atomic-width": 0,
"no-default-libraries": false,
"pre-link-args": {
"gcc": [
"-Os",
"-mmcu=atmega328p",
"-Wl,--as-needed"
]
},
"relocation-model": "static",
"target-c-int-width": "16",
"target-endian": "little",
"target-pointer-width": "16"
}
And then compile with:
$ LIBRARY_PATH=$(< gcc_path) LD_LIBRARY_PATH=$(< gcc_path) ./y.sh build --target-triple avr --target $PWD/avr-atmega328p-none.json --release-sysroot
If we cannot find anything then we can create a GCC Bugzilla ticket.
Thanks.
In the mean time, if you could find a small reproducer (small #[no_std]
Rust code) for the bug, that could help.
Here's a reproducer with an adapted core::fmt::num::imp::fmt_u32
...
#![no_std]
#![no_main]
#![feature(maybe_uninit_slice)]
use core::{
fmt,
hint,
mem::MaybeUninit,
panic,
ptr,
slice,
str,
};
static DEC_DIGITS_LUT: &[u8; 200] = b"0001020304050607080910111213141516171819\
2021222324252627282930313233343536373839\
4041424344454647484950515253545556575859\
6061626364656667686970717273747576777879\
8081828384858687888990919293949596979899";
pub fn fmt_u32(mut n: u32, is_nonnegative: bool, f: &mut fmt::Formatter<'_>) -> fmt::Result {
// 2^128 is about 3*10^38, so 39 gives an extra byte of space
let mut buf = [MaybeUninit::<u8>::uninit(); 39];
let mut curr = buf.len();
let buf_ptr = MaybeUninit::slice_as_mut_ptr(&mut buf);
let lut_ptr = DEC_DIGITS_LUT.as_ptr();
// SAFETY: Since `d1` and `d2` are always less than or equal to `198`, we
// can copy from `lut_ptr[d1..d1 + 1]` and `lut_ptr[d2..d2 + 1]`. To show
// that it's OK to copy into `buf_ptr`, notice that at the beginning
// `curr == buf.len() == 39 > log(n)` since `n < 2^128 < 10^39`, and at
// each step this is kept the same as `n` is divided. Since `n` is always
// non-negative, this means that `curr > 0` so `buf_ptr[curr..curr + 1]`
// is safe to access.
unsafe {
// need at least 16 bits for the 4-characters-at-a-time to work.
assert!(core::mem::size_of::<u32>() >= 2);
// eagerly decode 4 characters at a time
while n >= 10000 {
let rem = (n % 10000) as usize;
n /= 10000;
let d1 = (rem / 100) << 1;
let d2 = (rem % 100) << 1;
curr -= 4;
// We are allowed to copy to `buf_ptr[curr..curr + 3]` here since
// otherwise `curr < 0`. But then `n` was originally at least `10000^10`
// which is `10^40 > 2^128 > n`.
ptr::copy_nonoverlapping(lut_ptr.add(d1), buf_ptr.add(curr), 2);
ptr::copy_nonoverlapping(lut_ptr.add(d2), buf_ptr.add(curr + 2), 2);
}
// if we reach here numbers are <= 9999, so at most 4 chars long
let mut n = n as usize; // possibly reduce 64bit math
// decode 2 more chars, if > 2 chars
if n >= 100 {
let d1 = (n % 100) << 1;
n /= 100;
curr -= 2;
ptr::copy_nonoverlapping(lut_ptr.add(d1), buf_ptr.add(curr), 2);
}
// decode last 1 or 2 chars
if n < 10 {
curr -= 1;
*buf_ptr.add(curr) = (n as u8) + b'0';
} else {
let d1 = n << 1;
curr -= 2;
ptr::copy_nonoverlapping(lut_ptr.add(d1), buf_ptr.add(curr), 2);
}
}
// SAFETY: `curr` > 0 (since we made `buf` large enough), and all the chars are valid
// UTF-8 since `DEC_DIGITS_LUT` is
let buf_slice = unsafe {
str::from_utf8_unchecked(
slice::from_raw_parts(buf_ptr.add(curr), buf.len() - curr))
};
f.pad_integral(is_nonnegative, "", buf_slice)
}
struct Qux(pub u32);
impl fmt::Debug for Qux {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
fmt_u32(self.0, true, f)
}
}
#[export_name = "main"]
fn main() {
panic!("{:?}", Qux(0));
}
#[panic_handler]
pub fn panic(_info: &panic::PanicInfo) -> ! {
unsafe { hint::unreachable_unchecked() }
}
Thanks! I'll look at that in the coming weeks.
I've found the source of the errors, it happens at those ptr::copy_nonoverlapping
calls, perhaps the copy_nonoverlapping
is the culprit here.
Ok, from what I've found the bug appears when copy_nonoverlapping
is used like in core::fmt::num::imp::fmt_u32
while overflow-checks
is disabled. If we look at the rustc
sources we see that copy_nonoverlapping
gets transformed into memcpy
which in our case is implemented with Builder::memcpy
.
We can see two things here, one is that the destination pointer gets casted into *i8
but I don't believe that this is a problem, though, the other is that aligns are not handled. Considering how copy_nonoverlapping
is implemented in core
I theorize two things.
Either this ICE is produced by the lack of align handling, otherwise done at compile and run time whe overflow-checks
is enabled per the snippet, which might emit code that can confuse GCC, or that precondition assert adds something else in the mix. Whatever it is we have to deal with it. For now I'll ask to you @antoyo if you could take a look to implement align handling, because I don't know how yet, so then I could at least try to see if it'll fix the ICE, or at least, give me a tip for it.
You could try casting the pointers like it is done for load()
.
Otherwise, I might be able to take a look in a couple of weeks.
Also, maybe I could be able to help sooner if you post the lines around 2705 in the file gcc/gcc/config/avr/avr-dimode.md
. The file I have only contains 692 lines: maybe this is the line number in the generated file, which could be in your build directory.
The file in my clone has that length too, I haven't modified it, what happens is that it gets processed by an utility, gcc/genemit.cc
, but it seems that the resulting file has the same debug name as gcc/config/avr/avr-dimode.md
, the file in reality is called gcc/insn-emit.cc
, which lives at the build directory. The add_clobbers
function takes an instruction number and along a pattern it generates a clobber based on a switch case if the number was known, otherwise it goes to gcc_unreachable
, only that and nothing else. Not so relevant to post here.
When a function that uses copy_nonoverlapping
like fmt_u32
is codegened with overflow-checks
disabled it passes to add_clobbers
an instruction number of 403
, which is unknown, therefore ending in an ICE. This means that this is produced from a malformation in the emitted code. I've studied the source further and it was that only the loop copy_nonoverlapping
calls were producing the issue. By commenting some parts of the code I've noticed that the combination of using the result of a modulo plus a type cast (rem
) with the nonoverlapping copy all inside that loop yielded the error.
I've noticed that the type gets downcasted from u32
down to u16
(usize
for AVR). But I think that I know what might be causing this issue. If I change fmt_u32
to be as follows.
pub fn fmt_u32(mut n: u32, is_nonnegative: bool, f: &mut fmt::Formatter<'_>) -> fmt::Result {
// 2^128 is about 3*10^38, so 39 gives an extra byte of space
let mut buf = [MaybeUninit::<u8>::uninit(); 39];
let mut curr = buf.len();
let buf_ptr = MaybeUninit::slice_as_mut_ptr(&mut buf);
let lut_ptr = DEC_DIGITS_LUT.as_ptr();
unsafe {
while n >= 100 {
let d1 = (n % 7) as usize;
n /= 7;
ptr::copy_nonoverlapping(lut_ptr.add(d1), buf_ptr.add(curr), 2);
}
}
let buf_slice = unsafe {
str::from_utf8_unchecked(
slice::from_raw_parts(buf_ptr.add(curr), buf.len() - curr))
};
f.pad_integral(is_nonnegative, "", buf_slice)
}
The issue pops up with the loop.
while n >= 100 {
let d1 = (n % 7) as usize;
n /= 7;
ptr::copy_nonoverlapping(lut_ptr.add(d1), buf_ptr.add(curr), 2);
}
Observe that the number that is used to divide n
is the same in both versions. This produces an optimization in the compiler to do a div
that returns both quotient and remainder. Because the remainder, d1
in this case, was casted into a type that is smaller in size it could've resulted into a corruption during passes in the output code. Overflow checks prevented this so that's why a --release-sysroot
with overflow-checks
enabled succeded.
In conclusion, this is likely to be a bug with GCC so we might have to workaround it.
Found a fix to the issue, it was an upstream bug with how udivmod
instructions are processed in AVR, to fix it here you'll have to update your GCC fork or at least cherry-pick 80348e6
.
I'm in the process of rebasing my GCC fork. Hopefully, this should finish soon.
The rebase is done. Can you please test to see if the issue is fixed?
It worked, thanks for the rebase.
Volatile loads and stores was implemented in #572 and https://github.com/rust-lang/gcc/commit/50d1270fd6409407f38b982e606df1dba4bf58ed, please tell me if that works for you.
This is mainly motivated due to size issues when using LLVM when AVR is targeted.
I have partial support for it in my forks (gcc, gccjit.rs, rustc_codegen_gcc, but there are a few features needed and some issues to be addressed.
Features needed:
-m
) in libgccjit.sysroot
should only depend oncore
andalloc
for AVR.Issues to be fixed:
atmega328p
, fails with an error message.$GCC_EXEC_PATH
needs to be set for it to pickdevice-specs
at the correct location (lib/gcc/avr/<gcc version>
).avr-objcopy -O ihex
are way bigger than C.