Open birktj opened 3 years ago
What's the signal/interrupt that has been called? Are you sure you didn't hit a triple fault or an interrupt handler that just infinitely loops or something along those lines?
I assume it is from me entering ^C in the gdb prompt, but this might of course be incorrect. The relevant part of the GDB prompt:
^C
Program received signal SIGINT, Interrupt.
cortex_m_rt::HardFault_ (ef=0x2000fd30)
at /home/birk/.cargo/registry/src/github.com-1ecc6299db9ec823/cortex-m-rt-0.6.13/src/lib.rs:563
563 atomic::compiler_fence(Ordering::SeqCst);
(gdb) where
#0 cortex_m_rt::HardFault_ (ef=0x2000fd30)
at /home/birk/.cargo/registry/src/github.com-1ecc6299db9ec823/cortex-m-rt-0.6.13/src/lib.rs:563
#1 <signal handler called>
#2 0x0c004c96 in core::intrinsics::copy_nonoverlapping<u8> ()
at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b//library/core/src/intrinsics.rs:1866
Yeah, I don't think it is. You'll want to investigate why you got a hardfault interrupt; it'll tell you something along the lines of an invalid instruction, access to invalid memory, or something of that sort.
The culprit seems to be this instruction:
ldrh.w r3, [r9, r3, lsl #1]
where r3 = 0xb and r9 = 0xc006506, and
(gdb) x/32 0xc006506
0xc006506 <.Lanon.1b6fed0caccf440b7d0323a46875ea57.225>: 0x31303030 0x33303230 0x35303430 0x37303630
0xc006516 <.Lanon.1b6fed0caccf440b7d0323a46875ea57.225+16>: 0x39303830 0x31313031 0x33313231 0x35313431
0xc006526 <.Lanon.1b6fed0caccf440b7d0323a46875ea57.225+32>: 0x37313631 0x39313831 0x31323032 0x33323232
0xc006536 <.Lanon.1b6fed0caccf440b7d0323a46875ea57.225+48>: 0x35323432 0x37323632 0x39323832 0x31333033
0xc006546 <.Lanon.1b6fed0caccf440b7d0323a46875ea57.225+64>: 0x33333233 0x35333433 0x37333633 0x39333833
0xc006556 <.Lanon.1b6fed0caccf440b7d0323a46875ea57.225+80>: 0x31343034 0x33343234 0x35343434 0x37343634
0xc006566 <.Lanon.1b6fed0caccf440b7d0323a46875ea57.225+96>: 0x39343834 0x31353035 0x33353235 0x35353435
0xc006576 <.Lanon.1b6fed0caccf440b7d0323a46875ea57.225+112>: 0x37353635 0x39353835 0x31363036 0x33363236
Why this would fail I don't know; the memory addresses are all valid.
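As a quick sanity check on the numbers above, here is a minimal host-side sketch (not something run on the target) that decodes the first words of the gdb dump and computes the effective address of the ldrh.w. The helper names `words_to_bytes` and `ldrh_address` are made up for illustration. It suggests that the anonymous constant is a table of two-digit ASCII pairs and that this particular load is halfword-aligned.

```rust
/// Turn little-endian 32-bit words (as printed by gdb's `x` command)
/// back into the bytes that sit in memory.
fn words_to_bytes(words: &[u32]) -> Vec<u8> {
    let mut out = Vec::new();
    for w in words {
        out.extend_from_slice(&w.to_le_bytes());
    }
    out
}

/// Effective address of `ldrh.w rt, [rn, rm, lsl #1]`: rn + (rm << 1).
fn ldrh_address(rn: u32, rm: u32) -> u32 {
    rn.wrapping_add(rm << 1)
}

fn main() {
    // First six words of the dump at 0xc006506, copied from the gdb output.
    let words = [
        0x31303030u32, 0x33303230, 0x35303430,
        0x37303630, 0x39303830, 0x31313031,
    ];
    let text = String::from_utf8(words_to_bytes(&words)).unwrap();
    println!("table starts with: {}", text);

    let base = 0x0c00_6506u32; // r9
    let addr = ldrh_address(base, 0xb); // r3 = 0xb
    let offset = (addr - base) as usize;
    println!("load address: {:#x}, even: {}", addr, addr % 2 == 0);
    println!("halfword at that offset: {:?}", &text[offset..offset + 2]);
}
```

With r3 = 0xb the load reads the pair "11" from an even address, which fits the observation that the addresses involved in the load look valid.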
After some experimentation I have determined the following minimal reproducible example:
#[inline(never)]
fn fmt_u32() {
    static DEC_DIGITS_LUT: &[u8; 6] = b"424242";
    let mut buf = [0; 5];
    let buf_ptr = buf.as_ptr() as *mut u8;
    let lut_ptr = DEC_DIGITS_LUT.as_ptr();
    // Crashes if n is odd
    let n = 1;
    unsafe {
        core::ptr::copy_nonoverlapping(lut_ptr, buf_ptr.offset(n), 2);
    }
    let buf_slice = unsafe {
        core::str::from_utf8_unchecked(
            core::slice::from_raw_parts(buf_ptr.offset(n), 2))
    };
    // Force usage of `buf_slice`
    write!(DummyWrite, "Test: {}", buf_slice).unwrap();
}
It seems that core::ptr::copy_nonoverlapping ends up becoming an unaligned write (when n is even it works fine) which causes the crash.
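For comparison, here is a host-side sketch of the same two-byte copy. The helper name `copy_pair` is hypothetical, and the destination pointer is taken with `as_mut_ptr` rather than cast from `as_ptr`. At the language level the copy has no alignment requirement, because `u8` has alignment 1, so it is valid at every offset. Any fault therefore has to come from how the copy is lowered to a halfword store on the target.

```rust
use core::ptr;

/// Copy the first two bytes of `lut` into `buf` starting at byte offset `n`.
fn copy_pair(lut: &[u8; 6], buf: &mut [u8; 5], n: usize) {
    assert!(n + 2 <= buf.len());
    // SAFETY: source and destination ranges are in bounds, do not overlap,
    // and `u8` has alignment 1, so any offset is acceptable.
    unsafe {
        ptr::copy_nonoverlapping(lut.as_ptr(), buf.as_mut_ptr().add(n), 2);
    }
}

fn main() {
    static LUT: &[u8; 6] = b"424242";
    for n in 0..4 {
        let mut buf = [0u8; 5];
        copy_pair(LUT, &mut buf, n);
        // Whether the destination address is odd depends on where `buf` lands.
        let odd = (buf.as_ptr() as usize + n) % 2 == 1;
        println!("n = {}, odd destination: {}, buf = {:?}", n, odd, buf);
    }
}
```

On a host whose CPU tolerates unaligned accesses this runs for both even and odd `n`.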
Does it work if you specify the -Ctarget-feature=+strict-align rustc flag?
It seems that my MWE gets fixed; however, I still hit the bug in core::fmt_u32 on this instruction (from core::ptr::copy_nonoverlapping):
strh r1, [r2, r0]
where r0 = 0x25 and r2 = 0x2000feac, which also looks like an unaligned write.
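The arithmetic behind that observation can be checked with a small host-side sketch. The function names `strh_address` and `is_aligned` are illustrative only.

```rust
/// Effective address of `strh rt, [rn, rm]` (register offset, no shift).
fn strh_address(rn: u32, rm: u32) -> u32 {
    rn.wrapping_add(rm)
}

/// True if `addr` is a multiple of `align`, where `align` is a power of two.
fn is_aligned(addr: u32, align: u32) -> bool {
    addr & (align - 1) == 0
}

fn main() {
    // Register values reported above: r2 = 0x2000feac, r0 = 0x25.
    let addr = strh_address(0x2000_feac, 0x25);
    println!(
        "store address: {:#x}, halfword-aligned: {}",
        addr,
        is_aligned(addr, 2)
    );
}
```

The store lands on an odd address, so a 16-bit strh there is an unaligned access.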
I can now confirm that by combining -C target-feature=+strict-align with -Z build-std=core the issue is fixed.
cc @japaric @jonas-schievink should this target enable strict-align by default? AFAIU use of -strict-align on other ARM targets relies on OS-side handling of unaligned reads, which doesn't exist here?
I have never run into an unaligned memory access exception with Cortex-M chips. (*)
The ARMv7-M Architecture Reference Manual says:
The following data accesses support unaligned addressing, and only generate alignment faults when the CCR.UNALIGN_TRP bit is set to 1
- Non halfword-aligned LDR{S}H{T} and STRH{T}
- Non halfword-aligned TBH
- Non word-aligned LDR{T} and STR{T}
Note
- Accesses to Strongly Ordered and Device memory types must always be naturally aligned
CCR looks like a register that cannot be accessed / set by software but it's set in hardware by the vendor.
It could be:
thumbv7*m-none-eabi*
targets but no longer is
(*) I have seen this very same code (core::fmt) trigger unaligned memory access exceptions on other targets though, including a Cortex-A compilation target I wrote myself (.json file): I forgot to add +strict-align, and Cortex-A defaults to raising an exception on unaligned access.
AFAIU use of -strict-align on other ARM targets relies on OS-side handling of unaligned reads, which doesn't exist here?
On modern ARM chips, exception on unaligned access is usually configurable. Cortex-A cores have a register, accessible from software, that can change the behavior but defaults to exceptions ON.
I think old ARM chips are not configurable: unaligned access always results in exceptions. OSes like Linux handle unaligned access in interrupt context to support those chips. Those OSes also do not configure modern ARM chips to disable exceptions on unaligned access, from what I've seen.
What chip is this by the way?
It is an xmc4500 (reference manual). From page 128 of the reference manual it seems that it is possible to set the UNALIGN_TRP bit, which is described as follows:
Unaligned Access Trap Enable
Enables unaligned access traps:
0b: do not trap unaligned halfword and word accesses
1b: trap unaligned halfword and word accesses
If this bit is set to 1, an unaligned access generates a UsageFault. Unaligned LDM, STM, LDRD, and STRD instructions always fault irrespective of whether UNALIGN_TRP is set to 1.
I am going to try and see if I am able to experiment with that bit later today.
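Putting the two manual excerpts quoted above together, the rule for Normal memory can be sketched as a small host-side model. The `Access` enum and the function names are invented for illustration, and the model deliberately ignores Device and Strongly Ordered memory, which must always be naturally aligned.

```rust
/// Kinds of data access distinguished by the quoted manual text.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Access {
    Halfword, // LDRH / STRH and friends
    Word,     // LDR / STR
    Multiple, // LDM / STM / LDRD / STRD
}

/// Natural alignment, in bytes, that each access kind is checked against.
fn required_alignment(access: Access) -> u32 {
    match access {
        Access::Halfword => 2,
        Access::Word | Access::Multiple => 4,
    }
}

/// Whether an access to Normal memory raises an alignment fault.
fn alignment_fault(access: Access, addr: u32, unalign_trp: bool) -> bool {
    let unaligned = addr % required_alignment(access) != 0;
    match access {
        // Single halfword/word accesses only trap when UNALIGN_TRP is set.
        Access::Halfword | Access::Word => unaligned && unalign_trp,
        // Multi-register and doubleword accesses fault regardless of the bit.
        Access::Multiple => unaligned,
    }
}

fn main() {
    // Destination of the strh reported earlier: 0x2000feac + 0x25.
    let store_addr = 0x2000_fed1;
    for trp in [false, true] {
        println!(
            "UNALIGN_TRP = {}: strh to {:#x} faults: {}",
            trp,
            store_addr,
            alignment_fault(Access::Halfword, store_addr, trp)
        );
    }
}
```

Under this model the odd-address strh only faults while UNALIGN_TRP is 1, which is what makes clearing the bit worth trying.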
Yep, I am able to set the CCR.UNALIGN_TRP bit to zero which also seems to fix the problem.
CCR looks like a register that cannot be accessed / set by software but it's set in hardware by the vendor.
I was wrong about this. The CCR register is part of the SCB peripheral; it is accessible from firmware and is also exposed in the cortex-m crate. The ARM Architecture Reference Manual says the reset value of this register is "implementation defined" in the latest revision of the document but "0x0" in older revisions. 0x0 means unaligned memory access does not trigger an exception. In the xmc4500 the reset value of the register is 0x200: only the STKALIGN bit (bit 9) is set. That bit controls stack alignment on exception entry, though; the bit that makes unaligned access trigger an exception is UNALIGN_TRP (bit 3), which 0x200 leaves clear, so on this chip it is presumably being set by something that runs after reset.
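To make the CCR values easier to read, here is a host-side sketch that decodes them against the ARMv7-M bit layout, where UNALIGN_TRP is bit 3 and STKALIGN is bit 9. The constant and function names are illustrative. On real hardware the same mask would be applied with a volatile read-modify-write of the SCB CCR register, which is not shown here.

```rust
/// CCR.UNALIGN_TRP: trap unaligned halfword and word accesses.
const CCR_UNALIGN_TRP: u32 = 1 << 3;
/// CCR.STKALIGN: 8-byte stack alignment on exception entry.
const CCR_STKALIGN: u32 = 1 << 9;

/// Whether a given CCR value enables unaligned access traps.
fn unaligned_access_traps(ccr: u32) -> bool {
    ccr & CCR_UNALIGN_TRP != 0
}

/// The CCR value with UNALIGN_TRP cleared and all other bits preserved.
fn clear_unalign_trp(ccr: u32) -> u32 {
    ccr & !CCR_UNALIGN_TRP
}

fn main() {
    // 0x200 is the documented reset value; 0x208 is a hypothetical value
    // with UNALIGN_TRP additionally set.
    for ccr in [0x200u32, 0x208] {
        println!(
            "CCR = {:#x}: STKALIGN = {}, traps unaligned access = {}, after clear = {:#x}",
            ccr,
            ccr & CCR_STKALIGN != 0,
            unaligned_access_traps(ccr),
            clear_unalign_trp(ccr)
        );
    }
}
```

Clearing only the one bit keeps whatever else the vendor or boot code configured in CCR.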
I see two ways to deal with CCR being "implementation defined": either (a) we set +strict-align on all Cortex-M targets, or (b) we clear the CCR register on reset in the cortex-m-rt crate. I think (a) makes the most sense because it preserves the vendor's settings and not everyone may be using the cortex-m-rt crate. Adding +strict-align will likely affect the code size or performance of existing applications, but I don't know what those effects may be (larger program? slower program? faster program?).
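As a rough intuition for the trade-off in option (a), and not a description of what the compiler actually emits, a two-byte store to an address of unknown alignment can be done either as one halfword store or as two byte stores. The host-side sketch below shows both forms producing the same bytes. The function names are hypothetical.

```rust
/// Store `value` at `buf[offset..offset + 2]` in little-endian order using
/// two single-byte stores, which are valid at any address.
fn store_u16_bytewise(buf: &mut [u8], offset: usize, value: u16) {
    let [lo, hi] = value.to_le_bytes();
    buf[offset] = lo;
    buf[offset + 1] = hi;
}

/// Store the same two bytes with a single 16-bit write that makes no
/// alignment assumption at the language level.
fn store_u16_unaligned(buf: &mut [u8], offset: usize, value: u16) {
    assert!(offset + 2 <= buf.len());
    // SAFETY: the two-byte range is in bounds, and `write_unaligned` does
    // not require the pointer to be aligned.
    unsafe {
        let dst = buf.as_mut_ptr().add(offset).cast::<u16>();
        core::ptr::write_unaligned(dst, value.to_le());
    }
}

fn main() {
    let mut a = [0u8; 5];
    let mut b = [0u8; 5];
    // 0x3234 is the little-endian encoding of the ASCII pair "42".
    store_u16_bytewise(&mut a, 1, 0x3234);
    store_u16_unaligned(&mut b, 1, 0x3234);
    println!("bytewise: {:?}, single write: {:?}", a, b);
}
```

The byte-wise form costs an extra store but never depends on how the core is configured to treat unaligned accesses.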
Updated report
After some further investigation it seems that the core::ptr::copy_nonoverlapping function crashes when the dest address is not even on the thumbv7em-none-eabihf platform. In the example above I am copying two bytes and experience the crash when the address copied to is odd.
MWE:
Original report
I am using the cortex-m-semihosting crate to write debug messages over SWD, however when writing multi-digit numbers the execution hangs. Looking at the stacktrace from gdb it seems that execution is stuck on core::ptr::copy_nonoverlapping. Experimenting with trivial uses of core::ptr::copy_nonoverlapping does not seem to hang.
Meta
The same behaviour is also present in rust nightly.
Toolchain: thumbv7em-none-eabihf
rustc --version --verbose:
Stacktrace from gdb