`core::ptr::copy_nonoverlapping` crashes when writing to odd addresses on ARM `thumbv7em-none-eabihf`

birktj commented 3 years ago

Updated report

After some further investigation it seems that the core::ptr::copy_nonoverlapping function crashes when the dest address is not even on the thumbv7em-none-eabihf platform. In the MWE below I am copying two bytes and experience the crash when the address copied to is odd.

MWE:

#[inline(never)]
fn mwe() {
    let buf1 = "424242";
    let mut buf2 = [0; 6];
    let buf1_ptr = buf1.as_ptr();
    let buf2_ptr = buf2.as_mut_ptr();

    // Crashes if n is odd, not if n is even
    let n = 1;

    unsafe {
        core::ptr::copy_nonoverlapping(buf1_ptr, buf2_ptr.offset(n), 2);
    }

    // Force usage of `buf2`
    write!(DummyWrite, "Test: {:?}", &buf2).unwrap();
}

Original report

I am using the cortex-m-semihosting crate to write debug messages over SWD, however when writing multi-digit numbers the execution hangs. Looking at the stacktrace from gdb it seems that execution is stuck on core::ptr::copy_nonoverlapping. Experimenting with trivial uses of core::ptr::copy_nonoverlapping does not seem to hang.

hprintln!("Test: {}", 1); // works fine
hprintln!("Test: {}", 11); // hangs

Stacktrace from gdb

#2  <signal handler called>
#3  0x0c004c96 in core::intrinsics::copy_nonoverlapping<u8> ()
    at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b//library/core/src/intrinsics.rs:1866
#4  core::fmt::num::imp::fmt_u32 ()
    at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b//library/core/src/fmt/num.rs:263
#5  core::fmt::num::imp::{{impl}}::fmt ()
    at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b//library/core/src/fmt/num.rs:287
#6  0x0c0041c4 in core::fmt::run ()
    at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b//library/core/src/fmt/mod.rs:1121
#7  core::fmt::write ()
    at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b//library/core/src/fmt/mod.rs:1089
#8  0x0c0026aa in core::fmt::Write::write_fmt<cortex_m_semihosting::hio::HStderr> (
    self=0x20000010 <cortex_m_semihosting::export::HSTDERR+4>, args=...)
    at /home/birk/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:182
#9  0x0c0025e4 in cortex_m_semihosting::export::hstderr_fmt::{{closure}} ()
    at /home/birk/.cargo/registry/src/github.com-1ecc6299db9ec823/cortex-m-semihosting-0.3.7/src/export.rs:49
#10 0x0c003068 in cortex_m::interrupt::free<closure-0,core::result::Result<(), ()>> (f=...)
    at /home/birk/.cargo/registry/src/github.com-1ecc6299db9ec823/cortex-m-0.7.2/src/interrupt.rs:64
#11 0x0c00251a in cortex_m_semihosting::export::hstderr_fmt (args=...)
    at /home/birk/.cargo/registry/src/github.com-1ecc6299db9ec823/cortex-m-semihosting-0.3.7/src/export.rs:44
#12 0x0c000474 in ex6::__cortex_m_rt_main () at /home/birk/docs/cam/p232/ex6/src/main.rs:41
#13 0x0c00031a in ex6::__cortex_m_rt_main_trampoline ()
    at /home/birk/docs/cam/p232/ex6/src/main.rs:20

nagisa commented 3 years ago

What's the signal/interrupt that has been called? Are you sure you didn't hit a triple fault or an interrupt handler that just infinitely loops or something along those lines?

birktj commented 3 years ago

I assume it is from me entering ^C in the gdb prompt, but this might of course be incorrect. The relevant part of the GDB prompt:

^C
Program received signal SIGINT, Interrupt.
cortex_m_rt::HardFault_ (ef=0x2000fd30)
    at /home/birk/.cargo/registry/src/github.com-1ecc6299db9ec823/cortex-m-rt-0.6.13/src/lib.rs:563
563         atomic::compiler_fence(Ordering::SeqCst);
(gdb) where
#0  cortex_m_rt::HardFault_ (ef=0x2000fd30)
    at /home/birk/.cargo/registry/src/github.com-1ecc6299db9ec823/cortex-m-rt-0.6.13/src/lib.rs:563
#1  <signal handler called>
#2  0x0c004c96 in core::intrinsics::copy_nonoverlapping<u8> ()
    at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b//library/core/src/intrinsics.rs:1866

nagisa commented 3 years ago

Yeah, I don't think it is. You'll want to investigate why you got a hardfault interrupt, it'll tell you something along the lines of an invalid instruction, access to invalid memory or something of that sort.

birktj commented 3 years ago

The culprit seems to be this instruction:

ldrh.w  r3, [r9, r3, lsl #1]

Where r3 = 0xb and r9 = 0xc006506 and

(gdb) x/32 0xc006506
0xc006506 <.Lanon.1b6fed0caccf440b7d0323a46875ea57.225>:        0x31303030      0x33303230      0x35303430      0x37303630
0xc006516 <.Lanon.1b6fed0caccf440b7d0323a46875ea57.225+16>:     0x39303830      0x31313031      0x33313231      0x35313431
0xc006526 <.Lanon.1b6fed0caccf440b7d0323a46875ea57.225+32>:     0x37313631      0x39313831      0x31323032      0x33323232
0xc006536 <.Lanon.1b6fed0caccf440b7d0323a46875ea57.225+48>:     0x35323432      0x37323632      0x39323832      0x31333033
0xc006546 <.Lanon.1b6fed0caccf440b7d0323a46875ea57.225+64>:     0x33333233      0x35333433      0x37333633      0x39333833
0xc006556 <.Lanon.1b6fed0caccf440b7d0323a46875ea57.225+80>:     0x31343034      0x33343234      0x35343434      0x37343634
0xc006566 <.Lanon.1b6fed0caccf440b7d0323a46875ea57.225+96>:     0x39343834      0x31353035      0x33353235      0x35353435
0xc006576 <.Lanon.1b6fed0caccf440b7d0323a46875ea57.225+112>:    0x37353635      0x39353835      0x31363036      0x33363236

Why this would fail I don't know, the memory addresses are all valid
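For reference, the address this load touches can be worked out from the register values above. The following is a host-side sketch, not code from core: the function names and the `DUMP_PREFIX` constant are hypothetical, and reading the dump as a table of two-digit ASCII pairs is an assumption based on the first six words shown by gdb.

```rust
/// First 24 bytes of the dump at 0xc006506, decoded from the little-endian
/// words shown by gdb (0x31303030 is the bytes "0001", and so on).
const DUMP_PREFIX: &[u8] = b"000102030405060708091011";

/// Effective address of `ldrh.w rt, [rn, rm, lsl #shift]`: rn + (rm << shift).
fn ldrh_effective_address(rn: u32, rm: u32, shift: u32) -> u32 {
    rn.wrapping_add(rm << shift)
}

/// The two bytes a halfword load would read for a given table index.
fn lut_entry(table: &[u8], index: usize) -> [u8; 2] {
    [table[2 * index], table[2 * index + 1]]
}
```

With r9 = 0xc006506 and r3 = 0xb, `ldrh_effective_address(0x0c00_6506, 0xb, 1)` gives 0xc00651c, which is halfword-aligned, and `lut_entry(DUMP_PREFIX, 0xb)` is the ASCII pair "11", matching the `hprintln!("Test: {}", 11)` case. So the load address itself looks fine under these assumptions.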

birktj commented 3 years ago

After some experimentation I have determined the following minimal reproducible example:

#[inline(never)]
fn fmt_u32() {
    static DEC_DIGITS_LUT: &[u8; 6] = b"424242";
    let mut buf = [0; 5];
    let buf_ptr = buf.as_ptr() as *mut u8;
    let lut_ptr = DEC_DIGITS_LUT.as_ptr();

    // Crashes if n is odd
    let n = 1;

    unsafe {
        core::ptr::copy_nonoverlapping(lut_ptr, buf_ptr.offset(n), 2);
    }

    let buf_slice = unsafe {
        core::str::from_utf8_unchecked(
            core::slice::from_raw_parts(buf_ptr.offset(n), 2))
    };
    // Force usage of `buf_slice`
    write!(DummyWrite, "Test: {}", buf_slice).unwrap();
}

It seems that core::ptr::copy_nonoverlapping is compiled to an unaligned write when n is odd, which causes the crash. When n is even it works fine.
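To illustrate why the parity of n matters, here is a small host-side sketch. It assumes the buffer starts at an even address, which the hypothetical `Aligned2` wrapper forces. A plain `[u8; N]` only has alignment 1, so in the MWE the base address is not guaranteed to be even. All names here are made up for illustration.

```rust
/// A byte buffer whose start is guaranteed to be 2-byte aligned (hypothetical helper).
#[repr(align(2))]
struct Aligned2([u8; 6]);

/// Returns true if `ptr` is suitable for a halfword (2-byte) access.
fn is_halfword_aligned(ptr: *const u8) -> bool {
    (ptr as usize) % 2 == 0
}

/// Reports whether a 2-byte access at `offset` bytes into an aligned buffer is aligned.
fn dest_is_aligned(offset: usize) -> bool {
    let buf = Aligned2([0; 6]);
    // `wrapping_add` is safe here: the pointer is only inspected, never dereferenced.
    is_halfword_aligned(buf.0.as_ptr().wrapping_add(offset))
}
```

Under that assumption, `dest_is_aligned(1)` is false and `dest_is_aligned(2)` is true, which lines up with the odd/even behaviour observed above.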

nagisa commented 3 years ago

Does it work if you specify the -Ctarget-feature=+strict-align rustc flag?

birktj commented 3 years ago

That seems to fix my MWE, however I still hit the bug in core::fmt::num::imp::fmt_u32 on this instruction (from core::ptr::copy_nonoverlapping):

strh    r1, [r2, r0]

where r0 = 0x25 and r2 = 0x2000feac which also looks like an unaligned write.
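The arithmetic behind that observation can be sketched as follows. This is a host-side illustration with hypothetical helper names, using only the register values quoted above.

```rust
/// Effective address of `strh rt, [rn, rm]`: rn + rm.
fn strh_effective_address(rn: u32, rm: u32) -> u32 {
    rn.wrapping_add(rm)
}

/// Byte misalignment of `addr` for an access of `size` bytes.
/// `size` must be a power of two.
fn misalignment(addr: u32, size: u32) -> u32 {
    addr & (size - 1)
}
```

For r2 = 0x2000feac and r0 = 0x25 the store address is 0x2000fed1, and `misalignment(0x2000_fed1, 2)` is 1, so the halfword store lands on an odd address.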

birktj commented 3 years ago

I can now confirm that by combining -C target-feature=+strict-align with -Z build-std=core the issue is fixed.
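As a rough picture of what alignment-agnostic code looks like, the two-byte copy can be done with two single-byte stores instead of one halfword store. The sketch below is only an illustration of that idea with a hypothetical function name. The instructions LLVM actually emits under +strict-align are not shown in this thread.

```rust
/// Stores a two-byte pair into `dst` at `offset` using two single-byte writes,
/// so no halfword-aligned destination is needed.
fn store_pair_bytewise(dst: &mut [u8], offset: usize, pair: [u8; 2]) {
    dst[offset] = pair[0];
    dst[offset + 1] = pair[1];
}
```

For example, calling `store_pair_bytewise(&mut buf, 1, *b"42")` on a zeroed six-byte buffer writes to the odd offset 1 without any halfword access. Rebuilding core with -Z build-std=core matters here presumably because the precompiled core was built without the flag.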

nagisa commented 3 years ago

cc @japaric @jonas-schievink should this target enable strict-align by default? AFAIU use of -strict-align on other ARM targets relies on OS-side handling of unaligned reads, which doesn't exist here?

japaric commented 3 years ago

I have never run into unaligned memory exception with Cortex-M chips. (*)

The ARMv7-M Architecture Reference Manual says:

The following data accesses support unaligned addressing, and only generate alignment faults when the CCR.UNALIGN_TRP bit is set to 1

Non halfword-aligned LDR{S}H{T} and STRH{T}

Non halfword-aligned TBH

Non word-aligned LDR{T} and STR{T}

Note: Accesses to Strongly Ordered and Device memory types must always be naturally aligned

CCR looks like a register that cannot be accessed / set by software but it's set in hardware by the vendor.

It could be:

LLVM changed recently: it was doing +strict-align by default for the thumbv7*m-none-eabi* targets but no longer is
This particular chip vendor decided to configure the CCR register to turn all unaligned memory accesses into exceptions. What chip is this by the way?

(*) I have seen this very same code (core::fmt) trigger unaligned memory access exceptions on other targets though, including a Cortex-A compilation target I wrote myself (.json file) -- forgot to add +strict-align and Cortex-A defaults to exception on unaligned access.

AFAIU use of -strict-align on other ARM targets relies on OS-side handling of unaligned reads, which doesn't exist here?

On modern ARM chips, exception on unaligned access is usually configurable. Cortex-A cores have a register, accessible from software, that can change the behavior but defaults to exceptions ON.

I think old ARM chips are not configurable: unaligned access always results in exceptions. OSes like Linux handle unaligned access in interrupt context to support those chips. Those OSes also do not configure modern ARM chips to disable exceptions on unaligned access, from what I've seen.

birktj commented 3 years ago

What chip is this by the way?

It is an xmc4500 (reference manual). From page 128 of the reference manual it seems that it is possible to set the UNALIGN_TRP bit, which is described as follows:

Unaligned Access Trap Enable
Enables unaligned access traps:
0b: do not trap unaligned halfword and word accesses
1b: trap unaligned halfword and word accesses
If this bit is set to 1, an unaligned access generates a UsageFault. Unaligned LDM, STM, LDRD, and STRD instructions always fault irrespective of whether UNALIGN_TRP is set to 1.

I am going to try and see if I am able to experiment with that bit later today.

birktj commented 3 years ago

Yep, I am able to set the CCR.UNALIGN_TRP bit to zero which also seems to fix the problem.
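The bit manipulation involved can be modelled on the host like this. It is a sketch only: the bit positions (UNALIGN_TRP at bit 3, STKALIGN at bit 9) are taken from the ARMv7-M Architecture Reference Manual rather than from this thread, the names are hypothetical, and on the device the value would have to be read from and written back to the memory-mapped CCR in the SCB, which this code does not do.

```rust
/// Bit masks in the ARMv7-M Configuration and Control Register (CCR).
const CCR_UNALIGN_TRP: u32 = 1 << 3;
const CCR_STKALIGN: u32 = 1 << 9;

/// Returns `ccr` with UNALIGN_TRP cleared and every other bit unchanged.
fn clear_unalign_trp(ccr: u32) -> u32 {
    ccr & !CCR_UNALIGN_TRP
}

/// Returns true if unaligned halfword/word accesses would trap for this CCR value.
fn traps_unaligned(ccr: u32) -> bool {
    ccr & CCR_UNALIGN_TRP != 0
}
```

Clearing only the one bit, rather than writing zero to the whole register, leaves other settings such as STKALIGN as they were.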

japaric commented 3 years ago

CCR looks like a register that cannot be accessed / set by software but it's set in hardware by the vendor.

I was wrong about this. The CCR register is part of the SCB peripheral and it's available from firmware and also exposed in the cortex-m crate. ARM Architecture Reference Manual says the reset value of this register is "implementation defined" in the latest revision of the document but "0x0" in older revisions. 0x0 means unaligned memory access does not trigger an exception. In the xmc4500 the reset value of the register is 0x200: only the STKALIGN bit is set; that bit makes unaligned access trigger an exception.

I see two ways to deal with CCR being "implementation defined": either (a) we set +strict-align on all Cortex-M targets, or (b) we clear the CCR register on reset in the cortex-m-rt crate. I think (a) makes the most sense because it preserves the vendor's settings and not everyone may be using the cortex-m-rt crate. Adding +strict-align will likely affect the code size or performance of existing applications but I don't know what the effects may be (larger program? slower program? faster program?).
