Mis-optimization / undefined behavior with our `MaybeUninit` implementation

The following program is mis-opmitized by the compiler.

STR

$ cargo generate --git https://github.com/rust-embedded/cortex-m-quickstart --name app

$ cd app

$ cargo add cortex-m-rtfm@0.4.1

$ cargo add lm3s6965

$ cargo add panic-semihosting

$ cargo add lifo --git https://github.com/japaric/lifo

$ # enable debug assertions
$ tail -n5 Cargo.toml

[profile.release]
codegen-units = 1
debug = true
debug-assertions = true # <-
lto = true

$ # uncomment the QEMU runner
$ head -n2 .cargo/config

[target.thumbv7m-none-eabi]
runner = "qemu-system-arm -cpu cortex-m3 -machine lm3s6965evb -nographic -semihosting-config enable=on,target=native -kernel"

$ # write the code
$ cat src/main.rs

#![no_std]
#![no_main]

extern crate panic_semihosting;

use lifo::{pool, singleton::Box, Uninit};

pool!(A: [u8; 128]);

#[rtfm::app(device = lm3s6965)]
const APP: () = {
    #[init(spawn = [foo])]
    fn init() {
        static mut MEMORY: [u8; 1024] = [0; 1024];

        // this triggers the mis-optimization
        A::grow(MEMORY);

        spawn.foo().ok();
    }

    #[task(spawn = [bar])]
    fn foo() {
        if let Some(x) = A::alloc() {
            spawn.bar(x).ok();
        }
    }

    #[task]
    fn bar(x: Box<A, Uninit>) {
        drop(x);
    }

    extern "C" {
        fn UART0();
    }
};

$ # run the code
$ cargo run --release
panicked at 'internal error: entered unreachable code', /home/japaric/.cargo/registry/src/github.com-1ecc6299db9ec823/cortex-m-rtfm-0.4.1/src/export.rs:74:23

If debug_assertions are disabled I see undefined behavior: the program tries to write to address 0x0 and results in a hard fault.

AFAICT from debugging the program on real hardware one of the internal queues is never "Some tagged" by the software.

To elaborate, we use a "fake" implementation of core::mem::MaybeUninit that it's just a newtype over Option. our::MaybeUninit::uninitialized creates a Option::None, our::MaybeUninit::get_{ref,mut} are more or less sugar for Option.as_{ref,mut}().unwrap() -- the failure path of this unwrap results in unreachable!() if debug_assertions are enabled, and calls hint::unreachable_unchecked when debugassertions are disabled. Unlike the real MaybeUninit, our::MaybeUninit has a tag / discriminant which we must set to Some before using `get{ref,mut}. The framework sets the tag of all instances toSomebefore the userinitis called. Virtually all statics internally used by the framework useour::MaybeUninit; some of these statics are queues which cannot be const constructed on stable becauseconst fnwith trait bounds are unstable; other statics are just buffers (arrays) that must start uninitialized. What I'm observing here is that the some "set the tag toSome`" operations are being optimized away by the compiler but only in some cases.

I haven't been able to exactly trace the root of the problem (because the llvm-ir is full of debug info) but my (wild) guess is that &mut our::MaybeUninit being marked as dereferenceable is causing the "set the tag to Some" operation to be performed on a stack copy of the static mut variable rather than on the static mut variable itself. AFAICT, all the our::MaybeUninit that are not being properly "tagged" have no interior mutability (i.e. they don't contain a UnsafeCell).

Switching from our::MaybeUninit to core::mem::MaybeUninit makes the above program behave as expected. Interestingly, after the switch I see no dereferenceable attribute in the llvm-ir.

I tried changing our::MaybeUninit from being a newtype over Option to being a newtype over UnsafeCell<Option> and Option<UnsafeCell> to see if that fixed the problem but no luck -- with those changes I still saw the dereferenceable attribute.

Proposed plan of action:

Change our::MaybeUninit::get_{ref,mut} to panic if debug_assertions are disabled. A panic is better than UB.
Have the message of the above panic tell the user to switch on the "nightly" Cargo feature.
Add a "nightly" Cargo feature that switch our internal use of our::MaybeUninit to core::mem::MaybeUninit. I really didn't want to add a Cargo feature that requires nightly since just very recently it became possible to use RTFM on stable, but nightly is better than UB / panics on stable.

cc @korken89

cc rust-lang/rust#53491 @RalfJung another thing that could make use of the real MaybeUninit on stable. Also, I thought that (pointers to?) UnsafeCell were supposed to not be marked as 'dereferenceable' in llvm-ir?

I think that the plan of action is good. Interesting issue though. But the core::mem::MaybeUninit will hopefully soon stabilize and the workaround with a panic will suffice until then.

So just to make sure I understand correctly, you think the code should be correct despite not using MaybeUninit, because you are using Option correctly and making sure the hint::unreachable is not actually reachable?

Can you reproduce this on x86? That'd help me to try and reproduce it "at home" (and in Miri :D ).

Also, I thought that (pointers to?) UnsafeCell were supposed to not be marked as 'dereferenceable' in llvm-ir?

No, that was a proposal but currently, all references get dereferencable. Also I do not understand what dereferencable would have to do with this.

And anyway, UnsafeCell has no effect whatsoever below an &mut. It is only about &.

Switching from our::MaybeUninit to core::mem::MaybeUninit makes the above program behave as expected. Interestingly, after the switch I see no dereferenceable attribute in the llvm-ir.

Did you also switch from references to raw pointers then? &[mut] MaybeUninit gets dereferencable.

my (wild) guess is that &mut our::MaybeUninit being marked as dereferenceable is causing the "set the tag to Some" operation to be performed on a stack copy of the static mut variable rather than on the static mut variable itself. AFAICT, all the our::MaybeUninit that are not being properly "tagged" have no interior mutability (i.e. they don't contain a UnsafeCell).

static mut being, well, mutable doesn't cause UB. What would cause UB is creating a shared reference to it, then mutating it (potentially by another thread/function/whatever), and then using the shared reference. This has nothing to do with dereferencable and everything to do with noalias. What would also cause UB is creating a mutable reference to a static mut, then have anybody else through any other reference read or write the same memory, and then use the reference.

The dereferencable attribute just tells LLVM that it can insert extra loads from this pointer even if the pointer does not get used. It says nothing about not writing writes back.

I tried changing our::MaybeUninit from being a newtype over Option to being a newtype over UnsafeCell<Option> and Option<UnsafeCell> to see if that fixed the problem but no luck -- with those changes I still saw the dereferenceable attribute.

Did you also still see the UB? (It should be UnsafeCell<Option>, FWIW.)

What would help me is to see some of the code that you think is involved in the UB, like the code you think might get misoptimized. Also some example of how you are interacting with the static mut would help so that I can see if that looks like UB to me. In the code in the OP, I cannot even tell which is the static mut, let alone how it gets accessed.

Another example that runs into the debug assertion / UB.

#![feature(alloc)]
#![feature(alloc_error_handler)]
#![no_main]
#![no_std]

extern crate alloc;
extern crate panic_semihosting;

use alloc::{alloc::Layout, boxed::Box};

// alloc-cortex-m = "0.3.5"
use alloc_cortex_m::CortexMHeap;
use cortex_m::asm;

#[global_allocator]
static ALLOCATOR: CortexMHeap = CortexMHeap::empty();

#[alloc_error_handler]
fn oom(_: Layout) -> ! {
    panic!()
}

#[rtfm::app(device = lm3s6965)]
const APP: () = {
    #[init(spawn = [foo])]
    fn init() {
        static mut MEMORY: [u8; 1024] = [0; 1024];

        // initialize the allocator
        unsafe { ALLOCATOR.init(MEMORY.as_ptr() as usize, 1024) }

        spawn.foo().ok();
    }

    #[task(spawn = [bar])]
    fn foo() {
        spawn.bar(Box::new([0; 128])).ok();
    }

    #[task]
    fn bar(x: Box<[u8; 128]>) {
        drop(x);
    }

    extern "C" {
        fn UART0();
    }
};

It seems that destructors are required to run into this problem, which would explain why we haven't heard reports about this -- I think most people aren't using global_allocator / memory pools.

I was not able to reproduce the issue with a simple struct like struct HasDtor { inner: u32 } that implements the Drop trait though.

@RalfJung

So just to make sure I understand correctly, you think the code should be correct despite not using MaybeUninit, because you are using Option correctly and making sure the hint::unreachable is not actually reachable?

Yes. I have looked at the expanded code and it contains the code to initialize / "tag" all the Options. The code also runs fine when unoptimized. It's only when optimized when it runs into the debug assertion / UB.

Can you reproduce this on x86?

It would be hard to port the code to x86 as it uses interrupts.

Switching from our::MaybeUninit to core::mem::MaybeUninit makes the above program behave as expected Did you also switch from references to raw pointers then?

No. The only change I did was re-export core::mem::MaybeUninit as our::MaybeUninit. All the other code was not changed.

I tried changing our::MaybeUninit from being a newtype over Option to being a newtype over UnsafeCell<Option> and Option<UnsafeCell> Did you also still see the UB?

Yes, with that change the program still hits the debug assertion.

What would help me is to see some of the code that you think is involved in the UB, like the code you think might get misoptimized

This is the expanded version of the first example. (Sorry, the code is rather unreadable for $reasons.)

The reason why I think the "set the tag to Some" operation is being done on a stack copy rather than on the static mut is that the other explanation for the misoptimization would be the tag operation being optimized away but that can't be the case because I do the "tagging" using a volatile write.

It would be hard to port the code to x86 as it uses interrupts.

Wait, the issue only arises when interrupts are involved? (You didn't mention interrupts at all originally, so now I am a bit confused.)

I'm afraid I am far out of my league here, not only is there tons of code I have never seen, I also have no experience at all programming with interrupts and what the pitfalls are there. (Well I took an OS class once but that's ages ago, and "programming with interrupts" there mostly meant making sure to disable interrupts whenever there could be a problem.)

Where's the code for your MaybeUninit?

One more thing just came to my mind: are you using some static memory here to simulate a heap, meaning that calling the Rust allocator function will return pointers to static memory? That's... interesting. The allocator function is marked noalias, meaning LLVM will assume that it does not alias with anything that already exists, including all global statics. This means that, no matter how raw your pointers are, when memory has been handed out to the allocator it must under no circumstances be accessed via the "backing store" that it came from.

Maybe someone with more LLVM knowledge has a better intuition whether what we are looking at here is UB somewhere in your code, or a rustc/LLVM bug. Cc @rkruppe @arielb1

@RalfJung

Where's the code for your MaybeUninit?

https://github.com/japaric/cortex-m-rtfm/blob/v0.4.1/src/export.rs#L44

are you using some static memory here to simulate a heap, meaning that calling the Rust allocator function will return pointers to static memory?

Yes. The memory pool API looks like this:

let chunk_of_memory: &'static mut [u8; 1024] = /* .. */; // comes from a `static mut` var

let pool: Pool<[u8; 128]> = Pool::new(); // empty pool

pool.grow(chunk_of_memory); // now the pool owns the chunk of memory

let uninit_object: MyBox<[u8; 128], Uninit> = pool.alloc().unwrap(); // returns an owned pointer

let object = uninit_object.init([0; 128]); // initialize the array / object to all zeros

// do stuff with `object`

pool.free(object); // return the object to the pool

The allocator function is marked noalias

Hmm, do you mean GlobalAlloc.alloc? The memory pool API doesn't implement or use GlobalAlloc.

meaning LLVM will assume that it does not alias with anything that already exists, including all global statics

That does sound problematic. Isn't noalias also used for things like &mut arguments / values?

It's possible to initialize the global allocator using a real .heap section:

    #[init(spawn = [foo])]
    fn init() {
        // initialize the allocator
        // the .heap section is placed *after* (.bss+.data)
        let heap_start = cortex_m_rt::heap_start();
        unsafe { ALLOCATOR.init(heap_start as usize, 1024) }

        spawn.foo().ok();
    }

In that case the allocator memory never overlaps with a static variable. I still observe the panic in this case (using v0.4.1).

Hmm, do you mean GlobalAlloc.alloc? The memory pool API doesn't implement or use GlobalAlloc.

I just saw the #[global_allocator] attribute in the code above and wondered if that has anything to do with the problem here.

That does sound problematic. Isn't noalias also used for things like &mut arguments / values?

noalias on return types is somewhat different from noalias an arguments. The former is used for allocation, the latter for &[mut] arguments.

It's possible to initialize the global allocator using a real .heap section:

Uh... I have no idea what happens there. Looks like black magic to me. ;)

I'm afraid this will need someone with more LLVM knowledge who can figure out what is going on here. I don't have time right now to dig deep enough into all of this, there's a lot of stuff here that I have 0 experience with.

There has been releases for v4 and v5 now using core::mem::MaybeUninit, this should no longer be an issue.

rtic-rs / rtic

Mis-optimization / undefined behavior with our `MaybeUninit` implementation #149

STR

Proposed plan of action: