rust-lang / unsafe-code-guidelines

Forum for discussion about what unsafe code can and can't do
https://rust-lang.github.io/unsafe-code-guidelines
Apache License 2.0

Can uninitialized memory come from the outside world? #527

Open ChayimFriedman2 opened 3 months ago

ChayimFriedman2 commented 3 months ago

For example, suppose we have the following C function:

#include <stdlib.h>

int* get_uninit(void) {
    return malloc(sizeof(int));
}

Which we call from Rust:

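use std::ffi::c_int;

extern "C" {
    fn get_uninit() -> *mut c_int;
}

// Calling a foreign function and dereferencing a raw pointer both need `unsafe`.
let v = unsafe { *get_uninit() };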

Is this code UB? We don't initialize the value, but it comes from C, not Rust.
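Independent of how the question is answered, one conservative way to consume such a pointer on the Rust side is to treat the pointee as `MaybeUninit<c_int>` and only read it after writing to it. The following is a minimal sketch under assumptions: `get_uninit_standin` and `write_then_read` are hypothetical names, and the stand-in uses Rust's allocator instead of the C `malloc` so that the example runs without linking any C code.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ffi::c_int;
use std::mem::MaybeUninit;

/// Hypothetical stand-in for the C `get_uninit`: hands out fresh memory
/// that nobody has written to yet.
fn get_uninit_standin() -> *mut MaybeUninit<c_int> {
    let layout = Layout::new::<c_int>();
    // SAFETY: `c_int` is not zero-sized, so the layout has non-zero size.
    let p = unsafe { alloc(layout) };
    if p.is_null() {
        handle_alloc_error(layout);
    }
    p.cast::<MaybeUninit<c_int>>()
}

/// Initializes the memory first and only then reads it back.
fn write_then_read(value: c_int) -> c_int {
    let p = get_uninit_standin();
    // SAFETY: `p` is non-null, aligned, and valid for one `c_int`.
    // The read happens only after the write, so the value is initialized.
    unsafe {
        p.write(MaybeUninit::new(value));
        let v = p.read().assume_init();
        dealloc(p.cast::<u8>(), Layout::new::<c_int>());
        v
    }
}
```

Typing the pointee as `MaybeUninit<c_int>` keeps the "may be uninitialized" state visible in the type, and `assume_init` marks the one place where the code claims the memory has been written.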

It's pretty clear to me that this needs to be UB, since (I believe) LLVM will optimize based on that with cross-language LTO. But then, what about cases where LLVM will not optimize? For example, what about assembly?

get_uninit:
    mov rax, rsp
    ret

We don't initialize the value of [rsp], but LLVM has no way to know that: is it UB?

Furthermore, if it is UB, then we have to define what is considered "initialization": if we are sure we called a function that used the stack space of [rsp], does that mean it is initialized? And what if assembly code wrote into it?

After all (assuming the memory is allocated to the process, so no page faults), at the machine level there is no concept of uninitialized memory. So this brings the question, what happens when the machine and the Rust AM intersect?

Inspired by a question on Reddit.

RalfJung commented 3 months ago

> So this brings the question, what happens when the machine and the Rust AM intersect?

This is really the core of your question. Or rather, it's slightly worse: this is the C AM and the Rust AM interacting. The answer is "it's complicated", and it's been discussed in a bunch of threads here; ideally, someone will eventually write a summary that we can easily point to. :)

Meanwhile, you can consult https://github.com/rust-lang/unsafe-code-guidelines/issues/421 and https://github.com/rust-lang/unsafe-code-guidelines/issues/422.

ChayimFriedman2 commented 3 months ago

@RalfJung If I understand those threads correctly, this boils down to "the operations in both abstract machines are lowered to a common abstract machine and executed there, and each step must be representable in both AMs". That means that for C it will be UB, because the memory is uninitialized in the C AM, while for assembly it will be defined behavior, since there is no uninitialized memory in the assembly "abstract machine" (i.e. the real machine). Am I correct?

RalfJung commented 3 months ago

When Rust calls a function that is, in assembly, defined as

get_uninit:
    mov rax, rsp
    ret

then that indeed can be "axiomatized" from the Rust side as a function that non-deterministically returns an arbitrary initialized integer. So yeah, that sounds right.
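One way to picture this axiomatization from the Rust side: the caller ends up with an ordinary, initialized `c_int` whose value it cannot predict, so the calling code has to be well-defined for every possible value. The sketch below models that idea. The names `get_arbitrary` and `lookup` are hypothetical, and the non-deterministic choice is passed in as a parameter purely so the example can be run.

```rust
use std::convert::TryFrom;
use std::ffi::c_int;

/// Hypothetical model of the assembly function as seen from Rust: it yields
/// *some* initialized integer. `choice` stands for the non-deterministic pick
/// that the caller can neither control nor predict.
fn get_arbitrary(choice: c_int) -> c_int {
    choice
}

/// Caller code may not assume anything about the value, so it validates it
/// before using it as an index.
fn lookup(table: &[&'static str], v: c_int) -> Option<&'static str> {
    // Negative values cannot be converted to `usize` and are rejected.
    let idx = usize::try_from(v).ok()?;
    // Out-of-range indices are rejected instead of being trusted.
    table.get(idx).copied()
}
```

For instance, `lookup(&["a", "b"], get_arbitrary(1))` yields `Some("b")`, while a negative or too-large choice yields `None`. The point is only that an arbitrary but initialized value can be inspected and branched on like any other integer.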

This reasoning only works without cross-language LTO, since with cross-language LTO the Abstract Machines come together at the LLVM IR level, and at that level uninitialized memory is still a reality.

chorman0773 commented 3 months ago

I would also presume that if you got your hands on an electrically floating register in something used as a parameter/return register, that would count as uninitialized memory to Rust, since the compiler doesn't have to "reset" the register to a defined state, or move it into memory or another register. Similarly, if asm/C/whatever mmapped a region of memory and then madvise(MADV_FREE)d it, yielding a pointer to that region to Rust would give a Rust allocation that contains a bunch of Uninit bytes.
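If a region like that has to be treated as Uninit bytes, Rust code would usually type it as `[MaybeUninit<u8>]` and write every byte before viewing it as `[u8]`. A minimal sketch, assuming the region has already been wrapped in a slice; `init_region` is a hypothetical helper, and a local array stands in for the mmapped memory:

```rust
use std::mem::MaybeUninit;

/// Writes `fill` to every byte of a possibly-uninitialized region and
/// returns the same memory viewed as initialized bytes.
fn init_region(region: &mut [MaybeUninit<u8>], fill: u8) -> &mut [u8] {
    for byte in region.iter_mut() {
        byte.write(fill);
    }
    // SAFETY: every element was written in the loop above, and
    // `MaybeUninit<u8>` has the same size and alignment as `u8`.
    unsafe { &mut *(region as *mut [MaybeUninit<u8>] as *mut [u8]) }
}
```

A caller could pass `&mut [MaybeUninit::<u8>::uninit(); 16]` and receive a 16-byte slice in which every byte equals `fill`.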

Although I'd be curious what happens if you return NaT from a function.