rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.62k stars 12.48k forks source link

Regression: invalid memory reference #124117

Closed zeroflaw closed 4 months ago

zeroflaw commented 4 months ago

Code

I tried this code:

    let buf: &[u8]  = &[];
    let v = unsafe{ buf.as_ptr().cast::<u16>().read_unaligned() };

I expected to see this happen: perform an unsafe memory read returning a u16 (even if the value is nonsense) in both release and debug.

Instead, this happened: (signal: 11, SIGSEGV: invalid memory reference) (failure in debug only)

Version it worked on

rustc 1.79.0-nightly (1cec373f6 2024-04-16)

Version with regression

rustc 1.79.0-nightly (becebb315 2024-04-17)

the8472 commented 4 months ago

This is intended behavior. Reading from invalid pointers is undefined behavior and therefore any outcome is possible. Often it leads to silent corruption due to aggressive optimizations by the compiler, but in debug some cases can be detected and result in errors or process aborts instead.

zeroflaw commented 4 months ago

In that case should the following not fail for the same reason, but it works.

    let buf: &[u8]  = &[0];
    let v = unsafe{ buf.as_ptr().cast::<u16>().read_unaligned() };

for more context, and a use case, I have the following in a ByteCursor:

unsafe {
    let remaining = self.remaining();
    let v = self.unchecked_peak_u16(); // -> buf.as_ptr().cast::<u16>().read_unaligned
    Some(
        v & if remaining < 1 {
            return None;
        } else if remaining == 1 {
            self.unchecked_advance(1);
            0x00FF_u16
        } else {
            self.unchecked_advance(2);
            return Some(v);
        })
}
the8472 commented 4 months ago

In that case should the following not fail for the same reason, but it works.

It would be nice, but a property of undefined behavior is that there are no guarantees. UB-detection is limited-effort since it has to trade off against other concerns. But it'll fail if you run under Miri.

When you use unsafe it is on the programmer to ensure safety conditions are upheld since, unlike many other errors, because neither the compiler nor runtime checks can reliably enforce them. Writing unsafe means you promise to the compiler that the code is correct.

zeroflaw commented 4 months ago

I understand, however it is far more performant to fetch from memory and then correct. Compared to conditionally fetching only valid memory.

The SIGSEGV in debug only, means I would have to write a debug only implementation.

the8472 commented 4 months ago

I understand, however it is far more performant to fetch from memory and then correct.

By causing UB you're giving the compiler license for arbitrary miscompilations, which can result in subtle corruption. Just because it runs doesn't mean it's correct. Plus reading from the zero-address page (where a dangling pointer might point to) will result in a segfault anyway.

What you want is a load where the OOB parts are masked. On x86 this will require AVX512 instructions. If you stay within the boundary of a valid page then using assembly to perform load-then-mask may work. But touching an invalid page will still segfault.