ralfbiedert / cheats.rs

Rust Language Cheat Sheet - https://cheats.rs

`MaybeUninit` has an "uninit" variant, not an "undefined" one #198

Closed WaffleLapkin closed 3 months ago

WaffleLapkin commented 3 months ago

MaybeUninit does not have an " Undefine̻̅ḓ̓ " variant (as a side note, this might be terrible for screen readers / accessibility anyway). It has an "uninit" variant.

I would say that is an important distinction. "Undefined" sounds mysterious and possibly not reproducible (i.e. an "undefined" value might be different each time it is observed), whereas "uninit" is just a single value that exists in the Rust Abstract Machine™ (if I'm not mistaken, a "byte" is defined as either uninit or an integer value in 0..=255).

So while you don't know how an uninit variant is represented on the hardware, this does not actually matter, since you can't introspect it from within the abstract machine.

uninit is just a value, which is invalid for most operations.
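The "byte is either uninit or 0..=255" idea can be sketched as a toy model. This is only an illustration: `AbstractByte` and `read_as_u8` are hypothetical names, not part of any real API, and the model ignores further details the Abstract Machine tracks (such as pointer provenance). The `None` result merely stands in for what would be undefined behavior in a real program.

```rust
/// Hypothetical model of one byte in the Rust Abstract Machine.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbstractByte {
    /// The single "uninit" value.
    Uninit,
    /// An initialized byte holding 0..=255.
    Init(u8),
}

/// Models a typed read at type `u8`: only initialized bytes are valid.
/// `None` stands in for undefined behavior in the real language.
fn read_as_u8(byte: AbstractByte) -> Option<u8> {
    match byte {
        AbstractByte::Init(value) => Some(value),
        AbstractByte::Uninit => None,
    }
}

fn main() {
    // Copying an uninit byte around is unproblematic in this model.
    let copied = AbstractByte::Uninit;
    assert_eq!(copied, AbstractByte::Uninit);

    assert_eq!(read_as_u8(AbstractByte::Init(7)), Some(7));
    assert_eq!(read_as_u8(copied), None);
}
```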

So I think explaining it as "undefined" and "cursed" is not the best way.

(P.S. I haven't done much with unsafe lately, so I'm not sure the terminology is correct here, but I believe the idea is.)

ralfbiedert commented 3 months ago

Maybe I misunderstand your point, but the uninit variant is certainly "cursed", as merely accessing it unpredictably does things that the related T (e.g., u8) would not, including crashing your application:

use std::mem::MaybeUninit;

fn main() {
    let x = MaybeUninit::<u8>::uninit();

    unsafe {
        if x.assume_init() == 0 {
            dbg!("x is zero");
        }
    }
}

On my machine with Rust 1.81-nightly, this gives:

C:/Users/rb/.cargo/bin/cargo.exe run --color=always --package endianess --bin endianess --release
    Finished `release` profile [optimized] target(s) in 0.01s
     Running `target\release\endianess.exe`
error: process didn't exit successfully: `target\release\endianess.exe` (exit code: 0xc000001d, STATUS_ILLEGAL_INSTRUCTION)

Process finished with exit code -1073741795 (0xC000001D)

In C lingo, the result of accessing a MaybeUninit that is in the uninit state is undefined, as from that moment on, no sensible prediction about the entire application can be made anymore.

ralfbiedert commented 3 months ago

Closing due to inactivity.

WaffleLapkin commented 3 months ago

> Maybe I misunderstand your point, but the uninit variant is certainly "cursed", as merely accessing it unpredictably does things that the related T (e.g., u8) would not, including crashing your application:

Calling assume_init on a MaybeUninit which is uninit is undefined behavior. That undefined behavior is the reason for the "cursed" behavior, not the variant itself.

Your example is no different to this:

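fn main() {
    let x = 0;

    unsafe {
        // UB: the safety contract of `new_unchecked` requires a non-zero argument.
        // `.get()` is needed because a NonZeroU8 cannot be compared to an integer directly.
        if std::num::NonZeroU8::new_unchecked(x).get() == 1 {
            dbg!("x is one");
        }
    }
}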

Here NonZeroU8::new_unchecked's safety contract doesn't allow calling it with 0, just like MaybeUninit::assume_init can't be called on MaybeUninit::uninit().

assume_init is not merely an "access".
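For contrast, here is a minimal sketch of a use that upholds the contract of `assume_init`: the slot is written before it is assumed initialized. The function name `init_then_read` is made up for illustration.

```rust
use std::mem::MaybeUninit;

/// Creates an uninit slot, initializes it, and only then reads it as `u8`.
fn init_then_read() -> u8 {
    // Holding an uninit `MaybeUninit<u8>` is fine: no `u8` value exists yet.
    let mut slot = MaybeUninit::<u8>::uninit();

    // `write` initializes the slot without reading the old contents.
    slot.write(42);

    // SAFETY: `slot` was initialized by the `write` call above.
    unsafe { slot.assume_init() }
}

fn main() {
    assert_eq!(init_then_read(), 42);
}
```

Removing the `write` call would leave the rest compiling unchanged, but the `assume_init` call would then violate its safety contract.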

ralfbiedert commented 3 months ago

Let me start off by saying I think there is merit to the argument that the union variant MaybeUninit::uninit is technically valid to have, while NonZeroU8(0) is not.

However, assume_init is just "a random documented instance" of how reading uninit will cause UB, and, unless I'm mistaken, there is nothing special or magic about that method (in contrast to the type). There is no magic #[lang] or similar on it, and UB simply happens due to the self.value access:

    pub const unsafe fn assume_init(self) -> T {
        unsafe {
            intrinsics::assert_inhabited::<T>();
            ManuallyDrop::into_inner(self.value)
        }
    }

Likewise, reading the uninitialized value by other means also gives UB:

use std::mem::{transmute, MaybeUninit};

fn main() {
    let x: u8 = {
        let m = MaybeUninit::<u8>::uninit();
        unsafe { transmute(m) }
    };

    if x == 0 {
        dbg!(x); // Crash
    }
}

Now from the perspective of this cheat sheet and a condensed info graphic, I still think the current depiction is better than the alternative of showing it as merely any other variant:

There is really only one way to get that variant MaybeUninit::uninit(), and once you set another value you can't really get uninit back; so it's not a variant you really work with. Conversely, doing anything meaningful with that variant (except getting rid of it) will have severe side effects, which is exactly what that depiction is about.

WaffleLapkin commented 2 months ago

> Let me start off by saying I think there is merit to the argument that the union variant MaybeUninit::uninit is technically valid to have, while NonZeroU8(0) is not.

I'm not exactly sure what you mean by that? Uninit is an invalid value for most types, similarly to how 0 is for non zero types. I don't see a difference?

> However, assume_init is just "a random documented instance" of how reading uninit will cause UB, and, unless I'm mistaken, there is nothing special or magic about that method (in contrast to the type). There is no magic #[lang] or similar on it, and UB simply happens due to the self.value access:

Note that it's not just any access, it's a union field access. Accessing a union field is equivalent to a transmute and reinterprets the memory as another type. Since this produces an owned, typed value, it necessarily requires the validity invariants of that type to be upheld. One of u8's validity invariants is that it is not uninit.

You, once again, can do exactly the same with NonZero:

use std::num::NonZeroU8;

#[repr(u8)]
#[derive(Copy, Clone)]
enum ZeroU8 { Zero = 0u8 }

union MaybeZeroU8 {
    zero: ZeroU8,
    non_zero: NonZeroU8,
}

impl MaybeZeroU8 {
    fn zero() -> Self {
        Self { zero: ZeroU8::Zero }
    }

    unsafe fn assume_non_zero(self) -> NonZeroU8 {
        unsafe { self.non_zero }
    }
}

fn main() {
    let zero = MaybeZeroU8::zero();
    unsafe { zero.assume_non_zero() }; // UB
}

You can also make a copy of MaybeUninit and cause the same thing without #[lang] (proof). I'm not exactly sure why MaybeUninit is marked as #[lang]. Maybe it has special layout handling, or is used in some intrinsics or desugaring, etc. But either way, you don't have to use #[lang] to get your version of MaybeUninit with the same safety implications.
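A hand-rolled version might look roughly like the sketch below. This is not the linked proof and not the standard library's source; `MyMaybeUninit` is a hypothetical name, and only a minimal subset of the API is shown.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical stand-in for `std::mem::MaybeUninit`, without `#[lang]`.
union MyMaybeUninit<T> {
    uninit: (),
    value: ManuallyDrop<T>,
}

impl<T> MyMaybeUninit<T> {
    /// Only the zero-sized `uninit` field is set, so the bytes that would
    /// hold a `T` are left uninitialized.
    fn uninit() -> Self {
        Self { uninit: () }
    }

    fn new(value: T) -> Self {
        Self { value: ManuallyDrop::new(value) }
    }

    /// # Safety
    /// The caller must guarantee that the `value` field has been initialized.
    unsafe fn assume_init(self) -> T {
        // Union field read: reinterprets the stored bytes as a `T`.
        ManuallyDrop::into_inner(unsafe { self.value })
    }
}

fn main() {
    // Creating and dropping the uninit state is fine.
    let _empty = MyMaybeUninit::<u8>::uninit();

    let full = MyMaybeUninit::new(7u8);
    // SAFETY: `full` was created via `new`, so `value` is initialized.
    assert_eq!(unsafe { full.assume_init() }, 7);
}
```

Calling `assume_init` on `_empty` instead would be undefined behavior for the same reason as with the real type.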

> Likewise, reading the uninitialized value by other means also gives UB:

Transmute and union access are quite literally equivalent, so this is expected.
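The equivalence can be illustrated with a pair of types for which every bit pattern is valid, so that both routes are sound. The names `Halves`, `via_union`, and `via_transmute` are made up for this sketch. The concrete values depend on the target's endianness, but both routes always agree.

```rust
use std::mem::transmute;

/// Two views of the same four bytes.
#[repr(C)]
union Halves {
    whole: u32,
    parts: [u16; 2],
}

/// Reinterprets a `u32` as `[u16; 2]` by reading the other union field.
fn via_union(x: u32) -> [u16; 2] {
    let u = Halves { whole: x };
    // SAFETY: all four bytes are initialized, and every bit pattern
    // is valid for `[u16; 2]`.
    unsafe { u.parts }
}

/// Performs the same reinterpretation with `transmute`.
fn via_transmute(x: u32) -> [u16; 2] {
    // SAFETY: both types are four bytes, and every bit pattern
    // is valid for `[u16; 2]`.
    unsafe { transmute::<u32, [u16; 2]>(x) }
}

fn main() {
    assert_eq!(via_union(0x1234_5678), via_transmute(0x1234_5678));
}
```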

> There is really only one way to get that variant MaybeUninit::uninit(), [...]

Not really! You can get uninit memory by reading padding bytes in a structure, for example[^1]:

fn main() {
    #[repr(C, align(2))]
    struct Padded {
        __: u8,
    }

    let p = Padded { __: 1 };
    let mu = unsafe {
        std::ptr::from_ref(&p)
            .cast::<core::mem::MaybeUninit<u8>>()
            .add(1)
            .read()
    };

    eprintln!("until this point miri doesn't complain, this is sound!");
    // (miri not complaining doesn't *guarantee* soundness, but I do believe this code is sound)

    // ...and this is not sound at all,
    // since it assumes uninit memory is init
    unsafe { mu.assume_init() };
}

> [...], and once you set another value you can't really get uninit back; so it's not a variant you really work with. Conversely, doing anything meaningful with that variant (except getting rid of it) will have severe side effects, which is exactly what that depiction is about.

I'm not sure what you mean by "you can't really get uninit back" (you can certainly override an initialized MaybeUninit value with uninit memory). But I think what you are trying to get at here is that since uninit memory is an AM concept which is not present on most real hardware, there is no way to "observe" it when running a program. There is no way to check if a value is initialized or not.
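A minimal sketch of overwriting an initialized slot with uninit memory, using only safe code for the overwrite itself. The helper names `reset` and `roundtrip` are hypothetical.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: puts the slot back into the uninit state.
/// Plain assignment is enough, no unsafe code is needed.
fn reset(slot: &mut MaybeUninit<u8>) {
    *slot = MaybeUninit::uninit();
}

/// Initializes a slot, resets it, re-initializes it, and reads it.
fn roundtrip() -> u8 {
    let mut slot = MaybeUninit::new(1u8);

    // After this call the contents are uninit again;
    // calling `assume_init` at this point would be UB.
    reset(&mut slot);

    slot.write(2);
    // SAFETY: `slot` was re-initialized by the `write` call above.
    unsafe { slot.assume_init() }
}

fn main() {
    assert_eq!(roundtrip(), 2);
}
```

Nothing at run time can tell whether `reset` had any effect, which matches the point that initializedness cannot be observed from inside the program.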

Yet, I do not see how this makes it "cursed". You can't observe lots of stuff. Zero sized types for example. But I don't think zero sized types are cursed either?

Handling MaybeUninit certainly requires some care, just like any other unsafe Rust concept. But I think drawing it as "cursed" or "weird" unnecessarily mystifies it. Unsafe code follows rules, just like anything else[^2]. I actually find MaybeUninit to be one of the least confusing and complicated topics in unsafe Rust / Rust opsem.

I guess what I'm trying to say is that MaybeUninit::uninit is not "undefined" or "cursed"; it behaves in exactly the way we decided and defined it to work, including the fact that reinterpreting it as an integer type is UB.

[^1]: Exactly the same as what happens when you call MaybeUninit::uninit() mind you. It just creates a value with uninit: (), filling the rest of the value with padding.

[^2]: It is notable that the failure case of unsafe code is much more dangerous, but still, there are rules which codify how everything works.

ralfbiedert commented 2 months ago

> I'm not exactly sure what you mean by that? Uninit is an invalid value for most types, similarly to how 0 is for non zero types. I don't see a difference?

I mean that MaybeUninit::<u8>::uninit() is valid Rust, which gives you an instance of the MaybeUninit<u8> type that's in the uninit state (meaning for all intents and purposes the uninit field / variant is the 'valid' one, while value is invalid), and you can pass that around and everything works. However, you can't create or have a NonZeroU8(0).

> [...] Accessing union fields is equivalent to a transmute and reinterprets the memory as another type [...] Transmute and union access are quite literally equivalent, so this is expected.

Yes, agreed.

> Not really! You can get uninit memory by reading padding bytes in a structure, for example

I disagree with that view. I mean, you are right that in this particular case you can summon a MaybeUninit<u8> because you read padding bytes at a well-known location that happens to be compatible with your u8. For practically all other MaybeUninit<T> this won't work for size and alignment reasons.

I also disagree for ontological reasons. If I knew nothing of the inside of a MaybeUninit, it might carry some hidden information (like an ID or guard value, for the sake of argument). Once you have called .write() on that MaybeUninit to make it valid, I don't see any method that could re-invalidate that very same MaybeUninit via &mut self (apart from more transmute and pointer hacks).

> But I think what you are trying to get at here is that since uninit memory is an AM concept which is not present on most real hardware, there is no way to "observe" it when running a program. There is no way to check if a value is initialized or not.

Yes! The target audience should have some familiarity with C-level concepts (memory, pointers) (compare FAQ). What I'm now trying to convey in that picture with 10 words of text or so:

As you say, if I had much more space I'd elaborate that undefined doesn't really exist anywhere as a property of space, but rather as a property of compilation: realistically (from a "machine code perspective"), a program using a MaybeUninit<T> either has memory for a T reserved (plus some working code that just reads and writes that memory), or it is a miscompiled abomination that shouldn't have existed in the first place, because the AM / compilation contract of not reading uninitialized memory was violated.

*) FWIW, I just realized that when I say "read" I mean trying to do something meaningful with that value other than continuing to treat it as an undefined value. While copying a struct with padding technically might or might not involve copying (reading) some uninitialized bytes, that's an implementation detail in my book, not a read access.

> Yet, I do not see how this makes it "cursed". You can't observe lots of stuff. Zero sized types for example. But I don't think zero sized types are cursed either? [...] But I think drawing it as "cursed" or "weird" unnecessarily mystifies it, unsafe code follows rules, just like anything else[^2]

I agree that there are other things that either might cause UB, or are unobservable. The cursed visualization in this particular case comes from that

> I guess what I'm trying to say is that MaybeUninit::uninit is not "undefined" or "cursed"; it behaves in exactly the way we decided and defined it to work, including the fact that reinterpreting it as an integer type is UB.

Yes, I agree with all of that, and I even agree that rendering it as uninit instead would be slightly more aligned with the type definition.

I think the bottom line of my point is that, within the scope of type visualizations, it's a reasonably appropriate warning about a severe issue that many people get wrong, while being correct enough in showing the type layout and its primary behavior.