Closed WaffleLapkin closed 3 months ago
Maybe I misunderstand your point, but the uninit variant is certainly "cursed", as merely accessing it unpredictably does things that the related T (e.g., u8) would not, including crashing your application:
use std::mem::MaybeUninit;

fn main() {
    let x = MaybeUninit::<u8>::uninit();
    unsafe {
        if x.assume_init() == 0 {
            dbg!("x is zero");
        }
    }
}
gives on my machine with Rust 1.81-nightly
C:/Users/rb/.cargo/bin/cargo.exe run --color=always --package endianess --bin endianess --release
Finished `release` profile [optimized] target(s) in 0.01s
Running `target\release\endianess.exe`
error: process didn't exit successfully: `target\release\endianess.exe` (exit code: 0xc000001d, STATUS_ILLEGAL_INSTRUCTION)
Process finished with exit code -1073741795 (0xC000001D)
In C lingo, the result of accessing a MaybeUninit that is in the uninit state is undefined, as from that moment on, no sensible prediction about the entire application can be made anymore.
Closing due to inactivity.
Maybe I misunderstand your point, but the uninit variant is certainly "cursed", as merely accessing it unpredictably does things that the related T (e.g., u8) would not, including crashing your application:
Calling assume_init on a MaybeUninit which is uninit is undefined behavior. That undefined behavior is the reason for the "cursed" behavior, not the variant itself.
Your example is no different to this:
fn main() {
    let x = 0;
    unsafe {
        // UB: passing 0 violates the safety contract of `new_unchecked`.
        if std::num::NonZeroU8::new_unchecked(x).get() == 1 {
            dbg!("x is one");
        }
    }
}
Here the safety contract of NonZeroU8::new_unchecked doesn't allow calling it with 0, just like MaybeUninit::assume_init can't be called on MaybeUninit::uninit().
assume_init is not merely an "access".
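For contrast, here is a minimal sketch of the sound counterpart, where the safety contract of assume_init is upheld because the value is written first. The function name init_then_read is made up for illustration.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: initializes the slot before reading it.
fn init_then_read() -> u8 {
    // Merely holding an uninitialized `MaybeUninit<u8>` is fine.
    let mut slot = MaybeUninit::<u8>::uninit();
    // `write` stores a value without reading the previous contents.
    slot.write(7);
    // SAFETY: `slot` was initialized by the `write` call above.
    unsafe { slot.assume_init() }
}

fn main() {
    println!("{}", init_then_read());
}
```

The only difference to the crashing example is the write call, which is what makes the later assume_init allowed.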
Let me start off by saying I think there is merit to the argument that the union variant MaybeUninit::uninit is technically valid to have, while NonZeroU8(0) is not.
However, assume_init is just "a random documented instance" of how reading uninit will cause UB, and, unless I'm mistaken, there is nothing special or magic about that method (in contrast to the type). There is no magic #[lang] or similar on it, and UB simply happens due to the self.value access:
pub const unsafe fn assume_init(self) -> T {
    unsafe {
        intrinsics::assert_inhabited::<T>();
        ManuallyDrop::into_inner(self.value)
    }
}
Likewise, reading the uninitialized value by other means also gives UB:
use std::mem::{transmute, MaybeUninit};

fn main() {
    let x: u8 = {
        let m = MaybeUninit::<u8>::uninit();
        unsafe { transmute(m) }
    };
    if x == 0 {
        dbg!(x); // Crash
    }
}
Now from the perspective of this cheat sheet and a condensed infographic, I still think the current depiction is better than the alternative of showing it as merely any other variant:
There is really only one way to get that variant, MaybeUninit::uninit(), and once you set another value you can't really get uninit back; so it's not a variant you really work with. Conversely, doing anything meaningful with that variant (except getting rid of it) will have severe side effects, which is exactly what that depiction is about.
Let me start off by saying I think there is merit to the argument that the union variant MaybeUninit::uninit is technically valid to have, while NonZeroU8(0) is not.
I'm not exactly sure what you mean by that? Uninit is an invalid value for most types, similarly to how 0 is for non-zero types. I don't see a difference?
However, assume_init is just "a random documented instance" of how reading uninit will cause UB, and, unless I'm mistaken, there is nothing special or magic about that method (in contrast to the type). There is no magic #[lang] or similar on it, and UB simply happens due to the self.value access:
Note that it's not just any access, it's a union field access. Accessing union fields is equivalent to a transmute and reinterprets the memory as another type. Since this produces an owned, typed value, it necessarily requires validity invariants to be upheld. One of u8's validity invariants is that it is not uninit.
You, once again, can do exactly the same with NonZero:
use std::num::NonZeroU8;

#[repr(u8)]
#[derive(Copy, Clone)]
enum ZeroU8 { Zero = 0u8 }

union MaybeZeroU8 {
    zero: ZeroU8,
    non_zero: NonZeroU8,
}

impl MaybeZeroU8 {
    fn zero() -> Self {
        Self { zero: ZeroU8::Zero }
    }

    unsafe fn assume_non_zero(self) -> NonZeroU8 {
        unsafe { self.non_zero }
    }
}

fn main() {
    let zero = MaybeZeroU8::zero();
    unsafe { zero.assume_non_zero() }; // UB
}
You can also make a copy of MaybeUninit and cause the same thing without #[lang] (proof). I'm not exactly sure why MaybeUninit is marked as #[lang]. Maybe it has special layout handling, or is used in some intrinsics or desugaring, etc. But either way, you don't have to use #[lang] to get your version of MaybeUninit with the same safety implications.
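A rough sketch of what such a copy could look like, assuming the same two-field union shape as the standard library type. MyMaybeUninit and roundtrip are made-up names, and this version only demonstrates the sound path.

```rust
use std::mem::ManuallyDrop;

// Hypothetical stand-in for `MaybeUninit<T>`, with no `#[lang]` attribute.
union MyMaybeUninit<T> {
    uninit: (),
    value: ManuallyDrop<T>,
}

impl<T> MyMaybeUninit<T> {
    fn uninit() -> Self {
        Self { uninit: () }
    }

    fn new(value: T) -> Self {
        Self { value: ManuallyDrop::new(value) }
    }

    /// # Safety
    /// The `value` field must have been initialized.
    unsafe fn assume_init(self) -> T {
        // SAFETY: the caller promises that `value` is initialized.
        unsafe { ManuallyDrop::into_inner(self.value) }
    }
}

/// Creates an uninit instance that is never read, then reads an initialized one.
fn roundtrip() -> u8 {
    let _never_read = MyMaybeUninit::<u8>::uninit();
    let m = MyMaybeUninit::new(42u8);
    // SAFETY: `m` was created with `new`, so `value` is initialized.
    unsafe { m.assume_init() }
}

fn main() {
    println!("{}", roundtrip());
}
```

Calling assume_init on the result of MyMaybeUninit::<u8>::uninit() would be UB for the same reason as with the real type.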
Likewise, reading the uninitialized value by other means also gives UB:
Transmute and union access are quite literally equivalent, so this is expected.
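To illustrate the equivalence with a case that is sound, here is a small sketch that reinterprets an f32 as a u32 both ways. FloatBits and the two function names are assumptions for illustration; every 32-bit pattern is a valid u32, so no validity invariant is violated.

```rust
use std::mem::transmute;

// Hypothetical union used only to reinterpret the bits of an `f32`.
union FloatBits {
    float: f32,
    bits: u32,
}

fn bits_via_union(x: f32) -> u32 {
    let u = FloatBits { float: x };
    // SAFETY: every 32-bit pattern is a valid `u32`.
    unsafe { u.bits }
}

fn bits_via_transmute(x: f32) -> u32 {
    // SAFETY: both types are 4 bytes and every bit pattern is a valid `u32`.
    unsafe { transmute::<f32, u32>(x) }
}

fn main() {
    let a = bits_via_union(1.0);
    let b = bits_via_transmute(1.0);
    println!("{a:#x} {b:#x}");
}
```

Both paths produce a typed u32, which is why both become UB as soon as the source bytes are not valid for the target type.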
There is really only one way to get that variant MaybeUninit::uninit(), [...]
Not really! You can get uninit memory by reading padding bytes in a structure, for example[^1]:
fn main() {
    #[repr(C, align(2))]
    struct Padded {
        __: u8,
    }

    let p = Padded { __: 1 };
    let mu = unsafe {
        std::ptr::from_ref(&p)
            .cast::<core::mem::MaybeUninit<u8>>()
            .add(1)
            .read()
    };
    eprintln!("until this point miri doesn't complain, this is sound!");
    // (miri not complaining doesn't *guarantee* soundness, but I do believe this code is sound)

    // ...and this is not sound at all,
    // since it assumes uninit memory is init
    unsafe { mu.assume_init() };
}
[...], and once you set another value you can't really get uninit back; so it's not a variant you really work with. Conversely, doing anything meaningful with that variant (except getting rid of it) will have severe side effects, which is exactly what that depiction is about.
I'm not sure what you mean by "you can't really get uninit back" (you can certainly override an initialized MaybeUninit value with uninit memory). But I think what you are trying to get at here is that since uninit memory is an AM concept which is not present on most real hardware, there is no way to "observe" it when running a program. There is no way to check if a value is initialized or not.
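A small sketch of the overriding mentioned in the parenthesis, assuming a plain assignment is what is meant. The function name reset_and_reuse is made up for illustration.

```rust
use std::mem::MaybeUninit;

/// Reads an initialized slot, resets it to uninit, then initializes it again.
fn reset_and_reuse() -> (u8, u8) {
    let mut slot = MaybeUninit::new(5u8);
    // SAFETY: `slot` was created with `new`, so it is initialized.
    let first = unsafe { slot.assume_init() };

    // Overwriting with a fresh uninit value is an ordinary safe assignment.
    slot = MaybeUninit::uninit();
    // From here on, `assume_init` would be UB until the slot is written again.
    slot.write(9);
    // SAFETY: `slot` was initialized by the `write` call above.
    let second = unsafe { slot.assume_init() };

    (first, second)
}

fn main() {
    println!("{:?}", reset_and_reuse());
}
```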
Yet, I do not see how this makes it "cursed". You can't observe lots of stuff. Zero sized types for example. But I don't think zero sized types are cursed either?
Handling MaybeUninit certainly requires some care, just like any other unsafe Rust concept. But I think drawing it as "cursed" or "weird" unnecessarily mystifies it. Unsafe code follows rules, just like anything else[^2]. I actually find MaybeUninit to be one of the least confusing and complicated topics in unsafe Rust / Rust opsem.
Ig what I'm trying to say, MaybeUninit::uninit is not "undefined" or "cursed", it behaves in exactly the way we decided and defined it to work. Including the fact that reinterpreting it as an integer type is UB.
[^1]: Exactly the same as what happens when you call MaybeUninit::uninit(), mind you. It just creates a value with uninit: (), filling the rest of the value with padding.
[^2]: It is notable that the failure case of unsafe code is much more dangerous, but still, there are rules which codify how everything works.
I'm not exactly sure what you mean by that? Uninit is an invalid value for most types, similarly to how 0 is for non zero types. I don't see a difference?
I mean that MaybeUninit::<u8>::uninit() is valid Rust, which gives you an instance of the MaybeUninit<u8> type that's in the uninit state (meaning for all intents and purposes the uninit field / variant is the 'valid' one, while value is invalid), and you can pass that around and everything works. However, you can't create or have a NonZeroU8(0).
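A minimal sketch of that difference, with made-up helper names pass_through and fill: the uninit MaybeUninit<u8> can be moved around freely, while the safe NonZeroU8 constructor refuses 0.

```rust
use std::mem::MaybeUninit;
use std::num::NonZeroU8;

/// Moving a possibly uninitialized slot by value is fine.
fn pass_through(slot: MaybeUninit<u8>) -> MaybeUninit<u8> {
    slot
}

/// Initializes the slot and only then reads it.
fn fill(mut slot: MaybeUninit<u8>, value: u8) -> u8 {
    slot.write(value);
    // SAFETY: `slot` was initialized by the `write` call above.
    unsafe { slot.assume_init() }
}

fn main() {
    let slot = pass_through(MaybeUninit::uninit());
    println!("{}", fill(slot, 3));

    // There is no safe way to obtain a `NonZeroU8` holding 0.
    println!("{}", NonZeroU8::new(0).is_none());
    println!("{}", NonZeroU8::new(1).is_some());
}
```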
[...] Accessing union fields is equivalent to a transmute and reinterprets the memory as another type [...] Transmute and union access are quite literally equivalent, so this is expected.
Yes, agreed.
Not really! You can get uninit memory by reading padding bytes in a structure, for example
I disagree with that view. I mean, you are right that in this particular case you can summon a MaybeUninit<u8> because you read padding bytes at a well-known location that happens to be compatible with your u8. For practically all other MaybeUninit<T> this won't work for size and alignment reasons.
I also disagree for ontological reasons. If I knew nothing of the inside of a MaybeUninit, it might carry some hidden information (like an ID or guard value, for the sake of argument). After you have called .write() on that MaybeUninit to make it valid, I don't see any method that could re-invalidate that very same MaybeUninit again via &mut self (apart from more transmute and pointer hacks).
But I think what you are trying to get at here is that since uninit memory is an AM concept which is not present on most real hardware, there is no way to "observe" it when running a program. There is no way to check if a value is initialized or not.
Yes! The target audience should have some familiarity with C-level concepts (memory, pointers) (compare FAQ). What I'm now trying to convey in that picture with 10 words of text or so:
- MaybeUninit<T> is like a T (size-wise)
- either it has a valid T inside, in which case everything is dandy
- or it has no valid T inside, but you tried to read-access(*) it anyways, in which case something "undefined" will happen. I use the word "undefined" in particular because I assume C people are at least vaguely familiar with it and its consequences, whereas the term "uninitialized" is more vague and overloaded.
As you say, if I had much more space I'd elaborate that undefined doesn't really exist anywhere as a property of space, but rather as a property of compilation, in that realistically (from a "machine code perspective") a program using a MaybeUninit<T> either has T memory reserved (plus some working code that just reads and writes that memory), or it is a miscompiled abomination that shouldn't have existed in the first place because the AM / compilation contract of not reading uninitialized memory was violated.
*) FWIW, I just realized that when I mean read I imply trying to do something meaningful with that value other than continuing treating it as an undefined value. While technically copying a struct with padding might-or-might-not involve copying (reading) some uninitialized bytes, that's an implementation detail in my book, not a read-access.
Yet, I do not see how this makes it "cursed". You can't observe lots of stuff. Zero sized types for example. But I don't think zero sized types are cursed either? [...] But I think drawing it as "cursed" or "weird" unnecessarily mystifies it, unsafe code follows rules, just like anything else[^2]
I agree that there are other things that either might cause UB, or are unobservable. The cursed visualization in this particular case comes from that [...] std [...]
Ig what I'm trying to say, MaybeUninit::uninit is not "undefined" or "cursed", it behaves in exactly the way we decided and defined it to work. Including the fact that reinterpreting it as an integer type is UB.
Yes, I agree with all of that, and I even agree that instead rendering it as uninit would be slightly more aligned with the type definition.
I think the bottom line of my point is that in the scope of type visualizations it's a reasonably appropriate warning about a severe issue that many people get wrong, while being correct enough as to showing the type layout and its primary behavior.
MaybeUninit does not have an "Undefine̻̅ḓ̓" variant (as a side note, this might be terrible for screen readers / accessibility anyway). It has an "uninit" variant.
I would say that is an important distinction: "undefined" seems mysterious and possibly not reproducible (i.e. an "undefined" value may have a different value each time it is observed), but actually "uninit" is just a single value that exists in the Rust Abstract Machine :tm: (if I'm not mistaken a "byte" is defined as either uninit or a 0..=255 integer value).
So while you don't know how an uninit variant is represented on the hardware, this does not actually matter, since you can't introspect it from within the abstract machine. uninit is just a value, which is invalid for most operations.
So I think explaining it as "undefined" and "cursed" is not the best way.
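As a toy model of that idea (not how the compiler or any spec actually represents it, and ignoring things like pointer provenance), an abstract byte could be sketched as an enum. AbstractByte and both function names are made up for illustration, and None stands in for "this read would be UB".

```rust
/// Hypothetical, simplified model of a byte in the abstract machine.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AbstractByte {
    Uninit,
    Init(u8),
}

/// Models a typed read at `u8`: an uninit byte is not a valid `u8`.
fn read_as_u8(byte: AbstractByte) -> Option<u8> {
    match byte {
        AbstractByte::Init(v) => Some(v),
        AbstractByte::Uninit => None, // stands in for UB in this toy model
    }
}

/// Models a typed read at `MaybeUninit<u8>`: every byte is valid and is copied as is.
fn read_as_maybe_uninit(byte: AbstractByte) -> AbstractByte {
    byte
}

fn main() {
    println!("{:?}", read_as_u8(AbstractByte::Init(7)));
    println!("{:?}", read_as_u8(AbstractByte::Uninit));
    println!("{:?}", read_as_maybe_uninit(AbstractByte::Uninit));
}
```

In this picture uninit is one fixed, well-defined value; it is only the read at a type that cannot hold it which has no defined outcome.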
(P.S. haven't done much with unsafe lately, so I'm not sure if terminology is correct here, but I believe the idea is)