rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.35k stars 12.72k forks source link

Tracking issue for RFC 1892, "Deprecate uninitialized in favor of a new MaybeUninit type" #53491

Closed Centril closed 5 years ago

Centril commented 6 years ago

NEW TRACKING ISSUE = https://github.com/rust-lang/rust/issues/63566

This is a tracking issue for the RFC "Deprecate uninitialized in favor of a new MaybeUninit type" (rust-lang/rfcs#1892).

Steps:

Unresolved questions:

Kixunil commented 6 years ago

I'm curious to know whether there's some argument against layout-optimizing Option<Foo> where Foo is defined like this:

union Foo {
   bar: NonZeroUsize,
   baz: &'static str,
}
RalfJung commented 6 years ago

@Kixunil could you raise that at https://github.com/rust-rfcs/unsafe-code-guidelines/issues/13? Your question really is not related to MaybeUninit.

qwerty19106 commented 6 years ago

I want to know which section will be contains static variables without initialization? In "C" I can write uint8_t a[100]; in high level of file, and I know that a symbol will be put to .bss section. If I write uint8_t a[100] = {}; the a symbol will be put to .data section (which will be copied from FLASH to RAM before main).

It is small example in Rust which used MaybeUninit:

struct A {
    data: MaybeUninit<[u8; 100]>,
    len: usize,
}

impl A {
    pub const fn new() -> Self {
        Self {
            data: MaybeUninit::uninitialized(),
            len: 0,
        }
    }
}

static mut a: MaybeUninit<[u8; 100]> = MaybeUninit::uninitialized();
static mut b: A = A::new();

Which section will be contains a and b symbols?

P. S. I know about symbol mangling but it is no matter for this question.

japaric commented 6 years ago

@qwerty19106 In your example both a and b will be place in .bss. LLVM treats undef values, like MaybeUninit::uninitialized(), as zeros when selecting which section a variable will go in.

If A::new() initialized len to 1 then b would have ended in .data. If a static contains any non-zero value then the variable will go in .data. Padding is treated as zero values.

This is what LLVM does. Rust makes no ~guarantees~ promises (*) about which linker section a static variable will go in. It just inherits LLVM's behavior.

(*) Unless you use #[link_section]

Fun fact: At some point LLVM considered undef as a non-zero value so variable a in your example ended in .data. See #41315.

qwerty19106 commented 6 years ago

Thanks @japaric for your answer. It was very helpful for me.

I have the new idea. One can use .init_array section to initialize static mut variables before call main.

This is proof of concept:

#[macro_export]
macro_rules! static_singleton {
    ($name_var: ident, $ty:ty, $name_init_fn: ident, $name_init_var: ident, $init_block: block) => {
        static mut $name_var: MaybeUninit<$ty> = unsafe {MaybeUninit::uninitialized()};

        extern "C" fn $name_init_fn() {
            unsafe {
                $init_block
            }
        }

        #[link_section = ".init_array"]
        #[used]
        static $name_init_var: [extern "C" fn(); 1] = [$name_init_fn];
    };
}

The test code:

static_singleton!(A, u8, a_init_fn, A_INIT_VAR, {
    let ptr = A.get_mut();
    *ptr = 5;
});

fn main() {
    println!("A inited to {}", unsafe {&A.get_ref()});
}

Result: A inited to 5

Full example: playground

Unresolved question: I could not use concat_idents to generate a_init_fn and A_INIT_VAR. It seems that #1628 is not yet ready for use.

This test not very useful. But it can be useful in embedded for initialize complicated structs (it will be placed in .bss, thus allows to economy FLASH).

Why rustc don't use .init_array section? It is standartized section of ELF format (link).

eternaleye commented 6 years ago

@qwerty19106 Because life before main() is considered a misfeature and was explicitly disinvited from Rust's semantics.

qwerty19106 commented 6 years ago

Ok, it is good lang design.

But in #[no_std] we have not good alternative now (maybe I was searching bad).

We can use spin::Once, but it is very expensive (Ordering::SeqCst on every reference get).

I would like to have a compile-time check on embedded.

eddyb commented 6 years ago

it is very expensive (Ordering::SeqCst on every reference get).

That doesn't sound right to me. Aren't all "once" abstractions supposed to be relaxed on access, and synchronize on initialization? Or am I thinking of something else? cc @Amanieu @alexcrichton

eternaleye commented 6 years ago

@qwerty19106:

When you say "embedded", are you referring to bare-metal? It's worth noting that .init_array is not, in fact, part of the ELF format itself ¹ - It's not even part of the System V ABI ² that extends it; only .init is. You don't find .init_array until you get to the System V ABI draft update, which the Linux ABI inherits from.

As a result, if you are running on bare metal, .init_array may not even function reliably for your use case - after all, it's implemented on non-bare-metal by code in the dynamic loader and/or libc. Unless your bootloader takes responsibility for running code referenced in .init_array, it'd do nothing at all.

1: See page 28, figure 1-13 "Special Sections" 2: See page 63, figure 4-13 "Special Sections (continued)"

Amanieu commented 6 years ago

@eddyb You need an Acquire load at the very least when reading the Once. This is a normal load on x86 and a load + fence on ARM.

The current implementation uses load(SeqCst), but in practice this generates the same asm as load(Acquire) on all architectures.

RalfJung commented 6 years ago

(Would you mind moving these discussions elsewhere? They do not have anything to do with MaybeUninit vs mem::uninitialized any more. Both behave the same wrt what LLVM does - generate undef. What happens with that undef later is not on topic here.)

Am 13. September 2018 00:59:20 MESZ schrieb Amanieu notifications@github.com:

@eddyb You need an Acquire load at the very least when reading the Once. This is a normal load on x86 and a load + fence on ARM.

The current implementation uses load(SeqCst), but in practice this generates the same asm as load(Acquire) on all architectures.

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/rust-lang/rust/issues/53491#issuecomment-420825802

RalfJung commented 6 years ago

MaybeUninit has landed in master and will be in the next nightly. :)

SimonSapin commented 6 years ago

https://github.com/rust-lang/rust/issues/54470 proposes using Box<[MaybeUninit<T>]> in RawVec<T>. To enable this and possibly other interesting combinations with boxes and slices with fewer transmutes, maybe we could add some more APIs to the standard library?

In particular to allocate without initializing (I think Box::new(MaybeUninit::uninitialized()) would still copy size_of::<T>() padding bytes?):

impl<T> Box<MaybeUninit<T>> {
    pub fn new_uninit() -> Self {…}
    pub unsafe fn assert_init(s: Self) -> Box<T> { transmute(s) }
}

impl<T> Box<[MaybeUninit<T>]> {
    pub fn new_uninit_slice(len: usize) -> Self {…}
    pub unsafe fn assert_init(s: Self) -> Box<[T]> { transmute(s) }
}

In core::slice / std::slice, can be used after taking a sub-slice:

pub unsafe fn assert_init<T>(s: &[MaybeUninit<T>]) -> &[T] { transmute(s) }
pub unsafe fn assert_init_mut<T>(s: &mut [MaybeUninit<T>]) -> &mut [T] { transmute(s) }
RalfJung commented 6 years ago

I think Box::new(MaybeUninit::uninitialized()) would still copy size_of::() padding bytes

It should not, and there is a codegen test intended to test that.

Padding bytes do not have to be copied as their bit representation does not matter (anything that would observe the bit representation is UB anyway).

SimonSapin commented 6 years ago

Ok, so maybe Box::new_uninit is unnecessary? The slice version is different, though, since Box::new requires T: Sized.

mjbshaw commented 6 years ago

I'd like to advocate for MaybeUninit::zeroed to be a const fn. There are some FFI-related uses that I would have for it (e.g. a static that must be initialized to zero) and I believe others might find it useful. I would be happy to volunteer my time to const-ify the zeroed function.

Centril commented 6 years ago

@mjbshaw you'll need to use #[rustc_const_unstable(feature = "const_maybe_uninit_zeroed")] for that since zeroed does things that does not pass the min_const_fn check (https://github.com/rust-lang/rust/issues/53555) which means that MaybeUninit::zeroed's constness will be unstable even if the function is stable.

scottjmaddox commented 6 years ago

Could implementation/stabilization of this be split into a couple of steps, in order to make the MaybeUninit type available to the wider ecosystem sooner? The steps could be:

1) add MaybeUninit 2) convert all uses of mem::uninitialized/zeroed and deprecate

memoryruins commented 6 years ago

@scottjmaddox

add MaybeUninit

https://doc.rust-lang.org/nightly/core/mem/union.MaybeUninit.html :)

scottjmaddox commented 6 years ago

Nice! So is the plan to stabilize MaybeUninit ASAP?

RalfJung commented 6 years ago

Next step is to figure out why https://github.com/rust-lang/rust/pull/54668 regresses perf so badly (in a few benchmarks). I won't have much time to look at that this week though, would be happy if someone else could take a look. :D

Also I don't think we should rush this. We got the last API for handling uninitialized data wrong, let's not hurry and screw up again. ;)

That said, I'd also prefer not to add any unnecessary delays, so that we can finally deprecate the old footgun. :)

RalfJung commented 6 years ago

Oh, and something else just occurred to me... with https://github.com/rust-lang/rust/pull/54667 landed, the old APIs actually protect against some of the worst footguns. I wonder if we could get some of that for MaybeUninit as well? It's not blocking stabilization, but we could try to find a way to make MaybeUninit::into_inner panic when called on an uninhabited type. In debug builds, I could also imagine *x to panic when x: &[mut] T with T uninhabited.

RalfJung commented 6 years ago

Status update: To make progress with https://github.com/rust-lang/rust/pull/54668, we likely need someone to tweak layout computation for unions. @eddyb is willing to mentor, but we need someone to do the implementation. :)

shepmaster commented 6 years ago

I think a method that moves out of the wrapper, replacing it with an uninitialized value, would be useful:

pub unsafe fn take(&mut self) -> T

Shall I submit this?

Amanieu commented 6 years ago

@shepmaster This feels very similar to the existing into_inner method. Maybe we can try to avoid duplication here?

RalfJung commented 6 years ago

Also "replacing with" is likely the wrong picture here, this shouldn't change the content of self at all. Just ownership is transferred away so it is now effectively in the same state as when constructed uninitialized.

shepmaster commented 6 years ago

change the content of self at all

Sure, so the implementation would basically be ptr::read, but from a usage point of view I would encourage framing it as replacement of a valid value with an uninitialized value.

avoid duplication

I have no strong objection, as I expect the implementation of one to call the other. I just don't know what the final state would be.

RalfJung commented 5 years ago

I feel into_inner is a way too innocent function name. People, probably without reading the docs too carefully, still do MaybeUninit::uninitialized().into_inner(). Can we change the name to something like was_initialized_unchecked or so that indicates that you must only call this after the data has been initialized?

I think the same probably applies to take.

eddyb commented 5 years ago

While a bit awkward, something like unchecked_into_initialized might work?

SimonSapin commented 5 years ago

Or should those methods be removed entirely and docs give examples with x.as_ptr().read()?

RalfJung commented 5 years ago

@SimonSapin into_inner consumes self though which is nice.

But for @shepmaster's take, doing as_mut_ptr().read() would do the same... though of course then why would you even bother with a mutable pointer?

jethrogb commented 5 years ago

How about take_unchecked and into_inner_unchecked?

RalfJung commented 5 years ago

That would be a backup plan I guess, but I'd prefer if it could indicate that you must have initialized.

hanna-kruppe commented 5 years ago

Putting both the emphasis that it has to be initialized and a description of what it does (unwrap/into_inner/etc.) into one name gets rather unwieldy, so how about just doing the former with assert_initialized and leaving the latter implied by the signature? Possible unchecked_assert_initialized to avoid implying a runtime check like assert!() has.

oli-obk commented 5 years ago

Possible unchecked_assert_initialized to avoid implying a runtime check like assert!() has.

We differentiate between assumptions and assertions already via intrinsics::assume(foo) vs assert!(foo), so maybe assume_initialized?

RalfJung commented 5 years ago

assume is unstable API, a stable example of assume vs assert is unreachable_unchecked vs unreachable and get_unchecked vs get. So I think unchecked is the right term.

shepmaster commented 5 years ago

I'd say that foo_unchecked only makes sense when there is a corresponding foo, otherwise the sheer nature of the function being unsafe indicates to me that something "different" is going on.

This bikeshed is clearly the wrong color

hanna-kruppe commented 5 years ago

With this specific API, we've already seen and will continue to see programmers assume that the unsafety is because "uninitialized data is garbage so you can cause UB if you handle it carelessly", rather than the intended "it is UB to call this on uninitialized data, period". I don't know for sure whether an arguably redundant ⚠️ like unchecked will help with that but I'd rather err on the side of being more perplexing (= more likely to cause people to ask around or read the docs very carefully).

Centril commented 5 years ago

@RalfJung

I feel into_inner is a way too innocent function name. People, probably without reading the docs too carefully, still do MaybeUninit::uninitialized().into_inner(). Can we change the name to something like was_initialized_unchecked or so that indicates that you must only call this after the data has been initialized?

I really like this idea; I feel strongly that it says the right thing both about the semantics and that this is potentially dangerous.

@rkruppe

Putting both the emphasis that it has to be initialized and a description of what it does (unwrap/into_inner/etc.) into one name gets rather unwieldy, so how about just doing the former with assert_initialized and leaving the latter implied by the signature? Possible unchecked_assert_initialized to avoid implying a runtime check like assert!() has.

I have no qualms about unwieldy long-ass names for dangerous things. If it makes it more people think twice then even was_initialized_into_inner_unchecked is entirely fine by me. Making it unergonomic (within reason) to write unsafe code is a feature, not a bug ;)

shepmaster commented 5 years ago

Remember that a vast majority of people will likely be using an IDE with some form of autocomplete, so a long name is a minor roadbump.

hanna-kruppe commented 5 years ago

I do not particularly care for the ergonomics of using this function, but I do think past a certain point names tend to be skimmed rather than read, and this name really should be read to understand what's happening. Additionally, I expect this function will be discussed/explained almost as often as it's actually used (since it's relatively niche and very subtle), and while typing long identifiers in source code can be fine (e.g. thanks to IDEs), typing them out from memory in a chat system is... less nice (I'm half joking about this point, but only half).

Centril commented 5 years ago

@shepmaster Sure; I use an IDE with auto completion as well; but I think a longer name with unchecked inside including inside an unsafe block would give at lest myself some extra pause.

@rkruppe

typing them out from memory in a chat system is... less nice (I'm half joking about this point, but only half).

I would make that trade-off. If a name is a bit special that can even make it more memorable. ;)

Any of (or similar names that include the same semantic connotations):

are fine by me.

Boddlnagg commented 5 years ago

What about initialized_into_inner? Or initialized_into_inner_unchecked, if you think that unchecked is really necessary, though I tend to agree with @shepmaster that unchecked is only necessary to distinguish from some other checked variant of the same functionality, where runtime checks are not happening.

Nemo157 commented 5 years ago

When manually implementing a self-borrowing generator I ended up using ptr::drop_in_place(maybe_uninit.as_mut_ptr()) multiple times, it seems like this would work well as an inherent method unsafe fn drop_in_place(&mut self) on MaybeUninit.

SimonSapin commented 5 years ago

There is precedent with ManuallyDrop::drop.

RalfJung commented 5 years ago

I'd say that foo_unchecked only makes sense when there is a corresponding foo, otherwise the sheer nature of the function being unsafe indicates to me that something "different" is going on.

I don't think not having a safe version is a good reason to remove the warning sign from the unsafe version.

shepmaster commented 5 years ago

remove the warning sign from the unsafe version

Being a tad hyperbolic, when should an unsafe function not have _unchecked stuck on the end then? What's the point of having two warnings that say the same thing?

RalfJung commented 5 years ago

That's a fair question. :) But I think the answer is "almost never", and I actually regret that we have offset as unsafe function on pointers which in no way expresses that it is unsafe. It doesn't have to be literally unchecked, but IMO there should be something. When I am in an unsafe block anyway and accidentally write .offset instead of .wrapping_offset, I made a promise to the compiler that I did not intend to make.

shepmaster commented 5 years ago

as unsafe function on pointers which in no way expresses that it is unsafe

This sums up my bemusement at this stage.

RalfJung commented 5 years ago

@shepmaster so you don't think it is realistic that someone will edit code inside an existing unsafe block (maybe a large one, maybe inside a large unsafe fn which implicitly has an unsafe block), and not be aware that the call they are adding is unsafe?