Closed Centril closed 5 years ago
I'm curious to know whether there's some argument against layout-optimizing Option<Foo>
where Foo
is defined like this:
union Foo {
bar: NonZeroUsize,
baz: &'static str,
}
@Kixunil could you raise that at https://github.com/rust-rfcs/unsafe-code-guidelines/issues/13? Your question really is not related to MaybeUninit
.
I want to know which section will be contains static variables without initialization?
In "C" I can write uint8_t a[100];
in high level of file, and I know that a symbol will be put to .bss section. If I write uint8_t a[100] = {};
the a symbol will be put to .data section (which will be copied from FLASH to RAM before main).
It is small example in Rust which used MaybeUninit:
struct A {
data: MaybeUninit<[u8; 100]>,
len: usize,
}
impl A {
pub const fn new() -> Self {
Self {
data: MaybeUninit::uninitialized(),
len: 0,
}
}
}
static mut a: MaybeUninit<[u8; 100]> = MaybeUninit::uninitialized();
static mut b: A = A::new();
Which section will be contains a and b symbols?
P. S. I know about symbol mangling but it is no matter for this question.
@qwerty19106 In your example both a
and b
will be place in .bss
. LLVM treats undef
values, like MaybeUninit::uninitialized()
, as zeros when selecting which section a variable will go in.
If A::new()
initialized len
to 1 then b
would have ended in .data
. If a static
contains any non-zero value then the variable will go in .data
. Padding is treated as zero values.
This is what LLVM does. Rust makes no ~guarantees~ promises (*) about which linker section a static
variable will go in. It just inherits LLVM's behavior.
(*) Unless you use #[link_section]
Fun fact: At some point LLVM considered undef
as a non-zero value so variable a
in your example ended in .data
. See #41315.
Thanks @japaric for your answer. It was very helpful for me.
I have the new idea. One can use .init_array section to initialize static mut variables before call main.
This is proof of concept:
#[macro_export]
macro_rules! static_singleton {
($name_var: ident, $ty:ty, $name_init_fn: ident, $name_init_var: ident, $init_block: block) => {
static mut $name_var: MaybeUninit<$ty> = unsafe {MaybeUninit::uninitialized()};
extern "C" fn $name_init_fn() {
unsafe {
$init_block
}
}
#[link_section = ".init_array"]
#[used]
static $name_init_var: [extern "C" fn(); 1] = [$name_init_fn];
};
}
The test code:
static_singleton!(A, u8, a_init_fn, A_INIT_VAR, {
let ptr = A.get_mut();
*ptr = 5;
});
fn main() {
println!("A inited to {}", unsafe {&A.get_ref()});
}
Result: A inited to 5
Full example: playground
Unresolved question: I could not use concat_idents to generate a_init_fn and A_INIT_VAR. It seems that #1628 is not yet ready for use.
This test not very useful. But it can be useful in embedded for initialize complicated structs (it will be placed in .bss, thus allows to economy FLASH).
Why rustc don't use .init_array section? It is standartized section of ELF format (link).
@qwerty19106 Because life before main() is considered a misfeature and was explicitly disinvited from Rust's semantics.
Ok, it is good lang design.
But in #[no_std] we have not good alternative now (maybe I was searching bad).
We can use spin::Once, but it is very expensive (Ordering::SeqCst on every reference get).
I would like to have a compile-time check on embedded.
it is very expensive (
Ordering::SeqCst
on every reference get).
That doesn't sound right to me. Aren't all "once" abstractions supposed to be relaxed on access, and synchronize on initialization? Or am I thinking of something else? cc @Amanieu @alexcrichton
@qwerty19106:
When you say "embedded", are you referring to bare-metal? It's worth noting that .init_array
is not, in fact, part of the ELF format itself ¹ - It's not even part of the System V ABI ² that extends it; only .init
is. You don't find .init_array
until you get to the System V ABI draft update, which the Linux ABI inherits from.
As a result, if you are running on bare metal, .init_array
may not even function reliably for your use case - after all, it's implemented on non-bare-metal by code in the dynamic loader and/or libc. Unless your bootloader takes responsibility for running code referenced in .init_array
, it'd do nothing at all.
1: See page 28, figure 1-13 "Special Sections" 2: See page 63, figure 4-13 "Special Sections (continued)"
@eddyb You need an Acquire
load at the very least when reading the Once
. This is a normal load on x86 and a load + fence on ARM.
The current implementation uses load(SeqCst)
, but in practice this generates the same asm as load(Acquire)
on all architectures.
(Would you mind moving these discussions elsewhere? They do not have anything to do with MaybeUninit vs mem::uninitialized any more. Both behave the same wrt what LLVM does - generate undef. What happens with that undef later is not on topic here.)
Am 13. September 2018 00:59:20 MESZ schrieb Amanieu notifications@github.com:
@eddyb You need an
Acquire
load at the very least when reading theOnce
. This is a normal load on x86 and a load + fence on ARM.The current implementation uses
load(SeqCst)
, but in practice this generates the same asm asload(Acquire)
on all architectures.-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/rust-lang/rust/issues/53491#issuecomment-420825802
MaybeUninit
has landed in master and will be in the next nightly. :)
https://github.com/rust-lang/rust/issues/54470 proposes using Box<[MaybeUninit<T>]>
in RawVec<T>
. To enable this and possibly other interesting combinations with boxes and slices with fewer transmutes, maybe we could add some more APIs to the standard library?
In particular to allocate without initializing (I think Box::new(MaybeUninit::uninitialized())
would still copy size_of::<T>()
padding bytes?):
impl<T> Box<MaybeUninit<T>> {
pub fn new_uninit() -> Self {…}
pub unsafe fn assert_init(s: Self) -> Box<T> { transmute(s) }
}
impl<T> Box<[MaybeUninit<T>]> {
pub fn new_uninit_slice(len: usize) -> Self {…}
pub unsafe fn assert_init(s: Self) -> Box<[T]> { transmute(s) }
}
In core::slice
/ std::slice
, can be used after taking a sub-slice:
pub unsafe fn assert_init<T>(s: &[MaybeUninit<T>]) -> &[T] { transmute(s) }
pub unsafe fn assert_init_mut<T>(s: &mut [MaybeUninit<T>]) -> &mut [T] { transmute(s) }
I think Box::new(MaybeUninit::uninitialized()) would still copy size_of::
() padding bytes
It should not, and there is a codegen test intended to test that.
Padding bytes do not have to be copied as their bit representation does not matter (anything that would observe the bit representation is UB anyway).
Ok, so maybe Box::new_uninit
is unnecessary? The slice version is different, though, since Box::new
requires T: Sized
.
I'd like to advocate for MaybeUninit::zeroed
to be a const fn
. There are some FFI-related uses that I would have for it (e.g. a static that must be initialized to zero) and I believe others might find it useful. I would be happy to volunteer my time to const-ify the zeroed
function.
@mjbshaw you'll need to use #[rustc_const_unstable(feature = "const_maybe_uninit_zeroed")]
for that since zeroed
does things that does not pass the min_const_fn
check (https://github.com/rust-lang/rust/issues/53555) which means that MaybeUninit::zeroed
's constness will be unstable even if the function is stable.
Could implementation/stabilization of this be split into a couple of steps, in order to make the MaybeUninit type available to the wider ecosystem sooner? The steps could be:
1) add MaybeUninit 2) convert all uses of mem::uninitialized/zeroed and deprecate
Nice! So is the plan to stabilize MaybeUninit ASAP?
Next step is to figure out why https://github.com/rust-lang/rust/pull/54668 regresses perf so badly (in a few benchmarks). I won't have much time to look at that this week though, would be happy if someone else could take a look. :D
Also I don't think we should rush this. We got the last API for handling uninitialized data wrong, let's not hurry and screw up again. ;)
That said, I'd also prefer not to add any unnecessary delays, so that we can finally deprecate the old footgun. :)
Oh, and something else just occurred to me... with https://github.com/rust-lang/rust/pull/54667 landed, the old APIs actually protect against some of the worst footguns. I wonder if we could get some of that for MaybeUninit
as well? It's not blocking stabilization, but we could try to find a way to make MaybeUninit::into_inner
panic when called on an uninhabited type. In debug builds, I could also imagine *x
to panic when x: &[mut] T
with T
uninhabited.
Status update: To make progress with https://github.com/rust-lang/rust/pull/54668, we likely need someone to tweak layout computation for unions. @eddyb is willing to mentor, but we need someone to do the implementation. :)
I think a method that moves out of the wrapper, replacing it with an uninitialized value, would be useful:
pub unsafe fn take(&mut self) -> T
Shall I submit this?
@shepmaster This feels very similar to the existing into_inner
method. Maybe we can try to avoid duplication here?
Also "replacing with" is likely the wrong picture here, this shouldn't change the content of self
at all. Just ownership is transferred away so it is now effectively in the same state as when constructed uninitialized.
change the content of
self
at all
Sure, so the implementation would basically be ptr::read
, but from a usage point of view I would encourage framing it as replacement of a valid value with an uninitialized value.
avoid duplication
I have no strong objection, as I expect the implementation of one to call the other. I just don't know what the final state would be.
I feel into_inner
is a way too innocent function name. People, probably without reading the docs too carefully, still do MaybeUninit::uninitialized().into_inner()
. Can we change the name to something like was_initialized_unchecked
or so that indicates that you must only call this after the data has been initialized?
I think the same probably applies to take
.
While a bit awkward, something like unchecked_into_initialized
might work?
Or should those methods be removed entirely and docs give examples with x.as_ptr().read()
?
@SimonSapin into_inner
consumes self
though which is nice.
But for @shepmaster's take
, doing as_mut_ptr().read()
would do the same... though of course then why would you even bother with a mutable pointer?
How about take_unchecked
and into_inner_unchecked
?
That would be a backup plan I guess, but I'd prefer if it could indicate that you must have initialized.
Putting both the emphasis that it has to be initialized and a description of what it does (unwrap/into_inner/etc.) into one name gets rather unwieldy, so how about just doing the former with assert_initialized
and leaving the latter implied by the signature? Possible unchecked_assert_initialized
to avoid implying a runtime check like assert!()
has.
Possible unchecked_assert_initialized to avoid implying a runtime check like assert!() has.
We differentiate between assumptions and assertions already via intrinsics::assume(foo)
vs assert!(foo)
, so maybe assume_initialized
?
assume
is unstable API, a stable example of assume vs assert is unreachable_unchecked
vs unreachable
and get_unchecked
vs get
. So I think unchecked
is the right term.
I'd say that foo_unchecked
only makes sense when there is a corresponding foo
, otherwise the sheer nature of the function being unsafe
indicates to me that something "different" is going on.
This bikeshed is clearly the wrong color
With this specific API, we've already seen and will continue to see programmers assume that the unsafety is because "uninitialized data is garbage so you can cause UB if you handle it carelessly", rather than the intended "it is UB to call this on uninitialized data, period". I don't know for sure whether an arguably redundant ⚠️ like unchecked
will help with that but I'd rather err on the side of being more perplexing (= more likely to cause people to ask around or read the docs very carefully).
@RalfJung
I feel
into_inner
is a way too innocent function name. People, probably without reading the docs too carefully, still doMaybeUninit::uninitialized().into_inner()
. Can we change the name to something likewas_initialized_unchecked
or so that indicates that you must only call this after the data has been initialized?
I really like this idea; I feel strongly that it says the right thing both about the semantics and that this is potentially dangerous.
@rkruppe
Putting both the emphasis that it has to be initialized and a description of what it does (unwrap/into_inner/etc.) into one name gets rather unwieldy, so how about just doing the former with
assert_initialized
and leaving the latter implied by the signature? Possibleunchecked_assert_initialized
to avoid implying a runtime check likeassert!()
has.
I have no qualms about unwieldy long-ass names for dangerous things. If it makes it more people think twice then even was_initialized_into_inner_unchecked
is entirely fine by me. Making it unergonomic (within reason) to write unsafe code is a feature, not a bug ;)
Remember that a vast majority of people will likely be using an IDE with some form of autocomplete, so a long name is a minor roadbump.
I do not particularly care for the ergonomics of using this function, but I do think past a certain point names tend to be skimmed rather than read, and this name really should be read to understand what's happening. Additionally, I expect this function will be discussed/explained almost as often as it's actually used (since it's relatively niche and very subtle), and while typing long identifiers in source code can be fine (e.g. thanks to IDEs), typing them out from memory in a chat system is... less nice (I'm half joking about this point, but only half).
@shepmaster Sure; I use an IDE with auto completion as well; but I think a longer name with unchecked
inside including inside an unsafe
block would give at lest myself some extra pause.
@rkruppe
typing them out from memory in a chat system is... less nice (I'm half joking about this point, but only half).
I would make that trade-off. If a name is a bit special that can even make it more memorable. ;)
Any of (or similar names that include the same semantic connotations):
was_initialized_unchecked
was_initialized_into_inner_unchecked
is_initialized_unchecked
is_initialized_into_inner_unchecked
was_init_unchecked
was_init_into_inner_unchecked
is_init_unchecked
is_init_into_inner_unchecked
assume_initialized_unchecked
assume_init_unchecked
are fine by me.
What about initialized_into_inner
? Or initialized_into_inner_unchecked
, if you think that unchecked
is really necessary, though I tend to agree with @shepmaster that unchecked
is only necessary to distinguish from some other checked variant of the same functionality, where runtime checks are not happening.
When manually implementing a self-borrowing generator I ended up using ptr::drop_in_place(maybe_uninit.as_mut_ptr())
multiple times, it seems like this would work well as an inherent method unsafe fn drop_in_place(&mut self)
on MaybeUninit
.
There is precedent with ManuallyDrop::drop
.
I'd say that foo_unchecked only makes sense when there is a corresponding foo, otherwise the sheer nature of the function being unsafe indicates to me that something "different" is going on.
I don't think not having a safe version is a good reason to remove the warning sign from the unsafe version.
remove the warning sign from the unsafe version
Being a tad hyperbolic, when should an unsafe
function not have _unchecked
stuck on the end then? What's the point of having two warnings that say the same thing?
That's a fair question. :) But I think the answer is "almost never", and I actually regret that we have offset
as unsafe function on pointers which in no way expresses that it is unsafe. It doesn't have to be literally unchecked
, but IMO there should be something. When I am in an unsafe block anyway and accidentally write .offset
instead of .wrapping_offset
, I made a promise to the compiler that I did not intend to make.
as unsafe function on pointers which in no way expresses that it is unsafe
This sums up my bemusement at this stage.
@shepmaster so you don't think it is realistic that someone will edit code inside an existing unsafe
block (maybe a large one, maybe inside a large unsafe fn
which implicitly has an unsafe
block), and not be aware that the call they are adding is unsafe
?
NEW TRACKING ISSUE = https://github.com/rust-lang/rust/issues/63566
This is a tracking issue for the RFC "Deprecate
uninitialized
in favor of a newMaybeUninit
type" (rust-lang/rfcs#1892).Steps:
Unresolved questions:
Should we have a safe setter that returns an&mut T
?Should we renameMaybeUninit
? (https://github.com/rust-lang/rust/pull/56138)Should we renameinto_inner
? Should it be more liketake
instead and take&mut self
?ShouldMaybeUninit<T>
beCopy
forT: Copy
?get_ref
andget_mut
(but not reading from the returned references) before data got initialized? (AKA: "Are references to uninitialized data insta-UB, or only UB when being read from?") Should we rename it similar tointo_inner
?Can we make(done)into_inner
(or whatever it ends up being called) panic whenT
is uninhabited, likemem::uninitialized
does currently?Seems like we want to not deprecatemem::zeroed
. We should however remember to also update its documentation together withMaybeUninit
, make sure people are aware that this is insta-UB if all-0-bits does not satisfy the type's validity invariant.