Open gnzlbg opened 4 years ago
I am very against this because it would greatly complicate the unsafe code writing experience for users of the language.
Not saying it should be done. but will it?
because right now the "recommended" way of creating a slice is through slice::from_raw_parts
which users right now care about the layout of slice?
The internals of the slice aren't what would complicate it, the fact that a slice is no longer always two usize
is what makes it more complicated.
Currently, it's an easy rule for people to remember: one usize for normal pointers, and two for slice and trait object pointers. But then if this optimization were put in then every once in a while a slice pointer would be one usize instead of the expected two.
Not the end of the world, but it adds complexity to all cases to optimize what I suspect is quite rare.
Note that multi-trait objects might be larger than two usize
s, so the rule isn't already as simple as:
one usize for normal pointers, and two for slice and trait object pointers.
(and this is without taking into account custom DSTs, which can be arbitrarily large)
sigh oh
well if it's already complex like that then it doesn't hurt extra i guess.
Related rejected RFC: https://github.com/rust-lang/rfcs/pull/2040 (for thin references rather than slices, which changes the tradeoffs a bit)
I agree that this is an unnecessary burden for unsafe code authors, and more generally inelegant. In the framework of custom DSTs, a "pointer to T
" (T sized or DST) composed of a thin pointer to the start of the object and metadata which tells us more about the layout of the object (in the Sized case, no runtime data is necessary to know the layout, so the metadata has zero size and thin pointers follow naturally). The present proposal would add an exception to that (for the case where "pointer" = reference and T
= [U]
where U
is ZST).
This means that the pointer layout changes not only with the pointee type, but also with the pointer type constructor. In particular, it means that type punning from (something containing a) &T
to (something containing a) *const T
and back is no longer possible. That seems like a major annoyance for writing certain kinds of unsafe code, and as @Lokathor wrote it seems to only help in very niche cases.
This specific objection could be avoided by changing all pointer types in the same way, but I think that's even less likely to be acceptable since it would also affect raw pointers.
@rkruppe Thanks for the link to that RFC. I think most of the arguments made there apply here. This comment by @nikomatsakis is a good summary.
In the framework of custom DSTs, a "pointer to T" (T sized or DST) composed of a thin pointer to the start of the object and metadata
This would require a more "general" and complicated custom DST framework to support these cases, which is unlikely since the current proposal is already complicated enough.
That RFC discussion suggest that a better solution might be to offer this via a library type, e.g., maybe something like a core::ptr::CompressedNonNull<T>
that's similar to a *mut T
but always non-null and if T
is a ZST then it does not store a pointer address.
That would allow, e.g., a type like Vec<T>
, to be potentially smaller when T
is a ZST (e.g. 1 word instead of 3 words) if that makes sense for the type (this might not make sense for Vec
because libstd might want to promise that its layout isn't "clever").
I think libstd very clearly already promises that Vec's layout isn't clever.
I think libstd very clearly already promises that Vec's layout isn't clever.
@Lokathor do you have a link to where libstd makes this guarantee? AFAICT we do guarantee certain cleverness for ZSTs (e.g. that we never allocate), and this result in different behavior (e.g. the capacity of Vec::new()
being different than 0). We also do provide certain "clever" layout optimizations, like Option<Vec<T>>
having the same size as Vec<T>
.
EDIT: That section of libstd does say:
Most fundamentally, Vec is and always will be a (pointer, capacity, length) triplet.
but it's unclear to me at least from that guarantee whether this means that Vec
is guaranteed to be 3 words wide (e.g. for CompressedNonNull<T>
there would still be a "pointer" as part of the triplet, but this pointer is a ZST in some cases).
I also thought this was a settled issue, but looking through the history seems to have proven me wrong. https://github.com/rust-lang/rust/issues/67024 and https://github.com/rust-lang/rust/issues/45431 are the only two past discussions I can find which cite the "always a triplet" clause as evidence (https://github.com/rust-lang/rust/issues/31554 also quotes it, but it's not directly relevant there). In the former @KrishnaSannasi interpreted it as clearly ruling out implementations which store the capacity on the heap (which seems obviously correct to me, but also very separate from this isssue). In the latter, @gnzlbg raised exactly the same ZST question being discussed here (two years ago!), and explicitly proposed that we reword it to say "where T is not zero-sized" and "When T is zero-sized the layout of Vec
Considering how well-specified a non-ZST Vec's layout is, I do think we should specify (or at least explicitly declare unspecified) a ZST Vec's layout as well. I have no strong opinion on what it should be, since it does seem likely that any code depending on the layout is going to have to think about ZSTs explicitly regardless of what Vec does.
@gnzlbg every time you ask me where Vec guarantees things I point you to the exact same spot XD
There is a section in their docs that's literally called "guarantees", https://doc.rust-lang.org/alloc/vec/struct.Vec.html#guarantees, and it clearly states the triplet thing. It could not be more clear, and it's been like that for as long as I ever remember looking at that page, like 3+ years
Edit: and to be clear I don't see how ZST affects it, you'd just have a pointer for no reason. I don't see how Option Vec T affects it, because Option is a totally different type.
I don't see how ZST affects it, you'd just have a pointer for no reason. I don't see how Option Vec T affects it, because Option is a totally different type.
@Lokathor note that size_of::<Option<Vec<T>>>() == size_of::<Vec<T>>()
if we remove the pointer, this will no longer be true. (this because of NonNull
inside of RawVec
)
Yes, that is also true
Worth noting that code in the wild (the bitvec
crate) relies on being able to pack 2-usizes worth of data into a &[()]
, which it uses as a way to work around the lack of custom DSTs.
@thomcc that pattern is most likely not backed by the layout guarantees that we are making so far... the only legal way to create a &[()]
from its parts is from_raw_parts
(satisfying all its documented safety requirements). A transmute
is incorrect.
I didn't mean to imply it was transmuted. They use from_raw_parts: https://github.com/myrrlyn/bitvec/blob/master/src/pointer.rs#L776.
If you read the comments in that file it explains in detail what's going on, but basically they're passing a tagged pointer with some special data for the pointer arg of from_raw_parts, and encoding (length_in_bits, bit_offset) into a usize, and passing that in for the length arg of from_raw_parts. (This is in order to emulate a bit-addressed slice)
And to be clear I have no horse in this race -- It surprised me that something that dodgy would be in such a widely used crate, and feels like the kind of thing that's really asking for trouble.
That said, I do have a pretty strong opinion against the optimization @gnzlbg suggested here. I also worry that something like this would cause bugs in unsafe code, but can be more concrete. My immediate concern boils down to the fact that deref coersions can turn an array into a slice.
Specifically, consider a case like:
#[repr(C, align(16)]
struct Align16<T>(pub T);
#[repr(C)]
struct Header {
size: usize,
next: *mut Header,
// etc...
data_start: Align16<[(); 0]>
}
This is not how I'd do it now, but I'm pretty sure I've done something very close to this when writing a malloc
relatively early on in my rust career. At first blush, *mut ()
looks like the rust equivalent of void*
and so you might use a signature like this: fn malloc(x: usize) -> *mut ()
, especially if you were planning on exposing it to C.
Anyway, if you represented chunks this way, and were returning a *mut ()
from your malloc, you might finish it function by returning something like header.data.0.as_mut_ptr()
to the caller, which would be busted by this optimization in a pretty surprising way: as_mut_ptr
is a method on [T]
, which means the suggested optimization would have kicked in when the [(); 0]
was converted to a slice, and the real address would be thrown away. Bad news.
... Anyway, this is just an immediate thought. I imagine it would cause other problems too (maybe for ZSTs which are used to represent opaque C data types).
I think that using references to ZST like that should in theory cause problems like #134. I'm guessing the reason doesn't in practice is because Miri doesn't track raw pointers and the data is only ever accessed by raw pointers.
Right now, we specify here that:
However, if
T
is a ZST, we actually only need to store the length of the slice, and can make&[T]
one word wide.When coercing
&[T]
to a*const T
, we can materialize the pointer out of thin air using:std::mem::align_of::<T>() as *const T
.The current specification would prevent such a layout optimization.
Is this an optimization worth supporting?
EDIT: some issues
(ptr, len) != [T]::from_raw_parts(ptr, len).into_raw()
, sinceptr
does not necessarily need to bealign_of::<T>()
, yet if we don't store it, the address is lost