Open Amanieu opened 8 years ago
There's some discussion about this in https://github.com/rust-lang/rust/issues/28951
I like how Swift has implemented various features we've only discussed. This is valuable, as it can attest to the practicality of having such features, without time-consuming investigation on our part. Thanks @Amanieu for bringing it up!
Neither of the following must become UB due to changes made to object layout.
fn dummy(p: &T) {
unsafe {
// getelementptr inbounds
p.offset(1);
}
}
unsafe fn get_second(p: *const T) -> T {
ptr::read(p.offset(1))
}
@mahkoh .offset
already does C-like array indexing, i.e. p.offset(1) == (p as *const T).offset(aligned_size_of::<T>()) as *const T
, whether or not size_of
remains aligned_size_of
(which it probably will, from the discussion I've seen so far).
I think this is easily solved by requiring memory allocation to use the aligned size of an object instead of the packed size. That way p.offset(1)
, which uses the aligned size, will still be only 1 byte past the end of a valid object, which avoid undefined behavior.
@eddyb: Then the object cannot be smaller than the aligned_size_of because otherwise the first function is UB.
@Amanieu: The first function can take a reference to T
that lives on the stack, the heap, is a field in a struct, etc. Every instance of T
that can be passed to the first function must have the aligned_size_of. Otherwise the behavior of offset is undefined.
@eddyb: Ah, I see what you meant. I'll update the code above.
And to bring size_of
back into the picture:
fn dummy2(p: &T) {
unsafe {
// getelementptr inbounds
(p as *const _ as *const u8).offset(mem::size_of::<T>());
}
}
must not be UB either.
@mahkoh: This isn't UB because, from my understanding, getelementptr
only requires the target to be within the same "allocated object". This means that it is OK for the resulting pointer to end up in the middle of another object, as long as that other object is part of the same allocation as the original object. This means that it should be fine as long as rust invokes the LLVM alloca
instruction with the aligned size instead of the packed size when allocating objects on the stack.
Still, this optimization seems to be impossible due to the issue mentioned in the OP:
In particular, unsafe code might assume that, when given a pointer let x: *mut T, all bytes from x to x + mem::size_of::
() can safely be overwritten.
The user is actually guaranteed this property for Copy
types by the documentation:
Types that can be copied by simply copying bits (i.e. memcpy).
While mem::size_of
is not explicitly mentioned, this seems to be unambiguous. If the memcpy comment didn't exist, one might argue otherwise, but memcpy
always requires an explicit size argument in bytes.
Ah, I had overlooked this issue, but this was one of the things I was thinking about while writing the Allocator RFC (#1398). I believe the Layout
API presented there is forward-compatible with separate notions of stride/size, but it would be best to double-check that claim carefully.
(Update: in fact I had already commented (https://github.com/rust-lang/rust/issues/28951#issuecomment-147369442) about having thoughts related to this topic, though that comment is quite wishy-washy about what direction we should go in...)
(nominating because I think the lang team should be aware of the question here.)
It looks like even arrays [T; N]
/[T]
can use the layout with omitted trailing padding (they are not included in the Swift document).
It doesn't create new problems in addition to the memcpy(sizeof)
/memset(sizeof)
problem already existing for non-array types.
Making arrays to follow the same rule has benefits of making [T; 3]
binary compatible with (T, T, T)
, and [T; 1]
binary compatible with T
.
The first step towards supporting this would be to add mem::inner_size_of
and mem::inner_size_of_val
(names subject to the usual bikeshedding). These would initially return the same value as mem::size_of
, but it would allow code to start preparing for the introduction of separate size and stride.
Funnily enough, it seems this would actually break the code I have written for atomic-rs.
This function might overwrite an adjacent value if it is embedded in the padding of the atomic object (which is just an UnsafeCell<T>
).
The fix would be to use inner_size_of
instead of size_of
, however this shows that this feature can still be a breaking change for some code.
As pointed out in #1582, this would allow us to change the layout of tuples so that a tuple can be destructured into a single element and the rest of the tuple while still retaining a fairly space-efficient layout. That would, in turn, make it easier to support variadic generics.
My feeling is that inhibiting writing a 3-byte struct like (i16, bool)
as a single 4 byte write everywhere (i.e. whether or not it is actually nested like ((i16, bool), bool)
) is a larger downside than compressing any structs that actually look like that.
However, I don't have a great feeling for the number of actual odd-size structs, nor the number of nested ones, nor how often they hit memory. I guess one possibility for collecting data would be instrumenting the compiler to report/record this sort of info during compilation and run it on some programs (e.g. the bootstrap, servo, some things from the rest of the ecosystem) to get an idea (getting an idea for number of writes would require runtime instrumentation, I suppose). Theoretically it might good to also know about this in the precise of layout optimisations that reorder fields (e.g. (bool, i16)
gets represented in memory as (i16, bool)
) since those will likely maximise opportunities for stride != size to be beneficial (similarly, enum stuff like Option<i16>
== (i16, bool)
).
I just noticed a slight flaw with this scheme regarding DSTs. If the last element in a struct is even potentially unsized then it must remain the last element in the struct.
For example consider struct Mutex<T: ?Sized>(StaticMutex, UnsafeCell<T>)
. The UnsafeCell
must be the last element in the struct, even if the struct is later monomorphized, since &Mutex<i32>
can be coerced into &Mutex<Debug>
for example.
This means that making a type support unsized type parameters with ?Sized
can negatively impact its layout, even if in practice the vast majority of uses will be with sized types.
@Amanieu can't the layout vary based on the choice of types that the struct is instantiated with, and thus the constraints imposed on instantiations with an unsized type won't affect instantiations with all sized types?
@pnkfelix No. Look at the example I gave in my comment:
For example consider
struct Mutex<T: ?Sized>(StaticMutex, UnsafeCell<T>)
. The UnsafeCell must be the last element in the struct, even if the struct is later monomorphized, since&Mutex<i32>
can be coerced into&Mutex<Debug>
for example.
Because &Mutex<i32>
can be coerced into &Mutex<Debug>
, both of these types must have the same layout, which in this case means that the last field in the struct must always be the potentially unsized field.
@Amanieu oh, sorry for not reading carefully enough
This is basically https://github.com/rust-lang/rust/issues/17027
This would also help C++ interop.
In C++20, C++ added the [[no_unique_address]]
attribute. This attribute specifies that a field's tail padding can be reused by an adjacent object, similar to the #[no_trailing_padding]
attribute proposed in https://github.com/rust-lang/rust/issues/17027, but specified on the field, not the type. This is bad news for Rust/C++ interop, as Rust currently assumes, when given a &mut
reference, that it can write to the whole object, including its tail padding. (It does not always do this, but it permits it -- see e.g. ptr::write_unaligned
.) If [[no_unique_address]]
is used pervasively in a C++ codebase (for instance, inside the STL), this means that in theory every C++ type with usable tail padding can only safely be shared with Rust behind a Pin<&mut T>
or equivalent, and cannot be shared via &mut T
even if it is otherwise compatible with Rust semantics (e.g. Clang's __is_trivially_relocatable
returns true). (And if you do use Pin
for this, then it can't be used by value, due to the existence of Pin::set
.)
(In practice, this is not actually every type. Some types have no tail padding – for instance, int
, std::string
, or std::string_view
. And in some ABIs, some types have tail padding, but that tail padding can't be reused by via [[no_unique_address]]
. (In the Itanium ABI, this is all "POD" types, for a particular definition of "POD" -- this is done in part as a compromise to address the issue brought up in https://github.com/rust-lang/rfcs/issues/1397#issuecomment-213311508 above.) But most classes that users define would only be safe to use through a &mut
by coincidence, not by design.)
If we separated out size
and stride
, and specified that writes (via assignment, mem::swap
, ptr::write
, ptr::write_unaligned
, etc.) only wrote out size
number of bytes (which can be smaller than stride
), then we could still use these types from Rust, as if they were any old Rust type -- without the use of Pin
etc. Also, by doing this, we would make it possible to sensibly add some equivalent to [[no_unique_address]]
to Rust, as proposed in https://github.com/rust-lang/rust/issues/17027 -- allowing not just for more optimizations, but also allowing for matching the layout of a C++ type which uses [[no_unique_address]]
.)
Because this seems likely to be important for C++ interop, I'd be interested in taking this on / pushing this forward, but I don't really know what the next steps are. There is this issue in the RFCs repo, but there is not an RFC markdown file. Is the next step to write up a concrete proposal + rationale as a markdown file, following the RFC process?
Note that this would have to be done in a way that is backwards-compatible, and it should not be a pessimization. Both of make this very difficult.
Concretely we do need to keep that size_of::<T>()
is the stride, since far too much code has been written that will break if it's less. So, we could add a new value occupied_bytes_of::<T>()
(e.g. what you're calling the size -- the number of bytes that we write to on assignments), but by default this needs to be the size, otherwise it seems very likely to be a pessimization. I don't think this is a deal-breaker for C++ interop, but I think it would end up looking a lot like the #[no_trailing_padding]
attribute.
In terms of your question, yes, the next steps would be either:
The Swift ABI defines an interesting struct layout which defines the size of a type separately from its stride:
While rust doesn't define a stable ABI yet, I think using such a layout would be a breaking change due to assumptions that unsafe code make about
mem::size_of
.In particular, unsafe code might assume that, when given a pointer
let x: *mut T
, all bytes fromx
tox + mem::size_of::<T>()
can safely be overwritten. This would be incorrect when the padding bytes at the end ofT
contain fields of an outer struct.Is such an optimization worth considering? From a quick look at the rust source, it seems that all uses of
mem::size_of
will continue to work fine if we define it as returning the stride (aligned size). I haven't looked at any external crates yet but I think they would continue to work fine as well. In any case this would be a breaking change and should be considered carefully.