Closed Shnatsel closed 5 years ago
It's possible that std::mem::forget() must be called before the new Vec is constructed, or that it's only safe to do with ManuallyDrop.
Yes, the potential issue is that
Vec::from_raw_parts(v_orig.as_mut_ptr() as *mut Option<&i32>,
v_orig.len(),
v_orig.capacity())
asserts / requires that v_org.as_mut_ptr()
not be aliased (due to Vec::from_raw_parts
creating (unique) ownership out of that pointer)
However, the original pointer it comes from, the one from v_orig
, is "used" right afterwards in mem::forget
. Unless forget
becomes a special lang construct, the current stacked borrows model fails with this pattern.
The solution is to "forget
" in advance, thanks to ManuallyDrop
:
let store = [0, 1, 2, 3];
let v_orig = store.iter().collect::<Vec<&i32>>();
// Using from_raw_parts
let v_from_raw = unsafe {
// prepare for an auto-forget of the initial vec:
let v_orig: &mut Vec<_> = &mut *ManuallyDrop::new(v_orig);
Vec::from_raw_parts(v_orig.as_mut_ptr() as *mut Option<&i32>,
v_orig.len(),
v_orig.capacity())
// v_orig is never used again, so no aliasing issue
};
That's a lot of code. Perhaps it might be a good idea to encapsulate it in a Vec::transmute::<Item>()
-method?
There is currently no trait for this in core or std, so crates have to make up their own.
@llogiq sounds like a good idea to me. Care to get the ball rolling with an issue against stdlib and/or an RFC?
This actually could potentially be a lot more broad and useful, though I don't know how much of this info that clippy really has access to when doing an analysis:
repr(Rust)
type with more than one field is UB.Related:
Box<H>
to Box<K>
is UB when you drop the new Box unless the new type's size and alignment are both exactly the same. This includes anything that transitively contains a Box@Shnatsel Sure thing. I'll write up that RFC.
As a follow-up to @Lokathor: currently, a lint would be always correct to trigger when both type arguments to transmute
are repr(Rust)
, unless they are fully equivalent. Any defined transmutes must involve at least one repr(C)
or repr(transparent)
type (counting built-in types with defined layouts as repr(C)
).
A lint specifically for Vec
and/or other smart pointers is a great opportunity for teaching, but until we get more defined layouts, it's UB for any two distinct repr(Rust)
types even when they're identical newtypes.
a lint would be always correct to trigger when both type arguments to transmute are repr(Rust)
I think that's basically https://github.com/rust-lang/rust/issues/50842.
I'm not 100% convinced that the "correct" code proposed in documentation correctly upholds aliasing invariants. It's possible that std::mem::forget() must be called before the new Vec is constructed, or that it's only safe to do with ManuallyDrop.
@llogiq has since corrected that in the docs.
Oops. Wrong repository (too many tabs) and answered here.
Documentation on std::mem::transmute() specifically calls out that transmuting a Vec of any type is UB.
Coming in late, but is this really true for every transmute, even ones between repr(transparent)
newtype wrappers? For example:
#[derive(Debug)]
struct Inner(i32);
#[derive(Debug)]
#[repr(transparent)]
struct Outer(Inner);
fn main() {
let v: Vec<Outer> = [1, 2, 3, 4, 5].iter().map(|&i| Outer(Inner(i))).collect();
println!("{:?}", v);
let v: Vec<Inner> = unsafe { std::mem::transmute(v) };
println!("{:?}", v);
}
Correct.
Because Vec
is not transparent, so it's UB.
If needed, you could convert a slice of Outer to slice of Inner via core::slice::from_raw_parts and pointer casting and all that, but not with Vec.
While I understand why Vec<&T>
-> Vec<Option<&T>>
is unsafe and UB, it's unclear why this lint applies to any vector transformations.
For example, the most common usecase for transmute for me is when a newtype wraps another one with #[repr(transparent)
:
#[repr(transparent)]
struct NewType<T>(T);
Isn't in this case transmuting Vec<NewType<T>>
<=> Vec<T>
safe because their alignments, pointer representations and everything else are guaranteed to match?
No, because the implementation of Vec
itself is non-transmutable because it's #[repr(Rust)]
.
Pretend that Vec<T>
is effectively (*mut T, len, cap)
. There's nothing that prevents Vec<U>
from being (cap, *mut U, len)
, or any other permutation.
Said another way:
Because
Vec
is not transparent, so it's UB.
See also https://github.com/rust-lang/rust-clippy/issues/4484#issuecomment-544688894
Nope.
Because Vec itself is repr(Rust), all transmutation between different monomorphizations is UB.
Now, sure, the data layout is likely to be the same and so it is likely to work despite being UB, but that's how a lot of UB is until one day it breaks.
sniped :(
@shepmaster Hmm. Is this something specific to Vec
because it's built-in, or to any repr(Rust)
?
Generally I'd expect #[repr(transparent)]
to guarantee that when it's put inside arbitrary structure or an enum, the layout of that structure or enum would be exactly the same as if the type was unwrapped.
If that's not the case, that significantly reduces usefulness of #[repr(transparent)]
for FFI code, since it means only direct usage of the type is guaranteed to be transparent.
UPD: Oh I see, it's worse than I thought - even structs without newtypes are not guaranteed to be deterministic :/ https://rust-lang.github.io/unsafe-code-guidelines/layout/structs-and-tuples.html#unresolved-question-guaranteeing-compatible-layouts
That's pretty bad... I don't want to switch all my types to repr(C)
as repr(Rust)
provides lots of valuable size optimisations, yet I do want to use #[repr(transparent)]
nested inside other types without worrying about layout incompatibilities...
provides lots of valuable size optimisations
These optimizations are possible only because the layout is not guaranteed. You have the choice between optimized layout and predictable layout. Having both at the same time is just impossible.
These optimizations are possible only because the layout is not guaranteed.
Sure, but there is (usually) a difference between "layout is not guaranteed" as in "it's unspecified, so don't rely on exact representation" and "layout is not deterministic" as in "even structurally equivalent types are not guaranteed to have compatible layout".
Most of these optimisations require only the former, and as a user I'm perfectly fine with not relying on exact memory representation, but I'd still like to rely on compatibility between types, as that's pretty much the only way to write efficient code in many cases.
It could potentially be deterministic without being fixed for all time.
However, there's a lot of work to do there, so it's UB until that happens, if that happens at all.
People are experimenting with safe transmutation methods as we speak. In general, I believe a type should be able to state what types it can be transmuted into and how (though to do this ergonomically might require us to extend the type system, e.g. to have #[repr(transparent)]
auto-implement some marker trait).
Indeed! Beyond the safe-transmute project there's also various crates that are feeding ideas into the project that are already all capable of being used.
Documentation on std::mem::transmute() specifically calls out that transmuting a Vec of any type is UB.
Code with UB from documentation:
Correct code from documentation (assuming interior types can be transmuted):
What exactly is wrong with it is not currently documented (I've opened https://github.com/rust-lang/rust/issues/64073 to clarify this) but my understanding is that the problem stems from the following:
I.e. while it's safe to transmute the data on the heap assuming the types are compatible, it's never safe to transmute the (pointer, capacity, length) triplet on the stack.
I'm not 100% convinced that the "correct" code proposed in documentation correctly upholds aliasing invariants. It's possible that
std::mem::forget()
must be called before the new Vec is constructed, or that it's only safe to do withManuallyDrop
.cc @RalfJung