rust-lang / rust-clippy

A bunch of lints to catch common mistakes and improve your Rust code. Book: https://doc.rust-lang.org/clippy/
https://rust-lang.github.io/rust-clippy/
Other
11.45k stars 1.54k forks source link

New lint: transmute(Vec) is undefined behavior #4484

Closed Shnatsel closed 5 years ago

Shnatsel commented 5 years ago

Documentation on std::mem::transmute() specifically calls out that transmuting a Vec of any type is UB.

Code with UB from documentation:

let store = [0, 1, 2, 3];
let mut v_orig = store.iter().collect::<Vec<&i32>>();

// Using transmute: this is Undefined Behavior, and a bad idea.
let v_transmuted = unsafe {
    std::mem::transmute::<Vec<&i32>, Vec<Option<&i32>>>(v_orig)
};

Correct code from documentation (assuming interior types can be transmuted):

let store = [0, 1, 2, 3];
let mut v_orig = store.iter().collect::<Vec<&i32>>();
// Using from_raw_parts
let v_from_raw = unsafe {
    Vec::from_raw_parts(v_orig.as_mut_ptr() as *mut Option<&i32>,
                        v_orig.len(),
                        v_orig.capacity())
};
std::mem::forget(v_orig);

What exactly is wrong with it is not currently documented (I've opened https://github.com/rust-lang/rust/issues/64073 to clarify this) but my understanding is that the problem stems from the following:

Most fundamentally, Vec is and always will be a (pointer, capacity, length) triplet. No more, no less. The order of these fields is completely unspecified, and you should use the appropriate methods to modify these.

I.e. while it's safe to transmute the data on the heap assuming the types are compatible, it's never safe to transmute the (pointer, capacity, length) triplet on the stack.

I'm not 100% convinced that the "correct" code proposed in documentation correctly upholds aliasing invariants. It's possible that std::mem::forget() must be called before the new Vec is constructed, or that it's only safe to do with ManuallyDrop.

cc @RalfJung

danielhenrymantilla commented 5 years ago

It's possible that std::mem::forget() must be called before the new Vec is constructed, or that it's only safe to do with ManuallyDrop.

Yes, the potential issue is that

    Vec::from_raw_parts(v_orig.as_mut_ptr() as *mut Option<&i32>,
                        v_orig.len(),
                        v_orig.capacity())

asserts / requires that v_org.as_mut_ptr() not be aliased (due to Vec::from_raw_parts creating (unique) ownership out of that pointer)

However, the original pointer it comes from, the one from v_orig, is "used" right afterwards in mem::forget. Unless forget becomes a special lang construct, the current stacked borrows model fails with this pattern.

The solution is to "forget" in advance, thanks to ManuallyDrop:

let store = [0, 1, 2, 3];
let v_orig = store.iter().collect::<Vec<&i32>>();
// Using from_raw_parts
let v_from_raw = unsafe {
    // prepare for an auto-forget of the initial vec:
    let v_orig: &mut Vec<_> = &mut *ManuallyDrop::new(v_orig);
    Vec::from_raw_parts(v_orig.as_mut_ptr() as *mut Option<&i32>,
                        v_orig.len(),
                        v_orig.capacity())
    // v_orig is never used again, so no aliasing issue
};
llogiq commented 5 years ago

That's a lot of code. Perhaps it might be a good idea to encapsulate it in a Vec::transmute::<Item>()-method?

Lokathor commented 5 years ago

There is currently no trait for this in core or std, so crates have to make up their own.

Shnatsel commented 5 years ago

@llogiq sounds like a good idea to me. Care to get the ball rolling with an issue against stdlib and/or an RFC?

Lokathor commented 5 years ago

This actually could potentially be a lot more broad and useful, though I don't know how much of this info that clippy really has access to when doing an analysis:

Related:

llogiq commented 5 years ago

@Shnatsel Sure thing. I'll write up that RFC.

llogiq commented 5 years ago

https://github.com/rust-lang/rfcs/pull/2756

CAD97 commented 5 years ago

As a follow-up to @Lokathor: currently, a lint would be always correct to trigger when both type arguments to transmute are repr(Rust), unless they are fully equivalent. Any defined transmutes must involve at least one repr(C) or repr(transparent) type (counting built-in types with defined layouts as repr(C)).

A lint specifically for Vec and/or other smart pointers is a great opportunity for teaching, but until we get more defined layouts, it's UB for any two distinct repr(Rust) types even when they're identical newtypes.

RalfJung commented 5 years ago

a lint would be always correct to trigger when both type arguments to transmute are repr(Rust)

I think that's basically https://github.com/rust-lang/rust/issues/50842.

I'm not 100% convinced that the "correct" code proposed in documentation correctly upholds aliasing invariants. It's possible that std::mem::forget() must be called before the new Vec is constructed, or that it's only safe to do with ManuallyDrop.

@llogiq has since corrected that in the docs.

llogiq commented 5 years ago

4592 should have closed this.

shepmaster commented 5 years ago

Oops. Wrong repository (too many tabs) and answered here.


Documentation on std::mem::transmute() specifically calls out that transmuting a Vec of any type is UB.

Coming in late, but is this really true for every transmute, even ones between repr(transparent) newtype wrappers? For example:

#[derive(Debug)]
struct Inner(i32);

#[derive(Debug)]
#[repr(transparent)]
struct Outer(Inner);

fn main() {
    let v: Vec<Outer> = [1, 2, 3, 4, 5].iter().map(|&i| Outer(Inner(i))).collect();
    println!("{:?}", v);

    let v: Vec<Inner> = unsafe { std::mem::transmute(v) };
    println!("{:?}", v);
}
Lokathor commented 5 years ago

Correct.

Because Vec is not transparent, so it's UB.

If needed, you could convert a slice of Outer to slice of Inner via core::slice::from_raw_parts and pointer casting and all that, but not with Vec.

RReverser commented 4 years ago

While I understand why Vec<&T> -> Vec<Option<&T>> is unsafe and UB, it's unclear why this lint applies to any vector transformations.

For example, the most common usecase for transmute for me is when a newtype wraps another one with #[repr(transparent):

#[repr(transparent)]
struct NewType<T>(T);

Isn't in this case transmuting Vec<NewType<T>> <=> Vec<T> safe because their alignments, pointer representations and everything else are guaranteed to match?

shepmaster commented 4 years ago

No, because the implementation of Vec itself is non-transmutable because it's #[repr(Rust)].

Pretend that Vec<T> is effectively (*mut T, len, cap). There's nothing that prevents Vec<U> from being (cap, *mut U, len), or any other permutation.

Said another way:

Because Vec is not transparent, so it's UB.

See also https://github.com/rust-lang/rust-clippy/issues/4484#issuecomment-544688894

Lokathor commented 4 years ago

Nope.

Because Vec itself is repr(Rust), all transmutation between different monomorphizations is UB.

Now, sure, the data layout is likely to be the same and so it is likely to work despite being UB, but that's how a lot of UB is until one day it breaks.

Lokathor commented 4 years ago

sniped :(

RReverser commented 4 years ago

@shepmaster Hmm. Is this something specific to Vec because it's built-in, or to any repr(Rust)?

Generally I'd expect #[repr(transparent)] to guarantee that when it's put inside arbitrary structure or an enum, the layout of that structure or enum would be exactly the same as if the type was unwrapped.

If that's not the case, that significantly reduces usefulness of #[repr(transparent)] for FFI code, since it means only direct usage of the type is guaranteed to be transparent.

RReverser commented 4 years ago

UPD: Oh I see, it's worse than I thought - even structs without newtypes are not guaranteed to be deterministic :/ https://rust-lang.github.io/unsafe-code-guidelines/layout/structs-and-tuples.html#unresolved-question-guaranteeing-compatible-layouts

That's pretty bad... I don't want to switch all my types to repr(C) as repr(Rust) provides lots of valuable size optimisations, yet I do want to use #[repr(transparent)] nested inside other types without worrying about layout incompatibilities...

RalfJung commented 4 years ago

provides lots of valuable size optimisations

These optimizations are possible only because the layout is not guaranteed. You have the choice between optimized layout and predictable layout. Having both at the same time is just impossible.

RReverser commented 4 years ago

These optimizations are possible only because the layout is not guaranteed.

Sure, but there is (usually) a difference between "layout is not guaranteed" as in "it's unspecified, so don't rely on exact representation" and "layout is not deterministic" as in "even structurally equivalent types are not guaranteed to have compatible layout".

Most of these optimisations require only the former, and as a user I'm perfectly fine with not relying on exact memory representation, but I'd still like to rely on compatibility between types, as that's pretty much the only way to write efficient code in many cases.

Lokathor commented 4 years ago

It could potentially be deterministic without being fixed for all time.

However, there's a lot of work to do there, so it's UB until that happens, if that happens at all.

llogiq commented 4 years ago

People are experimenting with safe transmutation methods as we speak. In general, I believe a type should be able to state what types it can be transmuted into and how (though to do this ergonomically might require us to extend the type system, e.g. to have #[repr(transparent)] auto-implement some marker trait).

Lokathor commented 4 years ago

Indeed! Beyond the safe-transmute project there's also various crates that are feeding ideas into the project that are already all capable of being used.