rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.76k stars 12.5k forks source link

null trait object raw pointers are UB #63851

Open nico-abram opened 5 years ago

nico-abram commented 5 years ago

Opening after some discussion in https://github.com/rust-lang/miri/issues/918#issuecomment-524554552

This playground snippet crashes on an illegal instruction on release: https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=7c69493026add62256996d204e1278c0 Are vtable ptrs in trait object ptrs NonNull? Is that intended? This (2 year old) comment would suggest so https://github.com/rust-lang/rfcs/issues/433#issuecomment-345497470

CC @RalfJung

bjorn3 commented 5 years ago
trait T { }

fn main() {
    a(unsafe { std::mem::zeroed() });
}

fn a(_: *mut dyn T) {}
; ...
    call void @_ZN10playground1a17hbc18b5e956a8e54dE({}* %1, [3 x i64]* noalias readonly align 8 dereferenceable(24) %2), !dbg !145
; ...

The vtable is marked as dereferencable(24).

RalfJung commented 5 years ago

Thanks for reporting!

In https://github.com/rust-lang/unsafe-code-guidelines/issues/166 we have been discussing, rather theoretically, the validity requirements for wide raw pointers. We were not actually aware that there already are validity requirements in place for them. That seems very surprising to me. Was this ever discussed in depth, or did it "just happen"?

Cc @rust-lang/wg-unsafe-code-guidelines @eddyb

RalfJung commented 5 years ago

@bjorn3 on top of that, *const dyn Send has size 16 -- so, we do enum niche optimizations.

RalfJung commented 5 years ago

If we decide this is not a bug, we'd have to adjust https://doc.rust-lang.org/nightly/nomicon/what-unsafe-does.html (and also the reference). It would be a very special case to require anything for raw pointers that are not dereferenced.

eddyb commented 5 years ago

The assumption is that the vtable is a valid reference, so that all raw pointers have valid pointee types.

*mut dyn Trait corresponds to (pseudo-notation) dyn T: Trait.*mut T or, further expanded, exists T.(*mut T, &<T as Trait>::vtable).

IMO, if the information associated with that T is invalid, or doesn't require a T to exist in the first place, that complicates the typesystem (or rather, the supposed typesystem that may have never been fully formally modeled - I've been meaning to do a writeup on dyn for a while now).

In custom DST terms, we could treat *mut T as (*mut T::Data, MaybeUninit<T::Meta>) (whereas &T would always have an initialized T::Meta), so overall maybe the cost of supporting this difference between raw pointers and references might not be that large.

There's also something to be said about having a null vtable, that Go people have some experience with (but I don't, so I'm not the best person to comment on it).

RalfJung commented 5 years ago

The assumption is that the vtable is a valid reference, so that all raw pointers have valid pointee types.

The curious part here is that no one involved in the UCG realized that this is the current status.

IMO, if the information associated with that T is invalid, or doesn't require a T to exist in the first place, that complicates the typesystem

Where is there a complication here? I mean not in terms of the implementation, that shouldn't be a primary concern in terms of language design, but in terms of documentation and teaching.

Requiring vtables of raw pointers to be valid complicates the language as well, it would be an exception from the principle to not require things off of raw pointers.

In custom DST terms, we could treat mut T as (mut T::Data, MaybeUninit)

I was more thinking of *mut dyn Trait as (usize, usize). But your proposal scales better to more generalized metadata, indeed. I like it.

eddyb commented 5 years ago

I don't even think I'm the first to come up with that, but I can't remember who did.

In the end, I guess it's a tradeoff between:

(these are not the only outcomes, but they should be somewhat representative)

strega-nil commented 5 years ago

If one reads the custom DST proposal, there's a reason that we expect raw pointers to DSTs to have valid metadata. The rawness of the pointer is unrelated to the rawness of the metadata, imo -- even if you have *mut [T], you expect the length to be a usize.

comex commented 5 years ago
  • safe size_of_val for *mut T, even for T: !Sized

There's also the ability for methods taking self: *const Self to be object safe. For now, such methods can be written and are treated as object safe, but the functionality is guarded by the arbitrary_self_types feature. The tracking issue for that feature has some discussion of whether raw pointer methods should be object safe.

(It claims that "even today you could UB in safe code with mem::size_of_val(foo)", where foo: *const dyn Foo, but I don't think that's true? size_of_val doesn't currently work on raw pointers.)

scottmcm commented 5 years ago

It seems like this is at least somewhat expected, because if all pointers could be zeroed then we wouldn't have had so many conversations about how to make ptr::null() work with extern type, and would have just changed its implementation to unsafe { mem::zeroed() }.

Also, do any of the choices here have any impacts to the optimizations to never reload vtables when multiple calls get made?

hanna-kruppe commented 5 years ago

FWIW, ptr::null being restricted to Sized types is not just because of trait object vtables, there's also concerns about interaction with custom DSTs whose metadata may be a fair bit more complicated than just an usize or a pointer.

RalfJung commented 5 years ago

safe size_of_val for *mut T, even for T: !Sized

That won't hold up with custom DST anyway, e.g. if T is a "thin" trait object. So, if we want to be future-compatible with custom DST, I think requiring the metadata to be valid for raw pointers buys us pretty much nothing?

There's also the ability for methods taking self: *const Self to be object safe.

Isn't the issue there just that calling methods with a "raw receiver type" will be unsafe?

eddyb commented 5 years ago

there's also concerns about interaction with custom DSTs whose metadata may be a fair bit more complicated than just an usize or a pointer.

But if we go for MaybeUninit<T::Meta>, then that's uniform across all types.

That won't hold up with custom DST anyway, e.g. if T is a "thin" trait object.

Yeah it would be nice to not tie anything to the T::Meta but always require a reference, since you wouldn't need several traits, just something like DynSized. I wonder what we lose in that case.

Isn't the issue there just that calling methods with a "raw receiver type" will be unsafe?

So "it's UB to call with invalid Self::Meta"? Sounds reasonable.


Overall I feel like we should look into this a bit more, but I can see how the "low-level compromise" lets us focus on &T and not worry about raw pointers (e.g. regarding custom DSTs).

It's almost like raw pointers are union { &UnsafeCell<T>, usize }, huh.

RalfJung commented 5 years ago

Turns out that for raw slices, safe code can cause them to have invalid metadata:

123456789 as *const [(); usize::max_value()] as *const [()] as *const [u64]

Ouch.

eddyb commented 5 years ago

@RalfJung Wow, so we couldn't actually have <[T]>::Meta = (T::Meta, usize), it would have to be something opaque that depends on the size of T (which in the future might get very complicated with T: !Sized slice elements).

RalfJung commented 5 years ago

Looks like probably it should be just usize, and the max-size comes in via the requirement that references (including slices) must point to memory that is allocated in a single allocation -- which for LLVM means it cannot be bigger than isize::MAX.

So, I was just interpreting that length requirement the wrong way so far. I'll adjust the various PRs (Reference, Nomicon, Miri) accordingly.

strega-nil commented 5 years ago

@RalfJung The issue is not invalid metadata -- invalid metadata is fine. The big thing that you want is initialized metadata.

*const UnsizedType should contain <UnsizedType>::Meta no matter what, but there's no guarantee on the validity of that metadata for the pointed to object.

RalfJung commented 5 years ago

"initialized" as a term turned out to be pretty unsuited. I mean "valid" specifically as in "validity invariant".

strega-nil commented 5 years ago

Sure, whatever. It needs to be a valid value of type <T>::Meta

eddyb commented 5 years ago

@ubsan You can circumvent that by defining raw pointers as having MaybeUninit<T::Meta> instead of T::Meta directly.

strega-nil commented 5 years ago

@eddyb you can, but I don't think there's a reasonable argument for it. Creating and destructuring raw pointers safely seems really nice to me.

RalfJung commented 5 years ago

"really nice" is IMO not a sufficient argument for introducing this kind of UB. We should have some optimization that's worth it, ir some amount of spec simplification that's worth it.

The "default" answer should be "let's avoid UB, to make unsafe code author's lives easier".

strega-nil commented 5 years ago

@RalfJung how does one introduce UB in this case? Are people generally returning mem::zeroed::<*mut Trait>()?

I would argue that this is removing UB that people get when calling methods through a *mut Trait

RalfJung commented 5 years ago

how does one introduce UB in this case? Are people generally returning mem::zeroed::<*mut Trait>()?

That would be an example of an operation that becomes UB under your semantics. Doesn't seem too strange to me to zero-initialize a struct with a wide raw pointer field.

I would argue that this is removing UB that people get when calling methods through a *mut Trait

There is literally no program that is UB under my semantics but not under yours.

This is not about the safety invariant, it is about the validity invariant.

strega-nil commented 5 years ago

@RalfJung shrug

I don't have a horse in this race. I just think they're nicer semantics. If you think it's more important to allow mem::zeroed than access to the metadata being safe, that's your prerogative.