Open nikomatsakis opened 2 years ago
I am nominating this for @rust-lang/lang decision and I am also cc'ing @rust-lang/wg-unsafe-code-guidelines.
This decision is the sole remaining blocker for stabilizing dyn upcasts.
I would probably move to merge now, but I would particularly like some quick feedback on @RalfJung's concern here:
RalfJung raised the concern that a validity invariant of "aligned but maybe null" is very unusual, and said it'd be more consistent to require "aligned and non-null". [...]
The fact that the current proposal is not a 2-way door gives me some pause and I am considering whether to adjust the validity invariant to be "aligned, non-null", like other references and so forth.
also cc @crlf0710, the person driving this initiative
OK, I've adjusted the proposal slightly so that the validity invariant is now word-aligned, non-null. This is consistent with other pointer-like values and offers us more room for future change if needed.
As such, I am ready to ...
@rfcbot fcp merge
Team member @nikomatsakis has proposed to merge this. The next step is review by the rest of the tagged team members:
No concerns currently listed.
Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!
See this document for info about what commands tagged team members can give me.
without introducing UB, we could also loosen the safety invariant to permit a sentinel value (such as NULL), but that would require a branch or other check in the upcast code, which would be less efficient.
I am not sure if that is true -- there could be code relying on this safety invariant. Extreme case -- this fn is sound under the rules as specified above:
pub fn evil(p: *const dyn Debug) {
let parts: [usize; 2] = unsafe { mem::transmute(p) };
if parts[1] == 0 {
// A compiler-allocated vtable can never be at address 0.
unsafe { hint::unreachable_unchecked() };
}
}
This relates to that question about "type system invariants with wiggle room" that you keep getting back to. :)
I just want to note that with feature(ptr_metadata)
, the safety and validity invariants of the metadata part of *const dyn Trait
must be the same as the safety and validity invariants of <fn std::ptr::metadata::<dyn Trait>>::Output
. (This type is currently spelled std::ptr::DynMetadata<dyn Trait>
.) This is a necessary for std::ptr::metadata
, <ptr>::to_raw_parts
, and std::ptr::from_raw_parts
to be safe.
It is perhaps also worth drawing analogy to slices, which stably have a metadata of unburdened usize
, exposed via std::ptr::slice_from_raw_parts
and normal casts[^1]. This prevents a safe fn(*const [T]) -> Layout
because even without the size <= isize::MAX
rule, the size calculation of size_of::<T>() * len
can overflow. (A checked operation can in theory be provided.) Given it's possible/safe to cast from *const [()]
to *const [T]
(they have the same ptr metadata type), this is necessary but unfortunate in hindsight[^3].
With *const dyn Trait
we have a bit more typesafe wiggle room, as *const dyn Trait1
and *const dyn Trait2
have distinct metadata types (as
casting between them is forbidden). I personally agree that the invariants of *const dyn Trait
's metadata should be validity aligned and nonnull, safety be safe &'static TraitVtable
.
[^1]: While std IIRC does not yet provide any direct way to get the length from an arbitrary *const [T]
, you can probably[^2] do so safely for any nonnull pointer by casting to *const [()]
and calling <[()]>::len
.
[^2]: Caveats around deallocated pointers may apply. If you alloc
a ZST and then dealloc
it, the pointer may possibly be UB to dereference (this allows reasoning about pointer allocation validity without knowing the pointee size). A weaker form of that is dereferencing a pointer to a real allocation for zero bytes. Personally I think it better/easier to say zero-byte dereferences are always valid (except at null), but there are others who strongly disagree, and I'm not fully aware of the backend/optimizer benefits/concerns around doing polymorphic reasoning which having a special case for zero bytes complicates.
[^3]: Given a time machine, I'd want to have the ptr metadata API from the start and require using that to cast between pointers to ?Sized
types. Additionally, ptr::metadata(*const [T])
wouldn't return usize
directly but instead a type still associated to T
, in order to restrict safe *const [T]
construction to those which describe valid Rust Allocated Objects. That type (however named) would then offer conversion to and checked from usize
.
There's an interesting secondary question, though: do reborrowing rules apply to the pointer's metadata? Given the vtable reference is shared and 'static
, the only difference is whether the pointer is marked dereferencable
. With a ptr_metadata
lens, *const T
is (ptr::Opaque, ptr::Metadata<T>)
, and either semantic is fairly simple to provide; either store &'static TraitVtable<_>
(which gets retagged[^4]) or ptr::WellFormed<TraitVtable<_>>
(which doesn't).
Because either semantic can be implemented, this needn't block on figuring out ptr_metadata
, custom unsized pointee kinds, etc. Plus, I think it's a reasonable outcome to recommend authors of custom pointee kinds use ptr::WellFormed
s rather than &'static
s such that pointers to their custom pointee kinds are easier to work with.
[^4]: Assuming a version of the borrow rules which recurse on fields.
One final note: it's possible (if annoying) to provide a constant initializer for even a valid vtable pointer. References to static
s are not const
, but AIUI this is because using a static
in const
is not possible, and there's no way to differentiate between a usable and an opaque reference in const
. If (and only if) the vtable reference type is 100% opaque to const
, it'd be possible to represent the vtable reference in const
symbolically and substitute it out when static
addresses are resolved.
There are enough annoyances and roadblocks there that just treating it as a no is reasonable, but it's at least theoretically possible.
I just want to note that with feature(ptr_metadata), the safety and validity invariants of the metadata part of *const dyn Trait must be the same as the safety and validity invariants of <fn std::ptr::metadata::
>::Output. (This type is currently spelled std::ptr::DynMetadata .) This is a necessary for std::ptr::metadata, ::to_raw_parts, and std::ptr::from_raw_parts to be safe.
FWIW, for soundness of those functions, it is enough for the safety invariants to be the same. They don't have to have the same validity invariants.
There's an interesting secondary question, though: do reborrowing rules apply to the pointer's metadata? Given the vtable reference is shared and 'static, the only difference is whether the pointer is marked dereferencable. With a ptrmetadata lens, *const T is (ptr::Opaque, ptr::Metadata
), and either semantic is fairly simple to provide; either store &'static TraitVtable< > (which gets retagged4) or ptr::WellFormed<TraitVtable<_>> (which doesn't).
At least on raw pointers I don't think we should have dereferenceable
or any reborrowing, that would effectively make the validity invariant that this must point to actual memory.
So the raw ptr vtable type would be more like &'static [usize; 0]
.
@RalfJung
This relates to that question about "type system invariants with wiggle room" that you keep getting back to. :)
Ah, yes, you're correct, but ... hmm. This is equally true of the validity and safety invariants, right? The example you gave was code assuming that the vtable was not NULL, but that's actually guaranteed to be true because of the validity invariant (the safety invariant further refines that to "vtable must be valid" (which we don't precisely define, but anyway).
On the topic of what valid means, I realized that the definition of a valid vtable for a *const dyn Foo
value shouldn't, I don't think, say that the vtable must be the correct vtable for the "hidden type" of the data pointer, because the data pointer for a *const
isn't required to be anything in particular. It's a bit more subtle than that. I don't think we have to establish it at this time though.
On the topic of what valid means, I realized that the definition of a valid vtable for a
*const dyn Foo
value shouldn't, I don't think, say that the vtable must be the correct vtable for the "hidden type" of the data pointer
To concur and state explicitly: if we want to allow dangling *const dyn Trait
, this is necessary, as the concrete "hidden" pointee type does not exist.
(Definitely not an immediate question, but I wonder if we can't make ptr::null
/dangling
work by generating vtables for !
; it might even be possible to just have a single shared !
vtable (per compilation unit) with enough supertrait pointers to be used as any trait's vtable? Though it might need to have enough dummy function pointers as well if taking the address of the method without calling it is possible.)
On the topic of what valid means, I realized that the definition of a valid vtable for a *const dyn Foo value shouldn't, I don't think, say that the vtable must be the correct vtable for the "hidden type" of the data pointer,
Agreed, that is a meaningless definition. There's no such thing as a "hidden type" in the op.sem.
What I proposed is to say that it must be a valid vtable for the given trait, for an arbitrary type. We need it to be for the right trait so that vtable lookups for upcasts and dyn method dispatch will work.
Ah, yes, you're correct, but ... hmm. This is equally true of the validity and safety invariants, right? The example you gave was code assuming that the vtable was not NULL, but that's actually guaranteed to be true because of the validity invariant (the safety invariant further refines that to "vtable must be valid" (which we don't precisely define, but anyway).
Validity invariants are an op.sem / UB thing, which means they are whole-program questions: does this program (when interacting with a given environment for I/O) have UB, yes or no? On that level, weakening the validity invariant is always backwards-compatible since it makes fewer programs UB.
Weakening becomes a problem when considering individual functions, which usually only makes sense for considering safe functions and asking whether they are sound -- hence this is primarily a concern for safety invariants.
If this is an unsafe function, then it must anyway document what the preconditions are. A problem only arises if the precondition is explicitly (or implicitly) documented as "can be an arbitrary valid raw ptr", i.e., if it refers to the validity invariant. I don't know how common of a precondition that is for unsafe functions. It is usually not very useful; I would expect the precondition to be the safety invariant unless explicitly stated otherwise, and if it is explicitly stated I think it will rarely reference the validity invariant and instead just spell out what it requires. A function that says "vtable must be aligned and non-null" remains correct even if validity later allows null vtables.
:bell: This is now entering its final comment period, as per the review above. :bell:
I would expect the precondition to be the safety invariant unless explicitly stated otherwise, and if it is explicitly stated I think it will rarely reference the validity invariant and instead just spell out what it requires.
This isn't quite my experience. While this is a common (and probably better) style, I've also seen code which instead of stating the preconditions, states the weakenings. So instead of "the pointer's vtable pointer must be aligned and non-null," something along the lines of "the pointer's vtable pointer is not required to point to a valid vtable." An example might be
/// Extract the vtable pointer from a trait object pointer.
///
/// The validity of the resulting pointer is unchanged from that of the input's vtable pointer.
fn get_vtable(p: *const dyn Trait) -> ptr::NonNull<()> {
let meta = ptr::metadata(p);
// elided, since this is more strongly typed it's difficult
}
This kind of weakening is more commonly seen with something like &mut [u8]
, to say that the reference is only written to and uninit data behind the reference is allowed.
This probably still allows weakening the validity requirement, as "safety minus specified" doesn't change, but I'd worry about people substituting "safety minus (current safety additions to validity)" as just "validity".
@RalfJung
If this is an unsafe function, then it must anyway document what the preconditions are.
Hmm, this is probably off-topic, but I'm not sure I buy this distinction you are trying to draw. It seems like unsafe code authors and the compiler are equally capable of optimizing layouts and other things based on validity invariants. As a silly example, maybe we're doing some kind of hand-rolled type that we want to exploit the niche just like Option<*const dyn Foo>
would:
macro_rules! my_option {
($t:path) => {
union MyOption {
data: *const dyn $t,
raw_bytes: [usize; 2]
}
...
}
}
Now I create my_option!(std::fmt::Debug)
and I get something that acts kind of like Option<*const dyn Debug>
, I might well document that this accepts "any *const dyn std::fmt::Debug
value; doesn't require a vtable vtable to be met", but that is only a valid thing for me to say because of the validity invariant.
I guess this comes back to what you said that you expect people saying "any value that meets the validity invariant" to be unusual. Maybe! I'm not sure.
document that this accepts "any *const dyn std::fmt::Debug value; doesn't require a vtable vtable to be met"
If by "any such value" you mean "anything that satisfies the validity invariant", then that is exactly the case I mentioned where the documentation explicitly or implicitly references the validity invariant. If it instead said "anything that satisfies the validity invariant as it is defined in Rust 1.65", then that would be unambiguous and weakening the validity invariant later would not be a problem.
@RalfJung I updated the text to read:
It may be possible to weaken the validity or safety invariants later, but we risk finding that people have written unsafe code that relies on them. For example, people could build data structures using unions that assume that the vtable pointer is non-null. If we then later permitted a null value, this code could create UB. This is particularly problematic for changes to the safety invariant, since the assumption is that one can take a value which meets the safety invariant and give it to arbitrary code without creating UB (if the value only meets the validity invariant, and not the safety invariant, then you are supposed to audit and control all the code which uses that value, so this is less of a problem).
Can we explicitly leave the validity invariant as something to be specified in the future? i.e. producers of fat raw pointers would be required to satisfy the entire safety invariant as the validity invariant, but consumers of them should not rely on it.
A problem only arises if the precondition is explicitly (or implicitly) documented as "can be an arbitrary valid raw ptr"
I feel fairly strongly that such documentation is unconditionally wrong and needs to be fixed. This would include @nikomatsakis 's:
I would expect the precondition to be the safety invariant unless explicitly stated otherwise
Yes. I also feel very strongly that this needs to be the case. As a matter of fact, I'd even consider "the assumptions an API may make when nothing is explicitly stated" a good candidate for the definition of "safety invariant."
/// The validity of the resulting pointer is unchanged from that of the input's vtable pointer.
The documentation should imo avoid using "validity" here. The most natural interpretation of this text documentation would imply that the input pointer does not have to satisfy it's validity requirements, which is of course not the case. Better alternatives might be "the vtable pointer is only required to be non-null" or "the safety conditions are the same ones as for ptr::metadata
".
This probably still allows weakening the validity requirement, as "safety minus specified" doesn't change, but I'd worry about people substituting "safety minus (current safety additions to validity)" as just "validity".
"The input must satisfy it's validity requirements" is always a useless thing to document, because code that violates this necessarily has language UB anyway.
Wrt @nikomatsakis 's suggestion:
any
*const dyn std::fmt::Debug value
; doesn't require a vtable vtable to be met
I'm not sure what this was supposed to say, possibly "doesn't require a valid vtable"? This phrasing is slightly ambiguous. If it is meant to mean "doesn't require a valid vtable pointer," see my point above. If it is meant to mean "doesn't require a pointer pointing to a valid vtable," then the function has disclaimed all safety invariants. I do think we should decide that validity invariants are not stable (in the one direction); this would mean that this documentation is also wrong, and I'm okay with that. It should instead explicity state the assumptions it is making.
Hmm, I think we should move the discussion about whether people can/will say "accept anything that meets the validity invariant" somewhere else, as I think it's independent of this particular issue.
With respect to the question of leaving the validity invariant unspecified: I'd prefer not to. I think we need to make decisions at some point. However, we could make this a "recommendation" to be finalized by the (under discussion) opsem team. I'd be happy with that, since I think this is a good example of the kinds of decisions that team will make. The other point is that the validity invariant I am currently recommending (aligned, non-null) is made in part to be consistent with other validity invariants, so it makes sense to consider the group as a whole.
My main immediate interest is unblocking the stabilization of dyn upcasts. We have plenty of accrued debt in the form of validity/safety invariants that have not been finalized.
Removing nomination as this is in FCP and was discussed at prior lang team meetings.
(it would be possible to permit a sentinel value, like null, to be used, but that would imply that upcasting must have a branch, making it less efficient).
I am wondering if this solution was not dismissed too early.
For reference, C++, another zero-cost abstraction language, does null check in static_cast
that do pointer adjustment: https://godbolt.org/z/jd4n7YvPW
I'm not seeing upcasting raw pointer as a frequent operation. Most likely this would be done next to actually using this pointer anyway which would probably allow to optimize the null check.
Maybe the rule could be that *dyn T
can have a null vtable pointer only if the second pointer is also null.
Having a working ptr::null<T: ?Sized>
would be really useful, for example, to implement Default for some struct Foo<T:?Sized> { ptr *const T, /*..*/ }
And what should dyn calls (of safe functions) do when the vtable is null? If we want to say "UB" we'd have to make them unsafe somehow; otherwise the only option I can think of is to make them panic and that sounds bad.
The final comment period, with a disposition to merge, as per the review above, is now complete.
As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.
This will be merged soon.
OK, so this FCP is complete. The question is, what do we do now to close the issue? :)
Option A. Edit the reference or perhaps necronomicon.
Option B. Edit API documentation -- somewhere? I'm not sure if we have an "API page" for "dyn types" in general.
Option C. Open an issue on unsafe-code-guidelines repository and link to this?
@RalfJung or @JakobDegen -- what do people typically do? Have we stabilized decisions about validity/safety conditions before and how did we document them?
The other question was whether to make this a 'strong recommendation' for an opsem-like team, which I would still be ok with. I think the crucial ingredients are...
*const dyn
to safe code must supply a valid, compiler-supplied vtable (so there is no "general null pointer" that can be released to safe code)Option<*const dyn>
-- i.e., we don't want the validity condition to be "anything at all"This leads you to a few options, and the one we ultimately chose was based on Ralf's recommendation beacuse it led to more consistency, so I'd be ok with saying "the opsem team will define the precise conditions, this is our recommendation", though I suppose that's a weakening of what we just FCP'd. :)
Have we stabilized decisions about validity/safety conditions before and how did we document them?
We have https://github.com/rust-lang/rust/pull/98919 but that doesn't really help here.
https://doc.rust-lang.org/reference/behavior-considered-undefined.html is where we have all validity invariants, so that would be a Reference PR. We don't really have any place to put safety invariants though. Even specifying safety invariants precisely requires something like Iris so it's not like we can even hope to do that anywhere official...
Have we stabilized decisions about validity/safety conditions before and how did we document them?
FWIW https://github.com/rust-lang/rust/pull/117534 is documenting the safety invariant of str
, and https://github.com/rust-lang/rust/pull/116677 documents parts of it for references. However, for lack of an official term, they are currently trying to describe the idea of a safety invariant in one sentence rather than using a fixed term.
We probably want a fixed term. However I am not sure if "safety invariant" is the best term -- we've been using it for a long time now, but I regret coining it; "library invariant" would be my preferred choice these days. (And "validity invariant" -> "language invariant". That goes well with "language UB" and "library UB". Though maybe those are not terms we want to use either?) Also I am not sure where to best define such a term. That would have to go together with an introduction to how we think about unsafe code wrapped behind safe abstractions and things like that.
This issue proposes a resolution to the last outstanding question blocking
dyn
upcast. It is also available on the initiative repository.Background
We are trying to enable "upcasts" from a
dyn Trait
to its supertraits:The key factor for these upcasts is that they require adjusting the vtable. The current implementation strategy is that the vtables for the
Bar
trait embed pointers to aFoo
vtable within them:This way, given a
&dyn Bar
object, we convert itsBar
vtable to the appropriateFoo
vtable by loading the appropriate field.Although we don't want to commit to a particular implementation strategy, we do want to leave room for this strategy. One implication is that performing an upcast may require loading from the vtable, which implies that the vtable must be a valid pointer to an actual Rust vtable. Although
&dyn Bar
references would always contain a valid vtable, the same is not necessarily true for a raw pointer like*const dyn Bar
or*mut dyn Bar
.In the language today, we only support "noop upcasts" that don't affect the vtable, and these are safe (e.g., converting from
*const dyn Foo + Send
to*const dyn Foo
). If we extend the set of upcasts to permit vtable-adjusting upcasts, like*const dyn Bar
to*const dyn Foo
, this implies that, for safe code at least, all*const dyn Trait
values must have a valid vtable, so that we know we can safely load the required field and perform the upcast.On the other hand, we do not generally require raw
*mut T
pointers to point to valid data. In fact, we explicitly permit them to have any value, including null, and only require that they point to valid data when they are dereferenced. Because dereferencing a raw pointer is an unsafe operation, it has always been considered safe to expose an arbitrary raw pointer to unsafe code -- the unsafety arises when you take a raw pointer from an unknown source and dereference it, since unless you can trace the origin of that pointer you can't possible guarantee that it is valid to dereference.This brings us to the conflict:
*const dyn
values (e.g.,*const dyn Foo + Send
to*const dyn Foo
).*const dyn Foo + Bar
to*const dyn Foo
) that would require adjusting the vtable much like the upcasts currently being stabilized.Related future consideration: virtual method calls on raw pointers
There have been requests to extend traits with the option to include raw pointer methods:
These methods would be useful when writing unsafe code because having an
&self
method requires satisfying the validity conditions of an&
-reference, which may not be possible. If we did have such methods, however, it raises the question of whether it would be safe to invokeis_null
on a*const dyn PtrLike
reference. Just as with upcasting, invoking a method from the vtable requires loading from the vtable, and hence requires a valid vtable generated by the compiler.The solution we propose in this document also resolves this future dilemma.
Definitions: validity vs safety invariant
We adopt the terms validity and safety invariant from the unsafe code guidelines:
In short, the validity invariant defines a condition that must always be true, even in unsafe code, and the safety invariant defines an invariant that unsafe code must guarantee before a value can be released to be used by arbitrary code.
Contours of the solution space
We can fix this conflict in one of two basic ways:
First, *we could make vtable-adjusting upcasts casts (and `Self
method calls) unsafe**. This is difficult to implement and would require changes to the
Coercetrait, which is already excessively complicated. In exchange, it offers at best marginal benefit: raw
*dyn` pointers can be released to safe code, but safe code can't do anything interesting with them. For this reason, we do not recommend this option.If vtable-adjusting casts (and
*Self
method calls) are safe, then the safety invariant for*dyn
types must be that their metadata points to a fully valid vtable (i.e., a vtable created by the compiler). This ensures safe code can perform upcasts or dynamic dispatch. This also implies thatstd::ptr::null
(which is safe) cannot be extended toT
whereT: ?Sized
unless further changes are made, since we would need to provide a valid vtable (it would be possible to permit a sentinel value, like null, to be used, but that would imply that upcasting must have a branch, making it less efficient).There are, however, various options for the validity invariant, ranging from no invariant to requiring a fully valid vtable at all times. The strict invariants offer some benefits, such as the ability to have a niche for
*dyn
pointers. We survey the options here:*dyn
metadatastd::mem::zeroed
Explanations for the column titles:
*dyn
metadata -- describes the invariant that applies to the metadata for a*dyn
value.sizeof(Option<*const dyn Foo>) == sizeof(*const dyn Foo)
.std::mem::zeroed
-- true ifstd::mem::zeroed
can be used to create a valid value (very convenient). This makes it trivial to innitialize a*const dyn
with a dummy value, though the value cannot be released to safe code.*const dyn Foo
that satisfies the validity invariant, no matter the traitFoo
. This makes it easy to initialize a*const dyn
with a dummy value, though the value cannot be released to safe code.Other points:
*dyn
values currently have a niche.fn
and&
-references) are expected to have a validity invariant of word-aligned, non-null.Proposal
The proposal is as follows:
*dyn
metadata to be word-aligned and non-null.Vtable-adjusting upcasts are defined as:
dyn Bar
todyn Foo
from the introduction). In particular, upcasts that simply add or remove auto-traits are not vtable-adjusting (e.g.,dyn Debug + Send
todyn Debug
).This approach...
*const dyn
;fn
;The rules also imply that...
*dyn
pointer when*dyn
pointer is upcast (or invoke methods);*dyn
pointer is released to arbitrary code, because that code may upcast (or invoke methods).std::ptr::null
to permitT: ?Sized
would not be safe.Possible future changes
It may be possible to weaken the validity or safety invariants later, but we risk finding that people have written unsafe code that relies on them. For example, people could build data structures using unions that assume that the vtable pointer is non-null. If we then later permitted a null value, this code could create UB. This is particularly problematic for changes to the safety invariant, since the assumption is that one can take a value which meets the safety invariant and give it to arbitrary code without creating UB (if the value only meets the validity invariant, and not the safety invariant, then you are supposed to audit and control all the code which uses that value, so this is less of a problem).
Prior links