Closed nikomatsakis closed 5 years ago
My feeling here is that any non-null value should be ok.
My preference is that any non-null value is a valid reference to a 0-sized type.
Also, for things to make sense, all reborrows would have to preserve that value, e.g.:
struct Foo((), ());
fn addr_of<T>(t: &T) -> usize { t as *const T as usize }
fn main() {
    let foo = unsafe { &*(0x1337 as *const Foo) };
    assert_eq!(addr_of(foo), 0x1337);
    assert_eq!(addr_of(&*foo), 0x1337);
    assert_eq!(addr_of(&foo.0), 0x1337);
    assert_eq!(addr_of(&foo.1), 0x1337);
}
I don't see any good reasons not to adopt this, but some people have suggested that we may sometimes "squash" reborrows to 0x1 or something to reduce implementation complexity.
@arielb1 I agree with you. A reborrow should, in my opinion, always be equivalent to the original...
I think that any non-null value should be OK, and that the compiler should correctly propagate that value. It is useful and expected in #[repr(C)] structs for a ZST placed before a sized field to have the same address as that field.
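A minimal sketch of the #[repr(C)] expectation described above. The names Marker, Packet and field_addresses are hypothetical and only serve the illustration; the sketch assumes the usual repr(C) rule that fields are laid out in declaration order, so a zero-sized field of alignment 1 takes no space in front of the next field.

```rust
/// Hypothetical zero-sized marker type.
pub struct Marker;

/// Hypothetical C-layout struct with a ZST field in front of a sized field.
#[repr(C)]
pub struct Packet {
    pub marker: Marker,
    pub len: u32,
}

/// Returns the addresses of the struct, its ZST field and its sized field.
pub fn field_addresses(p: &Packet) -> (usize, usize, usize) {
    (
        p as *const Packet as usize,
        &p.marker as *const Marker as usize,
        &p.len as *const u32 as usize,
    )
}
```

If references to ZST fields were "squashed" to some canonical address, the second value would no longer match the other two.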
In addition, ZSTs have the advantage that they can be passed around as &T and &mut T while not allowing the consumer to move the actual data behind the reference (which may be a variable-sized #[repr(C)] struct). The methods on the ZST T can then cast the reference to a reference to the actual backing data type, and avoid the unsafety that would come from allowing moving, for example, the header of an FFI C type. I think patterns like that should be OK, as they solve a real problem, and shouldn't break anything.
This means that pointers to opaque C structs can be passed around like native rust types behind &T and &mut T, rather than requiring special structs like:
struct SomeCTypeRef<'a> {
    ptr: *const SomeCType,
    _marker: PhantomData<&'a u8>,
}
which require a large number of impls on them and don't provide a nice API.
...all reborrows would have to preserve that value...
Particularly since people often use *mut () as the equivalent of C's void *
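For illustration, a small sketch of the void * style round trip through *mut (). The helper names erase and restore are made up for this example; the point is only that the address has to survive the trip through a pointer to a zero-sized type.

```rust
/// Erases the pointee type, much like casting to `void *` in C.
pub fn erase<T>(value: &mut T) -> *mut () {
    value as *mut T as *mut ()
}

/// Recovers a typed reference from an erased pointer.
///
/// # Safety
/// `ptr` must come from `erase::<T>` on a value that is still alive and not
/// otherwise borrowed for the lifetime `'a`.
pub unsafe fn restore<'a, T>(ptr: *mut ()) -> &'a mut T {
    unsafe { &mut *(ptr as *mut T) }
}
```

A caller would pass the erased pointer through some type-agnostic interface and call restore with the original type on the other side.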
...
The issue mentions &T but not &mut T - is there any reason for this? I guess they would be the same.
Shared references are always the most complicated case... for the particular case of &(), I certainly agree that this can be any (non-null) value. But considering that types with the same ownership can have very different sharing (just look at T vs. Cell<T>), I find it conceivable that a zero-sized type has some non-trivial sharing going on... maybe as a "token" handed around in a protocol. I am not saying that this is useful, I just say I don't think there is a theorem saying "if the type is zero-sized, then its sharing is trivial". We could explicitly add such a property as an axiom, of course, but why unnecessarily restrict the design space of types?
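To make the T vs. Cell<T> contrast concrete, here is a small sketch (function names are illustrative only). Both types have the same size and the same owned behaviour, but a shared reference to the Cell still permits mutation, while a shared reference to the plain value is read-only.

```rust
use std::cell::Cell;

/// Mutates through a *shared* reference; this is part of `Cell`'s sharing protocol.
pub fn bump_shared(counter: &Cell<u32>) {
    counter.set(counter.get() + 1);
}

/// A shared reference to a plain `u32` only allows reading.
pub fn read_shared(value: &u32) -> u32 {
    *value
}
```

Nothing in the size or owned invariants of the two types predicts this difference, which is the sense in which "sharing" is a separate property of a type.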
Related questions: Are zero-size types automatically Copy? Is it safe, given that mem::size_of::<T>() == 0, to just "make up" an instance of T? I would say no, because that T could be part of a library which assigns actual meaning to this token being floating around... if you want, I could try to come up with an example of an API that would be unsafe with the rule above, but safe otherwise.
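One hypothetical shape such an API could take is sketched below; it is not taken from the thread, and the module lock with its Held token is invented for illustration. The token is zero-sized, has a private field and is neither Copy nor Clone, so the only way to obtain one from outside the module is through acquire.

```rust
mod lock {
    use std::sync::atomic::{AtomicBool, Ordering};

    // Hypothetical global flag guarded by the token protocol.
    static HELD: AtomicBool = AtomicBool::new(false);

    /// Zero-sized proof that the flag is currently held.
    /// The private field prevents construction outside this module.
    pub struct Held(());

    /// Hands out a token only if nobody else holds one.
    pub fn acquire() -> Option<Held> {
        if HELD.swap(true, Ordering::Acquire) {
            None
        } else {
            Some(Held(()))
        }
    }

    /// Consumes the token and frees the flag again.
    pub fn release(_token: Held) {
        HELD.store(false, Ordering::Release);
    }
}
```

If callers could conjure a Held out of thin air just because its size is zero, the invariant "at most one token exists" would no longer hold, even though no memory is ever read through the token.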
The issue mentions &T but not &mut T - is there any reason for this? I guess they would be the same.
We mean all of them.
Related questions: Are zero-size types automatically Copy?
You seem to be confused. Copying a ZST is guaranteed to not be UB by itself. However, copying a ZST is not guaranteed to be safe - unsafe code should be able to assume that random ZSTs are not randomly being created from nothing.
UB means that the compiler can do anything it wants. Safety violations mean that "certified-safe" code can do anything it wants. If you don't care about using the type-system to prove that your program is safe (either because you use some other way to prove it, or just don't give a ****), safety violations are not important.
I see UB and safety as being closely related -- the guarantee safety (of a library) provides is that calling it from safe code will not trigger UB.
The discussion above sounded (to me) like it was making the statement "if T is a ZST, then any non-null pointer is a legal value for &T and &mut T". The way I see it, this in turn implies that I can just make up a &T by casting a non-null pointer to this type, which could break code. I take it from your comment that's not what was meant, but then I find it hard to see what this means for ZSTs that have private fields. Certainly, neither the compiler nor the programmer should be able to just make those up, and hence the above statement about legal values just doesn't apply.
I see UB and safety as being closely related -- the guarantee safety (of a library) provides is that calling it from safe code will not trigger UB.
Sure. "certified-safe" code has the property that it can be composed with arbitrary "certified-safe" code while remaining "certified-safe" and never causing UB.
If you have a dubious texture parser with buffer overflows on every edge case, declaring it as a pub fn parse_texture(&[u8]) -> Texture means that your program is absolutely no longer certified safe - you can't modularly see that your program will not UB. OTOH, as long as you don't call the function with an invalid texture at runtime, your program remains well-defined - the optimizer can't go "if I pass junk in this length field here there is UB, so let's go ahead and do it".
Similarly, if you have a
/// indicates that the GIL is held by the current thread.
pub struct GILToken(PhantomData<()>);
impl !Sync for GILToken {} // negative impls need nightly: #![feature(negative_impls)]
pub fn do_stuff_that_assumes_gil_is_held(gil: &GILToken) { /* .. */ }
And you have some code from another module that goes
fn make_token() -> &'static GILToken {
    unsafe { &*(0x594f4c4f as *const GILToken) }
}
Then doing do_stuff_that_assumes_gil_is_held(make_token()) on some random thread can be UB, but just calling make_token() by itself isn't. The optimizer can't go "if I would call that safe function, there would be UB, so go ahead and call it".
@RalfJung
I am not saying that this is useful, I just say I don't think there is a theorem saying "if the type is zero-sized, then its sharing is trivial".
Can you elaborate on what you mean by "its sharing"?
Are zero-size types automatically Copy?
No, they are not https://is.gd/eCuyAk.
Is it safe, given that mem::size_of::<T>() == 0, to just "make up" an instance of T? I would say no, because that T could be part of a library which assigns actual meaning to this token being floating around.
Agreed, this is not safe, and I do not believe you can do it with safe code. I'm curious if you think this is in dispute :)
@RalfJung
The discussion above sounded (to me) like it was making the statement "if T is a ZST, then any non-null pointer is a legal value for &T and &mut T". The way I see it, this in turn implies that I can just make up a &T by casting a non-null pointer to this type, which could break code.
Hmm, I'm not sure why saying that a &T could be represented by any non-null pointer would imply that one can synthesize one at will. These seem like orthogonal questions to a certain extent, right? It seems clear (I think) that constructing a &T reference requires unsafe code unless you have an instance of T lying around. For example, @arielb1's function:
fn make_token() -> &'static GILToken {
    &*(0x594f4c4f as *const GILToken)
}
would not actually compile, because the * is being applied to a value of type *const GILToken. We would need an unsafe keyword. In that case, it seems like synthesizing an &ZST is no different than synthesizing any other sort of "safe type" -- whether it makes sense depends on larger semantic predicates about your program, which I guess is the distinction that @arielb1 was driving at.
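As a sketch of the point above: with the unsafe block in place, the function compiles and the chosen address is what the reference carries. Token is a hypothetical stand-in for GILToken, and the helper addr_of mirrors the one from the first example in the thread.

```rust
/// Hypothetical zero-sized token with a private field.
pub struct Token(());

/// Builds a `&'static Token` from an arbitrary non-null address.
pub fn make_token() -> &'static Token {
    // SAFETY (memory model only): `Token` is zero-sized with alignment 1, so a
    // non-null address is assumed to be enough for the reference to be valid.
    // Whether handing out this token is *sound* is a separate question that
    // depends on what the surrounding library lets the token prove.
    unsafe { &*(0x594f4c4f as *const Token) }
}

/// Returns the address stored in a reference.
pub fn addr_of<T>(t: &T) -> usize {
    t as *const T as usize
}
```

The unsafe block marks exactly the place where the programmer, not the compiler, takes responsibility for the library-level meaning of the token.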
@nikomatsakis
Yeah. I forgot the unsafe block.
@nikomatsakis
I am not saying that this is useful, I just say I don't think there is a theorem saying "if the type is zero-sized, then its sharing is trivial".
Can you elaborate on what you mean by "its sharing"?
I mean the protocol that governs the type while it is shared -- the set of invariants that define what is and isn't legal to do with the memory occupied by T, when there are &T around.
Notice that a type's "sharing" is not defined by the invariants that make up the type when it is fully owned. For example, Cell<T> and T are equivalent when we fully own them, but &Cell<T> and &T are obviously very different types.
Is it safe, given that mem::size_of::<T>() == 0, to just "make up" an instance of T? I would say no, because that T could be part of a library which assigns actual meaning to this token being floating around.
Agreed, this is not safe, and I do not believe you can do it with safe code. I'm curious if you think this is in dispute :)
Hmm, I'm not sure why saying that a &T could be represented by any non-null pointer would imply that one can synthesize one at will.
I guess I misunderstood some of your and @arielb1's earlier statements.
To me, "&T could be represented by any non-null pointer" reads as "If v is a non-null pointer, then v is an inhabitant of &T." This is a rule similar to "If v is a pointer to the beginning of a heap allocation that was done using the standard allocator and that no one else has any access to, then v is an inhabitant of Box<T>." These logical assertions in turn can be used to justify the correctness of a piece of unsafe code that takes a non-null pointer and turns it into a &T.
I now see that's not what you meant. I am not sure I can completely (formally) make sense of what you mean instead -- probably something related to, for example, the compiler not being allowed to just add spurious dereferences of a v of type &T, since this could conceivably be any pointer.
@RalfJung
To me, "&T could be represented by any non-null pointer" reads as "If v is a non-null pointer, then v is an inhabitant of &T." This is a rule similar to "If v is a pointer to the beginning of a heap allocation that was done using the standard allocator and that no one else has any access to, then v is an inhabitant of Box<T>." These logical assertions in turn can be used to justify the correctness of a piece of unsafe code that takes a non-null pointer and turns it into a &T.
Hmm. I feel like it takes more than possessing the same bits to have a value of a suitable type. That is, surely there is a distinction between a struct Foo(u32) and a u32, even though they have the same representation?
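A short sketch of that distinction, using a hypothetical newtype Even (not from the thread) whose invariant is that the wrapped number is even. The representation is the same as u32, but only values that went through the constructor count as inhabitants in the library's sense.

```rust
/// Hypothetical newtype: same bits as `u32`, plus the invariant "value is even".
#[repr(transparent)]
pub struct Even(u32);

impl Even {
    /// The only safe way to obtain an `Even`; rejects odd inputs.
    pub fn new(n: u32) -> Option<Even> {
        if n % 2 == 0 { Some(Even(n)) } else { None }
    }

    /// Relies on the invariant: no remainder is lost by the division.
    pub fn half(&self) -> u32 {
        self.0 / 2
    }
}
```

Any bit pattern of a u32 is a valid representation of Even as far as the compiler is concerned, yet transmuting 3 into an Even would break what the methods are allowed to assume.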
I mean the protocol that governs the type while it is shared
This is the memory model repo, not the soundness repo.
This is the memory model repo, not the soundness repo.
The two are closely related. After all, the core purpose of the guarantees provided by types is to make sure that the program has no UB.
Hmm. I feel like it takes more than possessing the same bits to have a value of a suitable type. That is, surely there is a distinction between a struct Foo(u32) and a u32, even though they have the same representation?
Well, yes, that's exactly what I mean. It's not enough for v to be a non-null pointer to conclude that it is an &T for a zero-sized T. (Though this does hold, I believe, for the special case T = ().) I at first misunderstood your statements here.
@RalfJung:
Well, yes, that's exactly what I mean. It's not enough for v to be a non-null pointer to conclude that it is an &T for a zero-sized T. (Though this does hold, I believe, for the special case T = ().) I at first misunderstood your statements here.
I think you may have it backwards, here.
My preference is that any non-null value is a valid reference to a 0-sized type.
I read this as saying that, given an &T, one cannot predict the value of the (underlying) raw pointer that represents it save the fact that it is not null.
IOW, I parsed it as a denial of a "canonical value" for references to ZSTs. This seems to bear that out:
[...] some people have suggested that we may sometimes "squash" reborrows to 0x1 or something to reduce implementation complexity.
Well, yes, that's exactly what I mean. It's not enough for v to be a non-null pointer to conclude that it is an &T for a zero-sized T. (Though this does hold, I believe, for the special case T = ().) I at first misunderstood your statements here.
From a UB standpoint, it does not matter what the value is. From a soundness standpoint, nobody's preventing you from using the address of an &T as the parameter to a safe unchecked_get.
This is now discussed as part of the validity invariants: https://github.com/rust-lang/unsafe-code-guidelines/issues/76.
In #3, the question was posed what values a Box<T> where T is ZST can have -- one might ask a similar question about a &T. The answer may well be the same as #3, but it could be different, depending on whether the Box API is deemed to impose its own limitations.