rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.35k stars 12.59k forks source link

incorrect UB when when a `!` place is constructed (but not loaded!) #117288

Open RalfJung opened 11 months ago

RalfJung commented 11 months ago

This code shouldn't have UB but Miri reports UB and rustc generates a SIGILL:

#![feature(never_type)]
fn main() {
    unsafe {
        let x = 3u8;
        let x: *const ! = &x as *const u8 as *const _;
        let _ = *x;
    }
}

Thanks to @Nadrieril for finding this.

Zulip discussion

RalfJung commented 11 months ago

Labeling as "unsound" since under the rules that @rust-lang/opsem decided for *, and how we've been treating _ in general, this shouldn't SIGILL.

Nadrieril commented 11 months ago

Note that this triggers for ! but not enum Void{} so it's likely a type-based special case

RalfJung commented 11 months ago

This is already wrong in the initially generated MIR:

        _6 = &_2;
        _5 = &raw const (*_6);
        _4 = move _5 as *const ! (PtrToPtr);
        AscribeUserType(_4, o, UserTypeProjection { base: UserType(1), projs: [] });
        _3 = _4;
        StorageDead(_5);
        FakeRead(ForLet(None), _3);
        AscribeUserType(_3, o, UserTypeProjection { base: UserType(2), projs: [] });
        StorageDead(_6);
        StorageDead(_4);
        StorageLive(_7);
        StorageLive(_8);
        _8 = (*_3); // an actual load from the place!

If we use bool instead we get just a PlaceMention, as we should

        _5 = &_1;
        _4 = &raw const (*_5);
        _3 = move _4 as *const bool (PtrToPtr);
        AscribeUserType(_3, o, UserTypeProjection { base: UserType(1), projs: [] });
        _2 = _3;
        StorageDead(_4);
        FakeRead(ForLet(None), _2);
        AscribeUserType(_2, o, UserTypeProjection { base: UserType(2), projs: [] });
        StorageDead(_5);
        StorageDead(_3);
        PlaceMention((*_2)); // no load

@cjgillot any idea why we don't just get a PlaceMention for !?

bjorn3 commented 11 months ago

-Zunpretty=thir-tree shows a NeverToAny expression. Maybe this has something to do with it?

Nadrieril commented 11 months ago

Oh, then this looks like fallback behavior: if I annotate let _: ! = *x; I only get a "entering unreachable code" error

RalfJung commented 11 months ago

Yeah that's also wrong, but different.^^ MIR for that case:

        _6 = &_2;
        _5 = &raw const (*_6);
        _4 = move _5 as *const ! (PtrToPtr);
        AscribeUserType(_4, o, UserTypeProjection { base: UserType(1), projs: [] });
        _3 = _4;
        StorageDead(_5);
        FakeRead(ForLet(None), _3);
        AscribeUserType(_3, o, UserTypeProjection { base: UserType(2), projs: [] });
        StorageDead(_6);
        StorageDead(_4);
        PlaceMention((*_3));
        AscribeUserType((*_3), +, UserTypeProjection { base: UserType(4), projs: [] });
        StorageDead(_3);
        StorageDead(_2);
        unreachable;

So we get a PlaceMention but then we also get a unreachable...

RalfJung commented 11 months ago

-Zunpretty=thir-tree shows a NeverToAny expression. Maybe this has something to do with it?

Yeah that's probably the right trail. MIR building might not even be at fault, the THIR is already wrong since it thinks there's a never value being constructed here. These THIR nodes seem to be constructed from Adjust::NeverToAny; no idea if there's any way to dump those.

Who are the right people to ping for THIR issues...?

Jules-Bertholet commented 11 months ago

@rustbot label F-never_type

asquared31415 commented 11 months ago

Note that stable workarounds to access the never type, or cases where the type might be inferred still produce identical behavior, so this does not require the feature.

trait Extractor {
    type Output;
}

impl<T> Extractor for fn() -> T {
    type Output = T;
}

type RealNever = <fn() -> ! as Extractor>::Output;

fn main() {
    unsafe {
        let x = 3u8;
        let x: *const RealNever = &x as *const u8 as *const _;
        let _ = *x;
    }
}
asquared31415 commented 11 months ago

79735 mentions some similar cases that ! is special in places, though the primary problem (that deref into a let _ was allowed without unsafe) has since been resolved.

apiraino commented 11 months ago

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-high

matthewjasper commented 11 months ago

This compiles, so this would also need some changes to type checking rules.

#![feature(never_type)]
fn make_string() -> String {
    unsafe {
        let x = 3u8;
        let x: *const ! = &x as *const u8 as *const _;
        let _ = *x; // Makes the unsafe block diverge, same as writing return String::new(); or let _ = panic!(); would
    }
}
RalfJung commented 11 months ago

Uff, yeah that should never have compiled... Cc @rust-lang/lang

Noratrieb commented 10 months ago

The reason this compiles is probably https://github.com/rust-lang/rust/blob/5777f2c6bdea7e6406d355c3dd4527bd6e368833/compiler/rustc_hir_typeck/src/expr.rs#L251-L254 HIR typeck thinks that an expression of type ! cannot possibly not diverge.. which is a reasonable assumption tbh. So I think we need to somehow special case that in HIR typeck... the let _ = *x semantics make sense at the MIR level but are pretty inconsistent for HIR

RalfJung commented 10 months ago

Can HIR tell apart places and values? A value expression of type ! obviously must diverge, but place expressions are quite different.

The comment is accurate, "Any expression that produces a value of type ! must have diverged". However, let _ = *x; does not produce a value of any type, it only produces a place.

RalfJung commented 10 months ago

Turns out even this compiles:

#![feature(never_type)]
pub fn make_string() -> String {
    unsafe {
        let x = 3u8;
        let x: *const ! = &x as *const u8 as *const _;
        let _val = std::ptr::addr_of!(*x); // considered to diverge
    }
}

There is no _ pattern involved this time. addr_of! is explicitly intended to be used in situations where the pointer doesn't point to valid data. So there's no doubt IMO that that HIR analysis needs to change.

digama0 commented 10 months ago

It seems like that should be encoded in either Expectation or Needs, since the place/value requirement is a context thing. Needs seems to be quite close to what is needed, it differentiates between MutPlace and None although I'm not exactly sure what this is used to mean.

RalfJung commented 10 months ago

@rust-lang/lang nominating this to get first vibes on which of the following lines should be accepted, and which should be UB. (Currently they are all accepted and all UB but that's a bug.)

#![feature(never_type)]
fn deref_never<T>() {
    unsafe {
        let x = 3u8;
        let x: *const ! = &x as *const u8 as *const _;
        let _val = addr_of!(*x); // must be OK
        let _ = *x; // should probably be OK since we said `_` discards the place (and is UB-equivalent to `addr_of!`)
        let _: ! = *x; // does a no-op type annotation force the place to be loaded and hence cause UB?
        let _: () = *x; // is this UB? should we even accept such code?
        let _: String = *x; // what about this?
        let _: T = *x; // is this UB even if T = ! ?
    }
}
digama0 commented 10 months ago

Relevant Zulip discussion: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Soundness.20issues.20around.20.60let.20_.60.20and.20.60!.60/near/399031795

traviscross commented 10 months ago

@RalfJung: If you would, could you make a proposal about what you believe is correct for these?:

        let _: ! = *x; // does a no-op type annotation force the place to be loaded and hence cause UB?
        let _: () = *x; // is this UB? should we even accept such code?
        let _: String = *x; // what about this?
        let _: T = *x; // is this UB even if T = ! ?
RalfJung commented 10 months ago

I don't know how to arrive at a principled answer to this. There's no obvious "correct" here, it's a judgment call.

If you are asking for my personal preference ignoring backwards compatibility constraints, then my immediate gut feeling is that places of type ! should not be subject to never-to-any coercions (the coercion should apply only to values, matching the comment that justifies the coercion in the first place). So:

        let _: ! = *x; // OK, not UB
        let _: () = *x; // does not compile
        let _: String = *x; // does not compile
        let _: T = *x; // does not compile
digama0 commented 10 months ago

I think I gave a decent "principled" argument for:

        let _: ! = *x; // OK, not UB, not unsafe
        let _: () = *x; // UB and must be wrapped in unsafe
        let _: String = *x; // UB and must be wrapped in unsafe
        let _: T = *x; // UB and must be wrapped in unsafe (even if T is instantiated to ! later)

in the zulip thread, by the reasoning that never-to-any acts like a function call and has to evaluate its input to a value (even if the output is ignored), consistent with deref coercion which has this behavior on stable:

        let p: &&String = &&String::new();
        let _: &String = *p; // does not call deref() or load *p
        let _: &str = *p; // calls deref(), loads *p

Note: currently let _ = *p; (where p: *const T) is unsafe but is a no-op. My "principled argument" would say that this should not be unsafe, but I'd rather not re-litigate that issue and keeping it syntactically unsafe seems fine to me.

RalfJung commented 10 months ago

I just don't think we should be introducing such function calls so easily. For deref coercions the user at least still has to write *. Also your examples involve references; we can be much more lenient around references. It's raw pointers that I am concerned about.

Your proposal violates the substitution principle: when T is !, annotating T vs annotating ! should not make a difference. That's extremely undesirable IMO.

So I guess I disagree with that being principled. ;) It's juts a different way to answer these questions, but no principle tells us that one is clearly theoretically preferable.

digama0 commented 10 months ago

Also from your argument I would say that let _: ! = *x; also should be UB. It is using the no-op coercion. Having this depend on which type annotation is given is a very bad idea IMO.

There is no "no-op coercion". Coercions only happen when there are type errors to be coerced, otherwise it doesn't make sense to have a bunch of different coercions going in different directions. Coercions are also visible to the user (they cause lvalue expressions to act like rvalue temporary expressions) so it cannot be the case that there is secretly a no-op coercion, this would break everything.

I just don't think we should be introducing such function calls so easily. For deref coercions the user at least still has to write *. Also your examples involve references; we can be much more lenient around references. It's raw pointers that I am concerned about.

I put "principled" in quotes exactly because it is taking an argument to its logical conclusion. I'm not (necessarily) claiming that it is a good principle to follow, but it is the one that leads to "the least spec text". It is reasonable to deviate from that in order to avoid a footgun situation, but that is the direction of more epicycles, not less.

In this case I would prefer that we use this algorithm for the actual semantics and use an error- or deny-by-default lint to make the cases you are concerned about errors. That seems to get the best of both worlds.

Your proposal violates the substitution principle: when T is !, annotating T vs annotating ! should not make a difference. That's extremely undesirable IMO.

This is not the case in the presence of coercions anyway.

#[derive(Default)]
struct Foo;
impl std::ops::Deref for Foo {
    type Target = Foo;
    fn deref(&self) -> &Self::Target {
        println!("deref()");
        self
    }
}

fn f1<U, T: std::ops::Deref<Target = U>>(t: &T) -> &U { t }
fn f2(t: &Foo) -> &Foo { t }

pub fn main() {
    f1(&Foo); // prints deref()
    f2(&Foo); // prints nothing
}
RalfJung commented 10 months ago

I don't see why let _: ! = *x; can't be considered as coercing from ! to !. What would that break?

digama0 commented 10 months ago

Are you saying that whenever there is a coercion site of type ! and a place expression of type ! is provided, a NeverToAny is inserted? That would make let _: ! = *x; UB but let _ = *x; not UB, unless you want to consider the latter a coercion site as well in which case you basically can't have unevaluated ! places at all.

I don't think that would be any less principled than what I proposed, but I'm not sure it has the behavior you want.

RalfJung commented 10 months ago

Are you saying that whenever there is a coercion site of type ! and a place expression of type ! is provided, a NeverToAny is inserted?

I am saying that type annotations force there to be a coercion.

This is not the case in the presence of coercions anyway.

That's not a counterexample to the principle I claimed. It doesn't even use type annotations. Your f1 is short for t.deref() and f2 is not, that's all.

That's not comparable with the situation we are discussing here. I'm not even sure I accept comparing deref coercions with never-to-any. These are different kinds of coercions, in how they behave. Deref coercions just become a function call in MIR but then use normal MIR semantics. It's a heuristic for adding missing function calls. Never-to-any isn't reflected in MIR at all! There's no function call here.

digama0 commented 10 months ago

Actually maybe I misunderstood your claim:

Your proposal violates the substitution principle: when T is !, annotating T vs annotating ! should not make a difference. That's extremely undesirable IMO.

When T is statically known to be ! inside the body of the function (because it's a type alias or there is some projection equality in scope or something), annotating T vs ! makes no difference. Coercions are inserted iff it would not typecheck otherwise, meaning that if the typechecker can prove that the types line up no coercion is inserted. It is only during generic instantiation that you can get into this situation where a T becomes a ! but in the body of the function it was not known to be and so a coercion was inserted, and instantiation doesn't undo that. That's what the deref example was meant to show.

That's not comparable with the situation we are discussing here. I'm not even sure I accept comparing deref coercions with never-to-any. These are different kinds of coercions, in how they behave. Deref coercions just become a function call in MIR but then use normal MIR semantics. It's a heuristic for adding missing function calls. Never-to-any isn't reflected in MIR at all! There's no function call here.

In how they behave, yes, but in how they act in the typechecker? I don't see any good reason to make them act differently. Coercions are inserted only when needed to resolve type mismatches, independently of the kind of coercion. NeverToAny doesn't act like a function call in MIR only because the Unreachable terminator doesn't require a proof of unreachability. If it did, you would need to have a value of never hanging around to pass to it.

Besides this, if you countenance the notion of a let _: String = *x; that is not UB, you have to make sense of what place of type String we have constructed. Where is it located in memory? What are the 24 bytes attached to it? If this is a PlaceMention then the place itself should be valid, but this is nonsense, there was never a String in memory, we should have produced one from the ! but there is no ! either since we've decided to somehow gloss over that part in order to make it not UB.

RalfJung commented 10 months ago

It is only during generic instantiation that you can get into this situation where a T becomes a ! but in the body of the function it was not known to be and so a coercion was inserted, and instantiation doesn't undo that. That's what the deref example was meant to show.

Yeah I got that. I just don't feel like that's the same situation. Never-to-any doesn't strike me as the same kind of coercion.

Do we have any other coercions that introduce loads from raw pointers where there were none before? None of your examples do that. That puts them an order of magnitude away from the never-to-any examples in terms of how problematic they are.

NeverToAny doesn't act like a function call in MIR only because the Unreachable terminator doesn't require a proof of unreachability. If it did, you would need to have a value of never hanging around to pass to it.

Maybe. Or maybe it just wasn't thought of in the same terms as the other coercions. In that model, why does addr_of!(*x) cause UB? There's not even a coercion site here!

The comment in the code is quite revealing: it talks about values of ! type. The fact that this might be a place was not even considered. I don't think we should use the current behave to infer intent on the treatment of places here.

So, ignoring what rustc currently does -- why do you think it is a good idea to insert such coercing function calls (including producing the value) so easily? I see a pretty large downside: it's a huge footgun. It's a much larger footgun than e.g. deref coercions because those need a reference to trigger. Having &! is already questionable at best, but there's nothing fundamentally wrong with *const !. What's the upside that makes this downside worth it?

if you countenance the notion of a let _: String = *x; that is not UB, you have to make sense of what place of type String we have constructed.

No I don't. As I said above, that code should simply not be accepted.

digama0 commented 10 months ago

NeverToAny doesn't act like a function call in MIR only because the Unreachable terminator doesn't require a proof of unreachability. If it did, you would need to have a value of never hanging around to pass to it.

Maybe. Or maybe it just wasn't thought of in the same terms as the other coercions. In that model, why does addr_of!(*x) cause UB? There's not even a coercion site here!

It shouldn't, I think the current behavior is a bug. I don't think it's possible to induce a coercion between the *x and the addr_of! and it would probably be footgunny if you could. Maybe with type ascriptions...? addr_of! isn't a coercion site so I think there is no possibility of a NeverToAny sneaking in there. And without a coercion, it's just taking the address of a valid (to construct but not load) place of type !, which is legal and not UB.

I agree that the existing code does not look like it was written considering the place/value distinction at all. To be fair, this almost never matters during typechecking; this seems to be an unfortunate side effect of the trick used to make !-containing blocks have ! type even if the ! expression is not final, which I consider deeply weird.

So, ignoring what rustc currently does -- why do you think it is a good idea to insert such coercing function calls (including producing the value) so easily? I see a pretty large downside: it's a huge footgun. It's a much larger footgun than e.g. deref coercions because those need a reference to trigger.

I think it is more regular and logical than the alternatives. As I said earlier, I believe the right place to be dealing with footgun potential is lints, while in the core language semantics we should strive for simple and regular behavior where possible because when you have to dive deep on these topics (e.g. spec work, metalanguage reasoning) anything that simplifies things and reduces special cases makes it easier to get things right and avoid bugs.

if you countenance the notion of a let _: String = *x; that is not UB, you have to make sense of what place of type String we have constructed.

No I don't. As I said above, that code should simply not be accepted.

Would it be accepted if it was a reference? I think it would be very weird for this not to be accepted if the analogous code with references would be. This is a very logical combination of features, and carving a hole in the typechecker when they intersect in this order seems problematic for macros and the like.

digama0 commented 10 months ago

Do we have any other coercions that introduce loads from raw pointers where there were none before? None of your examples do that. That puts them an order of magnitude away from the never-to-any examples in terms of how problematic they are.

Sure thing. Here's an example using the *mut T -> *const T coercion, although similar things exist for most of the coercions:

pub fn main() {
    let x: *const *mut u8 = std::ptr::null();
    unsafe {
        let _: *mut u8 = *x; // OK
        let _: *const u8 = *x; // UB
    }
}

Two remarks about this example:

RalfJung commented 10 months ago

addr_of! isn't a coercion site

If this never-to-any behavior kicks in when there is no coercion site, what makes you think that the other never-to-any examples are even coercions? As far as I can tell, they are all coming from this code, and no coercion is involved.

I think it is more regular and logical than the alternatives. As I said earlier, I believe the right place to be dealing with footgun potential is lints, while in the core language semantics we should strive for simple and regular behavior where possible because when you have to dive deep on these topics (e.g. spec work, metalanguage reasoning) anything that simplifies things and reduces special cases makes it easier to get things right and avoid bugs.

For code that introduces insta-UB, I don't think lints are strong enough. This is essentially arguing "yes we are inserting landmines into your code, but at least we'll tell you about it so don't complain". I feel strongly that we shouldn't be inserting these mines in the first place.

I can't imagine a situation where you would want the current behavior of implicitly introducing loads from raw pointers. If something our semantics does is never the thing you want, then we shouldn't do it.

while in the core language semantics we should strive for simple and regular behavior

Note that we are discussing the lowering from surface to core language here. The core language itself is completely unaffected by this proposed change. So compiler optimizations and backends don't have to worry about this. That makes the argument for "minimal, consistent, regular semantics" a much less strong one and leaves more room for considering the developer experience that we are providing.

That's one reason we have this core language / surface language split to begin with: to be able to have a small and regular core language that's good for analysis and optimization, without having to inflict programmers with the downsides of such a language.

Would it be accepted if it was a reference? I think it would be very weird for this not to be accepted if the analogous code with references would be. This is a very logical combination of features, and carving a hole in the typechecker when they intersect in this order seems problematic for macros and the like.

I don't have a strong preference for what happens when it is a reference. But I think it makes perfect sense to treat references and raw pointers differently, since they come with very different invariants. We will already (hopefully) do that for handling empty types in match (see https://github.com/rust-lang/rust/issues/117119).

Sure thing. Here's an example using the mut T -> const T coercion, although similar things exist for most of the coercions:

Ugh. We should fix those as well.

According to miri let _: mut u8 = unsafe { x }; is already UB. Apparently the unsafe block acts like a block insofar as it forces a read of the place and construction of a new temporary, which seems like a footgun of its own.

Yeah, that's correct. And yes it's not pretty.

digama0 commented 10 months ago

If this never-to-any behavior kicks in when there is no coercion site, what makes you think that the other never-to-any examples are even coercions? As far as I can tell, they are all coming from this code, and no coercion is involved.

I'm confused. Never-to-any is a coercion, it is documented as a coercion listed among all the others, and it has all the hallmarks of a coercion, being triggered on type errors to make things match up that wouldn't otherwise. I would expect it to only be inserted at coercion sites, just like every other coercion.

As has already been established (I hope?), the existing code is buggy, and will need to be changed. It is handling something slightly different though: values (or more precisely, rvalue expressions) of type ! trigger divergence, meaning that subsequent code causes the unreachable code warning and the block returns !. This will require some t-lang finessing to define, but it needs to at least be reined in such that place expressions of type ! are not implicated if they are not read, and possibly even if they are read (it is okay to err on the side of not treating a thing as unreachable which turns out to be later), although it is really awkward to be doing this kind of analysis in THIR instead of MIR, one will have to go deep into the patterns to determine whether a value of type ! is produced.

For code that introduces insta-UB, I don't think lints are strong enough. This is essentially arguing "yes we are inserting landmines into your code, but at least we'll tell you about it so don't complain". I feel strongly that we shouldn't be inserting these mines in the first place.

I will reiterate that the surprising behavior here is that it's not always a read, so I don't think this is "inserting" UB into the code at all, it is removing the read UB from some cases. If you write *null you get what you asked for, no one will be surprised by this behavior.

That's one reason we have this core language / surface language split to begin with: to be able to have a small and regular core language that's good for analysis and optimization, without having to inflict programmers with the downsides of such a language.

Programmers have to work with the surface language though, and they have to understand things like coercions if they want to predict the operations that appear in the code. No one writes MIR directly. Most of the time they don't have to care, but depending on the context they may need to pull out the spec and read the fine print. At that point, it will be good to make sure that the fine print isn't any more convoluted than it absolutely has to be, or else people will make mistakes.

I don't have a strong preference for what happens when it is a reference. But I think it makes perfect sense to treat references and raw pointers differently, since they come with very different invariants. We will already (hopefully) do that for handling empty types in match (see https://github.com/rust-lang/rust/issues/117119).

That issue seems very similar to this one, it uses match *x { _ => ... } instead of let _ = *x; but I hope we are leaning toward making those the same (modulo temporary promotion stuff). So I think I might be arguing for the same fix there, and in particular that raw pointers and references should not be treated any differently in this regard, although I'm not exactly sure where your proposal that would make them different is.

Regarding "they come with very different invariants": the thing is, in unsafe code whether you use a reference or a pointer you still have to care about what precise operations happen if you are working with references outside their safety invariant, so all of the issues we have talked about here are problems both for references and for pointers. The details of exactly when UB happens can vary depending on how shallow the validity is and whether &! is considered uninhabited, but no matter how you draw the lines there there will always be some edge case situation isomorphic to what I have constructed above where you can watch coercions and other implicit operations happening.

Nadrieril commented 10 months ago

That issue seems very similar to this one, it uses match x { => ... } instead of let = x;

That's a bit different: there is no NeverToAny in the match case. The issue comes from the fact that match exhaustiveness implicitly always assumed a read (regardless of the type). I'm working on fixing that.

RalfJung commented 10 months ago

I will reiterate that the surprising behavior here is that it's not always a read, so I don't think this is "inserting" UB into the code at all, it is removing the read UB from some cases. If you write *null you get what you asked for, no one will be surprised by this behavior.

We have established that _ patterns do not incur a load. I don't disagree that that is surprising, but they behave like that consistently across a range of systems, including drop order and match handling. Having one particular system (coercions) divert from that is the opposite of "regular".

So no, I disagree with you. The surprising behavior is having _ incur a load in one particular niche situation when everywhere else it observably behaves like not doing a load. Just because the fact that it behaves like that everywhere else is also surprising doesn't mean the one strange exception any less surprising. It's an exception from the exception, and each layer of exceptions is more cause for surprises.

I'm confused. Never-to-any is a coercion, it is documented as a coercion listed among all the others, and it has all the hallmarks of a coercion, being triggered on type errors to make things match up that wouldn't otherwise. I would expect it to only be inserted at coercion sites, just like every other coercion.

It has all the hallmarks of being a coercion except that it also triggers when there's no coercion site. So, not quite like a coercion then.

I agree it could be a coercion, and it might even be documented as one. But I don't see evidence that it is treated as one in rustc. I might be missing something though; maybe there's handling of ! around coercions and there's that other code that marks everything unreachable when it sees a ! expression.

Programmers have to work with the surface language though, and they have to understand things like coercions if they want to predict the operations that appear in the code. No one writes MIR directly. Most of the time they don't have to care, but depending on the context they may need to pull out the spec and read the fine print. At that point, it will be good to make sure that the fine print isn't any more convoluted than it absolutely has to be, or else people will make mistakes.

Yes, agreed. And that's why it is important not to insert landmines into their code while lowering from the surface language to MIR.

I see no justification whatsoever for having a lint detect this situation and then not halt compilation. That lint should be a hard error. At which point we've arrived at my proposal.

That issue seems very similar to this one,

That issue is like the let _ = *x; case, without any type annotation. Nobody in this thread has a problem with how that one behaves. So that issue is disjoint from this one.

I'm not exactly sure where your proposal that would make them different is.

There's a hint of it here; I wrote more about this somewhere but I guess it wasn't that issue... I think it was this IRLO thread.

Regarding "they come with very different invariants": the thing is, in unsafe code whether you use a reference or a pointer you still have to care about what precise operations happen if you are working with references outside their safety invariant, so all of the issues we have talked about here are problems both for references and for pointers. The details of exactly when UB happens can vary depending on how shallow the validity is and whether &! is considered uninhabited, but no matter how you draw the lines there there will always be some edge case situation isomorphic to what I have constructed above where you can watch coercions and other implicit operations happening.

You shouldn't be working with references outside their safety invariant, you should be using raw pointers -- precisely because that gives you more control over the operations that happen automatically. We already don't to autoderef on raw pointers for that exact reason. If we went for maximal regularity between references and raw pointers we'd autoderef raw pointers. Do you think it was a mistake to not autoderef raw pointers?

I'm of the opinion that no autoderef on raw pointers is the right choice, and we should similarly nerf other parts of the surface-to-core lowering to be more cautious around raw pointers: match exhaustiveness checking, and coercions that would introduce loads from the pointer.

traviscross commented 9 months ago

@rustbot labels -I-lang-nominated

We had an extensive discussion of this in the 2023-12-06 T-lang planning meeting last week.

The consensus[^1] was general agreement with @RalfJung:

...that places of type ! should not be subject to never-to-any coercions (the coercion should apply only to values, matching the comment that justifies the coercion in the first place).

Specifically, we felt that it was 1) OK to have a place of ! type but insta-UB to ever coerce that place to a value, and 2) OK to not do never-to-any coercions on places.

Anywhere the compiler can statically determine that insta-UB is happening it would be desirable (but not required) to reject the code, preferably with a deny-by-default lint, but if it's more straightforward to emit a hard error, that's OK too.

Since we're agreeing to not do never-to-any coercions on places, anywhere that such a coercion would be required would now be a hard error due to the resulting type mismatch.

To the degree that changes to the behavior here break uses of the "never type hack", that's probably OK.

[^1]: Some members dropped off the call on the path to this consensus but asserted their trust in those that remained to work out the proper solution.


In the meeting, we settled on the general principles above, but did not go through every example. Following the logic of what we agreed, here's how this author would annotate the examples:

#![feature(never_type)]
use core::ptr::addr_of;
#[cfg(False)]
fn deref_never<T>() {
    unsafe {
        let x = 3u8;
        let x: *const ! = &x as *const u8 as *const _;
        //
        // In all cases, we do not enforce validity invariants on
        // places, and we do not perform never-to-any coercion on
        // places.
        //
        // `addr_of!` does not covert a place to a value.
        let _val = addr_of!(*x); //~ OK.
        //
        // The `_` binding does not convert a place to a value.
        let _ = *x; //~ OK.
        let _: ! = *x; //~ OK.
        //
        // Parentheses do not convert a place to a value.
        let _ = (*x); //~ OK.
        //
        // Since these do not convert a place to a value and
        // never-to-any coercion is not done on places, these do not
        // type check and are hard errors.
        let _: () = *x; //~ ERROR type mismatch
        let _: String = *x; //~ ERROR type mismatch
        let _: T = *x; //~ ERROR type mismatch
        //
        // In all cases, the compiler is free, on a best-effort basis,
        // to reject code that it can prove is insta-UB.  Preferably
        // it would do this as a deny-by-default lint, but it's also
        // OK to emit a hard error if that is much more
        // straightforward to implement.
        //
        // If a place of `!` type is converted to a value, that is
        // always insta-UB.
        //
        // The block expression converts a place to a value.
        let _ = { *x }; //~ Insta-UB.
        //
        // The tuple constructor converts a place to a value.
        let _ = (*x,); //~ Insta-UB.
        let (_,) = (*x,); //~ Insta-UB.
        //
        // Binding to a variable converts a place to a value.  Because
        // of never-to-any coercion on values these type check.
        let _val = *x; //~ Insta-UB.
        let _val: () = *x; //~ Insta-UB.
        let _val: String = *x; //~ Insta-UB.
        let _val: T = *x; //~ Insta-UB.
        //
        // Even if the resulting type of the value is explicitly `!`,
        // it's still insta-UB to convert a place of `!` type to a
        // value, since a value of type `!` is always insta-UB in
        // Rust.
        let _val: ! = *x; //~ Insta-UB.
        let _: ! = { *x }; //~ Insta-UB.
        let (_,): (!,) = (*x,); //~ Insta-UB.
    }
}

(Thanks to @WaffleLapkin for joining the meeting and for helpful discussions on this topic and to @scottmcm for staying on to the bitter end of the call to hammer out the details.)

RalfJung commented 9 months ago

@traviscross thanks for the detailed summary!

So... looks like the next step would be to actually implement these semantics. Since this occurs on the HIR layer I'm somewhat out of my depth here. Who are the HIR experts that we can ping to ask for help here? :)

LunarLambda commented 3 months ago

Why did this get marked "requires-nightly"? It reproduces on 1.79.0.

Also, I found a particularly funny version where codegen places the ud2 at the end of the function, allowing fn main to continue past the addr_of, then explode:

https://godbolt.org/z/fhdfK6jT1

trait Type { type T; }
impl<T> Type for fn() -> T { type T = T; }

// Make `!` nameable on stable
type Never = <fn() -> ! as Type>::T;

pub fn main() {
     let x = &();
     let y = (x as *const ()).cast::<Never>();

    // Should be sound (doesn't produce a value of `!`, only a place, so does not diverge)
    let z = unsafe { std::ptr::addr_of!(*y) };

    println!("huh: {z:p}");
}

If you change let z = unsafe { std::ptr::addr_of(*y) }; to let _ = unsafe { *y }; the println! gets destroyed. let _ = unsafe { str::ptr::addr_of!(*y) }; also keeps the print. So compilation differs between addr_of!(*y) and *y, even when assigning to _.

https://github.com/rust-lang/unsafe-code-guidelines/issues/261#issuecomment-1784628422 Seems to suggest that let _ = *x; and let _ = addr_of!(*x); should be semantically identical, but they currently compile to very different MIR.

WaffleLapkin commented 3 months ago

I'd call <fn() -> ! as Type>::T a hack to purposefully avoid stability checks. While you can use this hack on stable I think that the "requires nightly" label is correct.

(in the same way as using RUSTC_BPOTSTRAP to avoid the checks does not mean that the issue does not require nightly)

asquared31415 commented 3 months ago

isn't an empty enum equivalent to ! for the purposes of this place discussion?

LunarLambda commented 3 months ago

No this bug is specific to !. Equivalent code works fine for empty enums

RalfJung commented 1 month ago

This is now blocking https://github.com/rust-lang/rust/pull/129248: making addr_of!(*ptr) a safe operation.

compiler-errors commented 1 month ago

Well, I can put up a hack to unblock that specifically. As @Nadrieril pointed out, this is actually two different "bugs" in HIR typeck:

#![feature(never_type)]
fn main() {
    unsafe {
        let x = 3u8;
        let x: *const ! = &x as *const u8 as *const _;
        let _ /* (1.) */ = *x;
        /* (2.) */
    }
}

In the above code:

  1. At (1.), we actually do never type fallback to (). Coercing the ! to () does constitute a read, and therefore is UB.
  2. At (2.), HIR typeck considers the fact that *x evalauted to ! to mean that all following code diverges, and therefore counts the block's type as ! as well.

Only (2.) actually matters for #129248, and if we were to land that PR right now, it would be unsound:

#![feature(never_type)]

fn make_up_a_value<T>() -> T {
    let x: *const ! = 0 as _;
    &raw const *x;
    // Since `*x` is `!`, HIR typeck thinks it diverges and allows the block to coerce to any value.
}

I will probably just put up a hack for (2.), since it's basically the only expression[^1] where this matters. This doesn't fix the let _: ! = *x; case, but that should remain unsafe at least, and therefore just regular UB and not unsound.

[^1]: Other combinations of HIR, such as match scrutinees that have no reads should also ideally be fixed... but that's really hard to fix I fear.

RalfJung commented 1 month ago

I would say it is unsound in the sense that the reference doesn't say that this is UB. Admittedly the reference isn't clear on the entire point about let _ only evaluating the place expression, but that's what we've been consistently communicating for a while now and it has informed other decisions as well (around match, for instance).

But fixing one of two bugs is still progress, so that'd still be a great first step. :)