rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
99.18k stars 12.81k forks source link

Tracking Issue for Never Patterns (`never_patterns`) #118155

Open Nadrieril opened 1 year ago

Nadrieril commented 1 year ago

This is a tracking issue for the experimental "never patterns" feature (as per the T-lang experimental feature process). The feature gate for the issue is #![feature(never_patterns)].

RFC: https://github.com/rust-lang/rfcs/pull/3719

This feature introduces a new syntax for patterns: !. In a pattern, ! is valid for any uninhabited type and simply acknowledges that the type is empty. Because a never pattern is statically known to be unreachable, it obeys slightly different rules:

enum Void {}
let result: *const Result<u32, Void> = ...;
unsafe {
    match *result {
        Ok(x) => { foo(x) }
        Err(!), // no need for `=> <expr>`
    }
    let (Ok(x) | Err(!)) = *result; // valid
}

fn blah(!: Void) -> SomeType {} // the function can never be called so it doesn't need a body

This feature does not affect exhaustiveness or pattern reachability; see the exhaustive_patterns and min_exhaustive_patterns features for that.

About tracking issues

Tracking issues are used to record the overall progress of implementation. They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions. A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature. Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Steps

Unresolved Questions

### Implementation plans/details
- [x] `!` is parsed as a pattern.
- [x] Parsing of `match` expressions allows `match <expr> { <pat>, ... }` i.e. a pattern without a body. We later check to only allow that if the pattern is a never pattern.
- [ ] The lack of body is desugared into something equivalent to `unreachable_unchecked!()`.
- [x] Never patterns skip the or-pattern bindings check, to allow `Ok(x) | Err(!)`
- [x] Never patterns don't allow bindings, like in `(x, !)`
- [x] Allow `fn foo(!: Void) -> SomeType {}`
- [x] `!` is typechecked like a wildcard; it adds no type constraint
- [x] In match checking we check that `!` is only used on an uninhabited type (modulo match ergonomics).
- [ ] The `!` pattern is lowered to MIR.
- [ ] A `!` pattern asserts validiy of the matched place
- [x] Exhaustiveness understands `!`.
- [x] Exhaustiveness recommends using never patterns where sensible.
- [ ] Rustfmt understands never patterns
- [ ] Check every bit of the compiler that handles or-patterns to see if it handles never patterns properly
- [x] Write up the RFC
- [ ] https://github.com/rust-lang/rust/issues/120240
### Implementation history
- [ ] https://github.com/rust-lang/rust/pull/118157
- [ ] #118527
- [ ] https://github.com/rust-lang/rust/pull/118868
- [ ] https://github.com/rust-lang/rust/pull/119610
- [ ] https://github.com/rust-lang/rust/pull/119622
- [ ] https://github.com/rust-lang/rust/pull/120009
- [ ] https://github.com/rust-lang/rust/pull/120097
- [ ] https://github.com/rust-lang/rust/pull/120104
- [ ] https://github.com/rust-lang/rust/pull/120517
- [ ] https://github.com/rust-lang/rust/pull/120758
- [ ] https://github.com/rust-lang/rust/pull/121391
- [ ] https://github.com/rust-lang/rust/pull/121823
- [ ] https://github.com/rust-lang/rust/pull/123332
RalfJung commented 1 year ago

The "steps" should include "write an RFC and have it accepted", no?

Nadrieril commented 1 year ago

Indeed it should

Alonely0 commented 12 months ago
enum Void {}
unsafe {
   let ptr: *const Result<u32, Void> = ...;
   match *ptr {
       Ok(x) => { ... }
       Err(_) => { ... } // potentially reachable if `ptr` points to uninitialized data
   }
   match *ptr {
       Ok(x) => { ... }
       Err(!) // indicate explicitly that we ensured the empty case couldn't happen
   }
}

The first match is UB. First, in Rust, any reads on uninitialized memory are considered undefined behavior; and second, if the Err variant cannot be constructed, it can't ever exist. Thus, the check for it can (and should) be removed by the optimizer, wiping away any matching on the Err variant and the code associated with it. Basically, the match would always execute the Ok branch regardless of the existence of valid data on the inner, which is exactly what you're not trying to do there (it would always get optimized down to the second match, which is equivalent to the ones proposed that are not unsafe). As a follow-up, it is not even marked as unsafe, there are many things that can go wrong when reading from a random raw pointer, not just the discriminant of an enum... Has this gone through a pre-RFC of some sort?

I think the fact that we do not want to touch uninitialized memory directly (w/o MaybeUninit) in the language with a 10ft pole is pretty well set in stone, so I'd just remove this part of the proposal and rename it "Allow elision of unreachable patterns".

Nadrieril commented 12 months ago

Hi! I thought like you not that long ago, until I digged into the subtleties of rust operational semantics. Allow me to clarify.

in Rust, any reads on uninitialized memory are considered undefined behavior

This is correct, but that match does not read from the value in the Err case, since a _ pattern does not cause a read.

if the Err variant cannot be constructed, it can't ever exist

While this is a likely possibility for rust opsem to decide, this has not been decided yet. Therefore I am erring on the side of assuming it is possible to read Err with uninitialized data. This is discussed here.

it is not even marked as unsafe

I'm confused, there's an unsafe block in the code you quote. Dereferencing a pointer always requires an unsafe block.

Has this gone through a pre-RFC of some sort?

No, this is an experimental feature gate, meant to iron out the details of the feature before we write the RFC. The lang-team took a quick look at my proposal and approved the experiment. I have discussed details of rust opsem with @RalfJung for a while before I felt confident proposing this. While I believe my approach is sound, this will have to be fully settled in the (to-be-written) RFC.

I think the fact that we do not want to touch uninitialized memory directly (w/o MaybeUninit) in the language with a 10ft pole is pretty well set in stone

This isn't quite correct. We need the possibility of uninitialized data behind a pointer. In fact to initialize a MaybeUninit<T> you have to go through as_mut_ptr which returns a *mut T pointing to possibly-uninitialized memory.

The mem::uninitialized debacle that led to the creation of MaybeUninit wasn't about uninitialized memory in general, it was about the fact that the function would return an uninitialized value, which is indeed instant UB. As long as the uninitialized data is not read (e.g. because it is behind a pointer), we're ok.

I'd just remove this part of the proposal and rename it "Allow elision of unreachable patterns".

This is the opposite of what this proposal is about. "Allow elision of unreachable patterns" is implemented in the exhaustive_patterns feature gate, which hasn't been stabilized because it is unsound, exactly because it allows elision of patterns that are reachable without UB in rust's current operational semantics. The whole point of this proposal is to fix that soundness whole.

RalfJung commented 12 months ago

in Rust, any reads on uninitialized memory are considered undefined behavior

This is not correct. Please consult the reference for what is and is not UB. Here is an example of code that reads uninit memory (for some definition of "reads") and has no UB.

You need to be careful with blanket statements like that.


Miri's current implementation means that reading the discriminant of a Result<u32, Void> can indeed never yield Err. The question is:

Never Patterns help because they let us make explicit when we want to exploit that a type is empty. We can then factor the exhaustive_patterns discussion into "how do Never Patterns work" and "when do we implicitly insert Never Patterns", i.e., when can they be elided.

For the case of Result<u32, Void>, given that there is an explicit Ok arm that asks for the discriminant to be read, I could imagine that we will decide "yeah okay we can add Err(!) implicitly". But maybe we don't want to do that when there are raw pointers involved, just to be extra safe.

RalfJung commented 12 months ago

The summary of this issue doesn't entirely capture this factoring I described though, it could be improved. Currently it mixes up the concept of Never Patterns with the concept of eliding Never Patterns, while I think these concepts should be discussed completely separately. In that sense gathering feedback on a separate writeup might be helpful. The experimental process means we don't have to wait until the RFC is all set and done, but it can still help to have the RFC at least sketched so that key stakeholders can even understand what the experimental change is doing without reverse engineering that from the implementation. It can also serve as a guideline to the reviewer.

The first example in this issue does not use a never pattern, making it a strange example for explaining never patterns.

Nadrieril commented 12 months ago

This is not correct.

Oh well, every time I think I get it I discover I'm missing something. Thx for pointing it out.

Even if we do, do we want the UB to come about implicitly by omitting match arms?

Currently it mixes up the concept of Never Patterns with the concept of eliding Never Patterns

No, my proposal is different. I'm intentionally moving away from this "implicit/explicit" frame that was in Niko's original never patterns blog post. Maybe this will turn out to be mistaken, but my current stance is: if a branch is reachable without UB, then it can't be omitted. That would be exhaustiveness being unsound. That's it, no elided never patterns or such things. Then I add the never patterns syntax as a simpler alternative to Err(x) => match x {}.

I find that a lot easier to understand and explain than a contextual desugaring. The drawback is that exhaustiveness now depends on the source of the scrutinee place and not just its type.

it can still help to have the RFC at least sketched

I linked this sketch in the OP, have you read it?

The first example in this issue does not use a never pattern, making it a strange example for explaining never patterns.

Well yeah, that's because this proposal has two parts: there's a refinement of exhaustiveness checking, and there's the never patterns syntax. I do intend to split off the exhaustiveness part into its own feature that we might be able to stabilize sooner.

Alonely0 commented 12 months ago

in Rust, any reads on uninitialized memory are considered undefined behavior

This is not correct. Please consult the reference for what is and is not UB. Here is an example of code that reads uninit memory (for some definition of "reads") and has no UB.

You need to be careful with blanket statements like that.

I consulted the Rustonomicon, sorry if I was misinformed. https://doc.rust-lang.org/nomicon/uninitialized.html

EDIT: It seems that in this context (enums) I was right, but my statement without it wasn't, thanks for pointing that out. I wasn't aware of the following:

Note: Uninitialized memory is also implicitly invalid for any type that has a restricted set of valid values. In other words, the only cases in which reading uninitialized memory is permitted are inside unions and in "padding" (the gaps between the fields/elements of a type).

Alonely0 commented 12 months ago

Hi! I thought like you not that long ago, until I digged into the subtleties of rust operational semantics. Allow me to clarify.

in Rust, any reads on uninitialized memory are considered undefined behavior

This is correct, but that match does not read from the value in the Err case, since a _ pattern does not cause a read.

But how do you know that it is an Err? Because you read the discriminant, which is either valid (Ok/Err) or invalid and UB to operate with it. You can't say "it is Err because it was not Ok", you're assuming something the optimizer hasn't yet and might not do. If, for example, it decides to use jump tables, you could be jumping to dead code for all you know. The unsafe thus should mean that at least, you ensure that the enum has a valid representation, and if it has it, it still has to be only Ok because the Err is uninhabited (more on that on the next paragraph).

if the Err variant cannot be constructed, it can't ever exist

While this is a likely possibility for rust opsem to decide, this has not been decided yet. Therefore I am erring on the side of assuming it is possible to read Err with uninitialized data. This is discussed here.

But that post is about the layout and how we manage it internally, and I'm talking about the semantics of uninhabited (never) types. According to the reference, casting uninhabited (never) types out of thin air is unsound, and that's the same as having the discriminant be Err on that value, you're saying that whatever is next (in this case, zero bytes), is a constructed uninhabited type, which is equivalent to a singularity. The Err variant is an invalid (ergo unsound) state of the program. So you have to extend the meaning of that unsafe statement to mean "I'm sure it is an Ok", but it appears to me that's exactly the opposite of what you're trying to do...

it is not even marked as unsafe

I'm confused, there's an unsafe block in the code you quote. Dereferencing a pointer always requires an unsafe block.

Now I see it, I was looking for it, but for some reason I couldn't find it yesterday, sorry for that, mea culpa.

Has this gone through a pre-RFC of some sort?

No, this is an experimental feature gate, meant to iron out the details of the feature before we write the RFC. The lang-team took a quick look at my proposal and approved the experiment. I have discussed details of rust opsem with @RalfJung for a while before I felt confident proposing this. While I believe my approach is sound, this will have to be fully settled in the (to-be-written) RFC.

I think the fact that we do not want to touch uninitialized memory directly (w/o MaybeUninit) in the language with a 10ft pole is pretty well set in stone

This isn't quite correct. We need the possibility of uninitialized data behind a pointer. In fact to initialize a MaybeUninit<T> you have to go through as_mut_ptr which returns a *mut T pointing to possibly-uninitialized memory.

Indeed, raw pointers indeed can point to uninitialized data, but that was not my point. My point is that you're reading a value that you don't know if it is initialized and in a valid state (the discriminant), and you're checking that validity using semantics that assume the value is in a valid state, so the checks may get removed altogether.

The mem::uninitialized debacle that led to the creation of MaybeUninit wasn't about uninitialized memory in general, it was about the fact that the function would return an uninitialized value, which is indeed instant UB. As long as the uninitialized data is not read (e.g. because it is behind a pointer), we're ok.

I'd just remove this part of the proposal and rename it "Allow elision of unreachable patterns".

This is the opposite of what this proposal is about. "Allow elision of unreachable patterns" is implemented in the exhaustive_patterns feature gate, which hasn't been stabilized because it is unsound, exactly because it allows elision of patterns that are reachable without UB in rust's current operational semantics. The whole point of this proposal is to fix that soundness whole.

Oh, great then.

Nadrieril commented 12 months ago

I feel like we're talking a bit past each other, I ask your patience as we try to find what we're disagreeing on.

casting uninhabited (never) types out of thin air is unsound, and that's the same as having the discriminant be Err on that value

That doesn't seem correct. First I can very well make a ptr: *const Result<T, !>. And until opsem rules it out, I could find a way to write to ptr in a way that initializes the discriminant to Err. So far, no uninhabited type has ever been read so we're good. Then if I read the discriminant, I get Err which is still good. The case that would be UB is if I then read the data part of the enum.

Note the very important fact that all of this happens behind a pointer. If it was instead a plain let x: Result<T, !> = ...;, then x must be valid, which indeed forbids the Err case.

Also I'm not sure what we're arguing about because it is already the case on stable rust that the Err(_) case must be included. This proposal doesn't change that.

You can't say "it is Err because it was not Ok", you're assuming something the optimizer hasn't yet and might not do. If, for example, it decides to use jump tables, you could be jumping to dead code for all you know.

That is not what I'm doing. If the match expression was compiled that way I'm pretty sure it would be a bug. What the Err(_) case means is: "if we read the discriminant, and it returns Err, take this branch". If it turns out that it's UB to construct the Err case, then that's fine, the discriminant read will always return Ok and the Err branch will just never be taken.

If on the other hand I allowed match ... { Ok(_) => {} } to be exhaustive, then if Err is ever read we have UB. That would mean match exhaustiveness is unsound! I insist that requiring a branch for Err is the safer alternative. Unless that's not what you were arguing?

Thus, the check for it can (and should) be removed by the optimizer, wiping away any matching on the Err variant and the code associated with it

Until opsem decides for sure that Err is not constructible, then the optimized MUSN'T remove this case. Today's rust doesn't optimize this away, and this proposal does not change that. Moreover, if this optimization was ever correct, then what's the problem? The optimizer optimized away some dead code, I don't see how that causes any UB. I'm saying the same thing as above again, but I want to be sure we're on the same page.

Nadrieril commented 12 months ago

Maybe I should be clearer about what's currently required on stable rust. In both stable rust and under the proposal, the first match below is accepted, and the second match below gives a "non-exhaustive" error.

enum Void {}
unsafe {
    let ptr: *const Result<u32, Void> = ...;
    match *ptr {
        Ok(x) => { ... }
        Err(_) => { ... }
    }
    match *ptr { // ERROR: non-exhaustive: `Err` not covered
        Ok(x) => { ... }
    }
}

As long as we're behind a pointer/reference/union, there's no changes compared to stable. Where it does change is in the other cases, e.g. a plain value. In stable rust, we get:

enum Void {}
let val: Result<u32, Void> = ...;
match val { // accepted
    Ok(x) => ...
    Err(_) => ...
}
match val { // ERROR: non-exhaustive: `Err` not covered
    Ok(x) => ...
}

With never_patterns, we get:

enum Void {}
let val: Result<u32, Void> = ...;
match val {
    Ok(x) => ...
    Err(_) => ... // WARN: unreachable
}
match val { // accepted
    Ok(x) => ...
}
Alonely0 commented 11 months ago

That doesn't seem correct. First I can very well make a ptr: *const Result<T, !>. And until opsem rules it out, I could find a way to write to ptr in a way that initializes the discriminant to Err. So far, no uninhabited type has ever been read so we're good. Then if I read the discriminant, I get Err which is still good. The case that would be UB is if I then read the data part of the enum.

But you are reading it! That's where our disagreement comes from, and it's not even the root of it. You're reading it when you dereference the pointer, because you're not reborrowing it (you then discard/drop it later with _, but that's irrelevant). The thing is, when the compiler & the optimizer see the Err variant, they see that in order for the Err to exist, a never type had to be created beforehand, because it is a dependency of Err (unlike C, where everything is wrapped in a MaybeUninit). The semantics of the never type are simple, nothing can exist after it is created; but, however, if we go back to the definition of the Err variant, the never type had to exist before, so the Err exists after the never type! This violates the semantics of the never type, so the path where the Err variant matches is unsound; Q.E.D. You can indeed write the ptr so that the data is invalid, but then even thinking about it as that enum is unsound (and even if unused, unused data must also be valid). That's proof by contradiction, this is proof by miri (the playground is down at the time of writing, so I'll just paste here the code):

fn main() {
    let mut x = Foo::<i32, Uninhabited>::X(1);
    assert_eq!(std::mem::size_of_val(&x), std::mem::size_of::<i32>() * 2); // Uninhabited is a ZST.
    assert_eq!(std::mem::size_of::<()>(), std::mem::size_of::<Uninhabited>()); // () is also a ZST.
    let mut y = Foo::<i32, ()>::X(1);
    assert_eq!(x, unsafe { std::mem::transmute(y) }); // Ergo, reprs are the same.
    y = Foo::Y(());
    let mut x = unsafe { std::mem::transmute::<_, FooLayout<i32, Uninhabited>>(x) };
    x.tag = 1; // Y's tag.
    let x = unsafe { std::mem::transmute::<_, Foo<i32, Uninhabited>>(x) };
}

#[derive(Copy, Clone, Debug, PartialEq)]
enum Uninhabited {}

#[repr(C, i32)]
#[derive(Debug, PartialEq)]
enum Foo<A, B> where A: Copy, B: Copy {
    X(A) = 0,
    Y(B) = 1,
}

#[repr(C)]
union UntaggedFoo<A, B> where A: Copy, B: Copy {
    x: A,
    y: B,
}

#[repr(C)]
struct FooLayout<A, B> where A: Copy, B: Copy {
    tag: i32,
    union: UntaggedFoo<A, B>,
}

When run with miri, notice that the write is, indeed, valid, but even if unused or unread, thinking about it as a Foo is already UB. On your initial piece of code you're checking if the discriminant is valid, and if it is not, you execute some code. However, when you do that, the potentially invalid data escapes the raw pointer, which is unsound.

Nadrieril commented 11 months ago

I do not understand your explanation. Anyway, what do you concretely propose we should do differently?

Nadrieril commented 11 months ago

I propose to split off the exhaustiveness-affecting changes into their own feature gate: https://github.com/rust-lang/rust/pull/118803. If that gets accepted, then never_patterns will become exclusively about ! patterns (which makes a lot of sense).

Alonely0 commented 11 months ago

I do not understand your explanation. Anyway, what do you concretely propose we should do differently?

First, define what you want the match to semantically mean in min_exhaustive_patterns. In current semantics, the unsafe block in your example piece of code would have to uphold the invariant of the pointee being a valid construction of the type (valid discriminant, and not a discriminant of an uninhabited variant). min_exhaustive_patterns would be checking whether the type is valid, but it would be doing it after the compiler has been ensured that it is. So, what I think min_exhaustive_patterns intents to do is change the semantics of both a dereference and a match expression when doing a very specific thing in a very specific context, and it feels very off to me... my advice is just don't distinguish between safe and unsafe, we already have ways to check for the validity of types, this is just tangling stuff up. The safe part is very nice, though; with it, your example of it would translate 1:1 to the unsafe counterpart (ignoring the unsafe dereference), and if the type is not valid, your job was to check that before you dereferenced, not after the fact!

To me, it feels like you're looking for an unsafe match of some sort, that throws away all semantics and just checks the actual bytes for equality. It's just that it is already a thing, you just have to semantically express that; i.e., instead of dereferencing the maybe invalid type, dereference the known-to-be-valid discriminant, and match on it instead. If you're doing stuff so unsafe you don't even know whether your types are valid, this is how you have to sanitize them: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=f2a105879baa6471c11e5d0e44bc2c6c

Nadrieril commented 11 months ago

Oh, if you think min_exhaustive_patterns is good, then we're on the same page! I intend never_patterns to only be a bit of syntactic sugar on top of that. I'll clarify that when min_exhaustive_patterns gets merged, in case you still have doubts.

Alonely0 commented 11 months ago

Oh, if you think min_exhaustive_patterns is good, then we're on the same page! I intend never_patterns to only be a bit of syntactic sugar on top of that. I'll clarify that when min_exhaustive_patterns gets merged, in case you still have doubts.

Sorry, I only read half of it, I thought it was the first part of your proposal (eliding semantically unreachable patterns). min_exhaustive_patterns suffers from the same exact semantic problem as never_patterns. I think I explained it well (after a few edits, tho) in my previous reply, if you still don't understand why it is all unsound we can jump on a DM, a call, or whatever, and I can explain it for you.

Nadrieril commented 11 months ago

Thank you for this offer. For now I'll wait to see if anyone else comes up to either explain your point in a way that I understand or to confirm that my idea is sound.

Alonely0 commented 11 months ago

If you want more reach, post a small & conceptual (like what you wrote on the first post) pre-RFC in the internals forums, where people far above my pay grade will (might) review it. The only way somebody is going to ever find this random GitHub thread is by mistake.

RalfJung commented 11 months ago

@Nadrieril

No, my proposal is different. I'm intentionally moving away from this "implicit/explicit" frame that was in Niko's original never patterns blog post. Maybe this will turn out to be mistaken, but my current stance is: if a branch is reachable without UB, then it can't be omitted. That would be exhaustiveness being unsound. That's it, no elided never patterns or such things. Then I add the never patterns syntax as a simpler alternative to Err(x) => match x {}.

But if an arm is unreachable without UB, then it can be omitted. So there's still implicit never patterns, e.g. in match (x: Void) {}, which I would say is sugar for match (x : Void) { ! }. That may not be how you plan to implement this, but conceptually it is a lot simpler IMO than having two completely separate mechanisms for matching on uninhabited types.

I linked this sketch in the OP, have you read it?

"You don't have permission to access this resource. "

@Alonely0

I consulted the Rustonomicon, sorry if I was misinformed.

Arguably that is imprecise when it says "Attempting to interpret this memory as a value of any type will cause Undefined Behavior". I filed an issue. Thanks for pointing that out!

But you are reading it! That's where our disagreement comes from, and it's not even the root of it. You're reading it when you dereference the pointer, because you're not reborrowing it (you then discard/drop it later with _, but that's irrelevant).

No. GetDiscriminant on a place is not the same as loading a value from that place. GetDiscriminant just reads the discriminant, not any other part of the place. So it is perfectly conceivable that GetDiscriminant might return Err even if that variant is impossible to construct. I think eventually we will declare that this can never happen, but for now this is an open question.

Nadrieril commented 11 months ago

"You don't have permission to access this resource. "

Oh shoot! Fixed it now.

But if an arm is unreachable without UB, then it can be omitted. So there's still implicit never patterns, e.g. in match (x: Void) {}, which I would say is sugar for match (x : Void) { ! }. That may not be how you plan to implement this, but conceptually it is a lot simpler IMO than having two completely separate mechanisms for matching on uninhabited types.

What "two completely separate mechanisms for matching on uninhabited types" do you mean?

RalfJung commented 11 months ago

What "two completely separate mechanisms for matching on uninhabited types" do you mean?

There's two mechanisms that both analyze whether the match subject is uninhabited: checking whether a ! pattern is accepted, and checking whether we can omit a match arm.

RalfJung commented 11 months ago

Oh shoot! Fixed it now.

Ah, now I can see it. :) I left some comments.

joshtriplett commented 11 months ago

Review in the style team turned up an issue that I'd like to raise with a lang hat on:

Syntactically, the idea of never patterns like Err(!) not having any trailing delimiter seems potentially problematic for both humans and computers to parse, if never patterns appear somewhere other than the last match arm.

Consider:

match expr {
    Variant1(field) => { ... }
    Variant2(!)
    Variant3(field) => { ... }
}

Even worse, consider leading |:

match expr {
    | Variant1(field) => { ... }
    | Variant2(!)
    | Variant3(field) => { ... }
}

I think we should have a mandatory trailing , on this.

(In theory, syntactically, we don't need the trailing comma if it's the last match arm. In practice from a style perspective and for consistency with all our other uses of delimiters, we'd probably want it.)

Nadrieril commented 11 months ago

That's how I implemented the parsing 👍 I made a comma required everywhere, except for last pattern of the match where it's optional, like we do for trailing commas in general.

RalfJung commented 11 months ago

@RalfJung you did? On this document right? I don't see any comments, where should I be looking?

That comment disappeared. So I guess you found them? :)

Nadrieril commented 11 months ago

I didn't but I realized hackmd was overall very buggy y'day for me so that was probably the cause

RalfJung commented 11 months ago

There should be a button for "comments" at the top:

image

If you click that it shows all the comments; you can click the comment to scroll to the place in the document it refers to.

Nadrieril commented 11 months ago

I see it! And it definitely wasn't there yesterday x)

Alonely0 commented 11 months ago

@Nadrieril does this semantic flowchart help you understand my point? I'm sorry I couldn't reply earlier, I've been busy. image In the eyes of the compiler, undefined behavior is not reachable (exactly like a never type), so it uses that axiom as a way of performing optimizations. As it cannot exist, any branches that would statically trigger it are considered always unreachable. Consequently, the compiler would remove both branches and always execute the Ok code path, even if the discriminant is not Ok (which we're expressing we're not sure), and thus we might be operating with an invalid inner in the case of an Err. What you want to do is match the discriminant without reading the whole enum, so that the Err branch is not uninhabited (and thus, not removed), and then only and manually read the inner in the Ok branch (yes, you're reading it in the Err branch; in fact, you did in the dereference, the Err line just moved it). This, of course, also applies to the ! syntactic sugar. Proof: https://godbolt.org/z/a1dhzncqa

Proposed solution: just build a crate that exports something like a match_unknown_validity! macro and some attribute macros with the behavior you intend to get out of this (hint: use the code I linked a few posts ago as a reference). It wouldn't require almost any changes ever, so if it gets very popular, you could make an RFC to T-Libs for getting it into core (like with OnceCell) or to T-Lang for exploring the idea directly in the language. Changing the semantics of match in this specific case, and just for this, is incredibly silly. As I said earlier, you don't want to match the enum, you want to match the discriminant (the enum has values that might not be valid, but the discriminant always is (provided it is in the range of those representing the enum's branches, so you would always need a catch-all branch)).

What I meant when I said that it was a great idea, was being able to remove the Err branch (regardless of the unsafe) because it is statically unreachable, so a let deestructuring pattern (not let else, just let) would do the job. I think that is worth pursuing, so I encourage you to pivot your proposal into that, and leave what you meant with the unsafe part to that macro until we see if only-discriminant matching is worth pursuing at all in the language.

Nadrieril commented 11 months ago

You went through a lot of trouble for this x). Look, I'm tired of this misunderstanding. Only-discriminant matching is what rust does today. I can't find a reference for this claim sorry, maybe Ralf's comment can convince you. But my proposal does not even rely on that! It's completely orthogonal! I'm

Do you have a problem with any of that? I think what you're talking about is off-topic here, and we should continue this conversation elsewhere.

Alonely0 commented 11 months ago

You went through a lot of trouble for this x)

Autism is both a superpower and a curse.

Only-discriminant matching is what rust does today

Rust, indeed, uses the discriminant to match the enum, but semantically matches the enum. If Rust used only-discriminant semantics (what you're looking for), these two pieces of code would compile down to be the same: https://godbolt.org/z/xc8hW91hT

Do you have a problem with any of that? I think what you're talking about is off-topic here, and we should continue this conversation elsewhere.

This is not off-topic, I'm discussing why having different behavior in an unsafe context "just in case it is invalid" is nonsense. Unsafe is not even supposed to change behavior, it is supposed to enable it! Unsafe blocks must be transparent to safe expressions, and allow unsafe ones. match is always safe, whatever you do to get to the value you're matching against, or whatever you do inside the branches, that's on you; but match in and of itself is always safe, period.

RalfJung commented 11 months ago

@Alonely0 you seem to assume that the semantics of the MIR operation to read the discriminant is "read entire enum value, then extract discriminant". That is not the case, and that seems to be the root of this disagreement.

There is a MIR operation that is given a place (that is, a pointer to some location in memory, where raw bytes are stored), and what it does is determine the discriminant based on that. At no point does this operation involve constructing a value of enum type. It is, currently, legal to do this even if that place does not contain a valid value of the enum type. See https://github.com/rust-lang/rust/issues/91095 for further discussion of this. You may say that you'd prefer a different semantics, but these are the semantics we have, and if you want to change them then please start a suitable discussion and give proper motivation -- but not in this issue.

Therefore, the step in your diagram that says "read ptr as value of Result<...>" is wrong, such a step never happens. If you are confused about this, maybe IRLO or Zulip are better places to resolve that misunderstanding, rather than completely derailing this issue.

Alonely0 commented 11 months ago

you seem to assume that the semantics of the MIR operation to read the discriminant is "read entire enum value, then extract discriminant". That is not the case, and that seems to be the root of this disagreement.

Not at all, I mean the unsafe dereference. When you dereference the pointer, you either reborrow to get a reference to a type that must be valid, or copy it to local memory as a type that must be valid (which may or may not be optimized away). I didn't mean the read on the discriminant that match does, I meant the operation of the dereference. There you already promise that the enum is valid, and then the compiler then uses this information to alter the codegen of match to remove the Err code path. Thus, Err(!) is nonsense, as such a thing was already promised in the dereference inside the match line.

RalfJung commented 11 months ago

When you dereference the pointer, you either reborrow to get a reference to a type that must be valid, or copy it to local memory as a type that must be valid (which may or may not be optimized away).

No. As I have explained already two times, there is another operation you can do on the place *ptr: you can read its discriminant. This does not involve copying its value to local memory at any type. You are just fundamentally misunderstanding how the "get discriminant" operation works. Or you are misunderstanding the concept of places, I am not sure. But you are being repeatedly told that you are drawing conclusions based on false information, and yet you insist, so it doesn't make a lot of sense to keep going.

I meant the operation of the dereference. There you already promise that the enum is valid,

The *ptr operation itself turns a pointer-typed value into a place. This operation does not access memory, and is never UB. It doesn't do anything except "cast the pointer into a different light". See e.g. here for a basic explanation of places vs values.

Dereferencing absolutely does not promise that the enum is valid. This promise only happens when doing a load from the place, i.e., at the place-to-value coercion. In the match expressions we are discussing here, there is no place-to-value coercion.

This is the last time I am trying to explain this in this issue, future comments will be marked as "off topic". Please use other channels to learn about the basics of Rust semantics before trying to explain those semantics to the people that are working on defining them.

Alonely0 commented 11 months ago

Fine, I'll see myself out. I apologize if, while trying to contribute, I became a hassle.

RalfJung commented 11 months ago

I understand you had good intentions. :) But these discussions are hard, and if you're coming in with a wrong mental model of how Rust works, we have to disentangle a mess. The deref operation *ptr is one of the more subtle expressions in Rust due to the place/value dichotomy. On Zulip and IRLO there is enough space to discuss the questions you may have without distracting from the core of a tracking issue like here.

Nadrieril commented 10 months ago

Hey sorry, I quote:

A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature. Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Please open a dedicated issue or come ask your question on Zulip.

dead-claudia commented 10 months ago

@Nadrieril My bad. Deleted the relevant comments.

Nadrieril commented 10 months ago

@dead-claudia No worries, we don't enforce this policy very consistently 🙏

Nadrieril commented 10 months ago

I have split off the exhaustiveness-affecting parts of this feature to min_exhaustive_patterns. This feature is now strictly about adding the ! pattern.

Nadrieril commented 1 month ago

RFC is up: https://github.com/rust-lang/rfcs/pull/3719