rust-lang / rust

tracking issue for default binding modes in match (RFC 2005, match_default_bindings) #42640

Closed nikomatsakis closed 5 years ago

nikomatsakis commented 7 years ago

This is a tracking issue for the "match ergonomics using default bindings mode" RFC (rust-lang/rfcs#2005).

Status: Awaiting stabilization PR and docs PR! Mentoring instructions here.

Steps:

Unresolved questions:

nikomatsakis commented 7 years ago

I'm actually not 100% sure the best way to implement this. It seems like we will need to add adjustments to patterns, and then integrate those into the various bits of the compiler. @eddyb, have any thoughts on that? You usually have some clever ideas when it comes to this sort of thing. =)

eddyb commented 7 years ago

We should be able to treat adjustments more uniformly now, although it'd still be a lot of plumbing.

crazymykl commented 7 years ago

How does this interact with slice patterns (#23121)?

nikomatsakis commented 7 years ago

@crazymykl

My expectation would be that, when we encounter a pattern like [a, ..b], we check the default binding mode. If it is "ref", then a becomes &slice[0] and b becomes (effectively) &slice[1..].
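A minimal sketch of that expectation, written with the `rest @ ..` subslice syntax that stable Rust accepts today rather than the `[a, ..b]` form used in the comment above. The function name `split_first_ref` is made up for illustration.

```rust
/// Splits a borrowed slice into its first element and the remainder.
/// (Hypothetical helper, not part of any library.)
fn split_first_ref(slice: &[u32]) -> Option<(&u32, &[u32])> {
    match slice {
        // `slice` is a reference, so the default binding mode becomes "ref":
        // `a` binds as `&slice[0]` and `rest` as `&slice[1..]`.
        [a, rest @ ..] => Some((a, rest)),
        [] => None,
    }
}
```

Neither binding needs an explicit `ref`; both come out as references because the scrutinee is itself a reference.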

nikomatsakis commented 7 years ago

@eddyb do we want to use general-purpose adjustments here, or something more limited? (The current RFC, after all, only requires autoderef of & and &mut types.) It seems, though, that if we can use adjustments, that will lay a nice foundation for the future. But it may introduce a lot of questions (e.g., what does it mean to "unsize" a pattern) that are better left unasked.

eddyb commented 7 years ago

@nikomatsakis Autoderef or autoref? If it's one bit per pattern making it separate for now is fine.

nikomatsakis commented 7 years ago

@eddyb

Autoderef or autoref? If it's one bit per pattern making it separate for now is fine.

autoderef -- that is, where you have a pattern like Some, the type you are matching might now be &Some or &mut Some (or &&Some, etc). I think I agree, I'm inclined to introduce a PatternAdjustment struct that just includes auto-deref and go from there.
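As a rough illustration of the autoderef described here, the sketch below matches the same `Some` pattern against `&Option`, `&&Option`, and `&mut Option`. The function names are invented for the example; the behaviour shown is that of the feature as it was eventually stabilized.

```rust
/// `opt` is `&Option<String>`; the `Some` pattern looks through the `&`.
fn len_shared(opt: &Option<String>) -> usize {
    match opt {
        Some(s) => s.len(), // `s` is `&String`
        None => 0,
    }
}

/// Several layers of `&` are skipped in the same way.
fn len_nested(opt: &&Option<String>) -> usize {
    match opt {
        Some(s) => s.len(), // still `&String`
        None => 0,
    }
}

/// Looking through `&mut` yields a mutable reference binding.
fn shout(opt: &mut Option<String>) {
    if let Some(s) = opt {
        s.push('!'); // `s` is `&mut String`
    }
}
```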

nikomatsakis commented 7 years ago

Here is a rough and incomplete implementation plan for the match ergonomics RFC.

There are a few high-level things we have to do:

Right now, the code is set up to scrape this information directly from the "HIR" (the compiler's AST). The HIR node for a pattern is hir::Pat, of which the most interesting part is the kind field of type PatKind. If you have a pattern like Some(x), then, that would be represented as a tree:

We want to make this equivalent to &Some(ref x), which would be encoded as a:

We don't however have enough information to do this when we construct the HIR, since we need type checking results to do it, so we can't change the HIR itself. The way we typically handle this sort of thing then is to have the typeck encode "side tables" with auxiliary information. These tables are stored in TypeckTables and they encode all kinds of information.

In this case, I think we want two tables:

Probably a decent first PR is just to introduce the second table (pat_binding_modes) and rewrite all the existing code to use it. Right now, code that wants to find the binding mode of a binding extracts the value straight from the HIR, as you can see in the following examples:

This is not a comprehensive list, but it does have the major use-sites. You can get a more comprehensive list by doing rg 'BindByValue|BindByRef', which is what I did.
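To make the "side table" idea concrete, here is a toy model. It is not rustc's actual data structure: only the table name pat_binding_modes comes from the comment above (pat_adjustments is the name used later in this thread), and `PatId`, `BindingMode`, `ToyTables`, and the "number of implicit derefs" encoding are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the compiler's pattern node id.
type PatId = u32;

/// Hypothetical binding modes, one per binding pattern.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BindingMode {
    ByValue,
    ByRef,
    ByRefMut,
}

/// Toy side tables filled in by type checking and read by later passes,
/// so the syntax tree itself never has to change.
#[derive(Default)]
struct ToyTables {
    /// Binding mode chosen for each binding pattern.
    pat_binding_modes: HashMap<PatId, BindingMode>,
    /// Number of implicit derefs inserted in front of each pattern (assumed encoding).
    pat_adjustments: HashMap<PatId, usize>,
}

impl ToyTables {
    /// Patterns without an entry fall back to by-value binding.
    fn binding_mode(&self, pat: PatId) -> BindingMode {
        self.pat_binding_modes
            .get(&pat)
            .copied()
            .unwrap_or(BindingMode::ByValue)
    }

    /// Patterns without an entry have no implicit derefs.
    fn implicit_derefs(&self, pat: PatId) -> usize {
        self.pat_adjustments.get(&pat).copied().unwrap_or(0)
    }
}
```

Consumers ask the tables instead of reading the mode off the syntax node, which is the rewrite the suggested first PR performs.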

OK, no time for more, but hopefully this helps somebody get started! Please leave a note if you are interested in taking this on, and feel free to ping me on IRC (nmatsakis) or gitter (nikomatsakis) with any questions (or ask in #rustc).

tbg commented 7 years ago

I'll look into the first step:

Probably a decent first PR is just to introduce the second table (pat_binding_modes) and rewrite all the existing code to use it.

nikomatsakis commented 7 years ago

@tschottdorf woohoo!

tbg commented 7 years ago

Starting to look at this again. What do you think is a good next step, @nikomatsakis? I was considering another "plumbing" PR (introduce pat_adjustments which is always trivial), but that doesn't seem useful since it won't really be exercised. Instead I'll try to get a handle on what the computations that populate the typeck tables look like and where they'll live. I assume you'll have more pointers at some point!

tbg commented 7 years ago

Seems that the changes that populate both tables are going to happen in the general vicinity of typeck's match handling, correct? Putting some of that logic in (and testing it) could be a good way to get started, even if it doesn't populate the tables yet.

Am I correctly assuming that the major (additional) place where these tables would be used is during HIR->HAIR lowering so that the HAIR representation spells out the information in the tables explicitly (at which point we're "done" with the tables)?

tbg commented 7 years ago

Is the below correct? (Not sure if it's useful to have this method except for assertions, but it'll help me understand what's a non-referential type.)

#[allow(dead_code)]
impl PatKind {
    /// Returns true if the pattern is a reference pattern. A reference pattern
    /// is any pattern which can match a reference without coercion. Reference
    /// patterns include bindings, wildcards (_), consts of reference types, and
    /// patterns beginning with & or &mut. All other patterns are non-reference
    /// patterns.
    ///
    /// See https://github.com/rust-lang/rfcs/blob/master/text/2005-match-ergonomics.md#definitions
    /// for rationale.
    fn is_reference_pattern(&self) -> bool {
        // NB: intentionally don't use a catchall arm because it's good to be
        // forced to consider the below when adding/changing `PatKind`.
        //
        // FIXME: is the below correct? In particular, where do "consts of reference types"
        // end up?
        match *self {
            PatKind::Wild |
            PatKind::Binding(..) |
            PatKind::Ref(..) => true,
            PatKind::Struct(..) |
            PatKind::TupleStruct(..) |
            PatKind::Path(_) |
            PatKind::Tuple(..) |
            PatKind::Box(_) |
            PatKind::Lit(_) |
            PatKind::Range(..) |
            PatKind::Slice(..) => false,
        }
    }
}

eddyb commented 7 years ago

In Rust, names are usually "paths", because if you can say FOO you can also say std::i32::MIN. So constants would be PatKind::Path.
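A small sketch of both spellings used as patterns: a bare constant name and a longer path to a constant. `FOO` and `describe` are hypothetical names for this example.

```rust
/// Hypothetical constant, referred to by a one-segment path in the match below.
const FOO: i32 = 7;

fn describe(n: i32) -> &'static str {
    match n {
        // A bare constant name is a one-segment path pattern.
        FOO => "FOO",
        // A multi-segment path to a constant is accepted in the same position.
        std::i32::MIN => "i32::MIN",
        _ => "something else",
    }
}
```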

tbg commented 7 years ago

@eddyb gotcha. So for PatKind::Path(_) we'd have to look into the _ and (after some magic incantation) see if it's a const ref.

I still must be missing something. A "const of reference type" would be const CONST_OF_REF_TYPE: &u8 = &5? I don't even know how to use this in a pattern at all. Is there a simple example?

eddyb commented 7 years ago

You'd literally use that name in a pattern and it's equivalent to the pattern &5. What you want though is to look at the type, not the shape of the pattern, if you want to know whether it's a reference or not.

For coercions in expressions, we compute a type for an expression, then compare it with the "expected type" coming from the parent expression and if they don't match, we can perform some adjustments. I'd expect patterns to follow a similar route.

tbg commented 7 years ago

I assume you mean the below, but I've tried and failed to get an example that compiles. What am I missing?

const CONST_REF: &i64 = &5;

fn main() {
    print!("{}", CONST_REF);
    let f = &5i64;
    match f {
        CONST_REF => (),
        _ => (),
    };
}

error[E0080]: constant evaluation error
 --> src/main.rs:1:25
  |
1 | const CONST_REF: &i64 = &5;
  |                         ^^ unimplemented constant expression: address operator
  |
note: for pattern here
 --> src/main.rs:7:9
  |
7 |         CONST_REF => (),
  |         ^^^^^^^^^

eddyb commented 7 years ago

Oh, odd. Well, a &str literal should work nevertheless (or a &[u8; N] one).

tbg commented 7 years ago

Ack, &str works, thanks. The fact that my example above doesn't compile is a bug, then? Should I file it?

tbg commented 7 years ago

Wait a minute, it doesn't work (same error):

const CONST_REF: &str = &"foo";

fn main() {
    print!("{}", CONST_REF);
    let f = "foo";
    match f {
        CONST_REF => (),
        _ => (),
    };
}

const CONST_REF: &[u8; 2] = &[1u8, 2u8];

fn main() {
    let f: &[u8; 2] = &[2u8, 3u8];
    match f {
        CONST_REF => (),
        _ => (),
    };
}

eddyb commented 7 years ago

I meant b"foo" literals. Looks like &expr is not implemented in rustc_const_eval.

tbg commented 7 years ago

Ah, gotcha. The below works.

const CONST_REF: &[u8; 3] = b"foo";

fn main() {
    let f = b"bar";
    match f {
        CONST_REF => (),
        _ => (),
    };
}

tbg commented 7 years ago

I spelunked a little bit and made a few (completely nonfunctional, except possibly compiling) baby steps here: https://github.com/rust-lang/rust/compare/master...tschottdorf:pat_adjustments

They're probably a good indication of where I need help.

tbg commented 6 years ago

Just to update the main thread, I'm looking at this again this week with the intent of having some of the examples in the RFC compile (and the rest of the compiler still working). Right now I'm blocked on (presumably) my unfamiliarity with the type system -- in the simplest example, due to some code in check_match, I see expected type TyInfer(..) in check_pat_arg where I would really hope to see TyRef(..) -- but @nikomatsakis will hopefully be able to get me unstuck.

Until I have anything sensible, I'll keep the discussion in https://github.com/tschottdorf/rust/pull/1.

nikomatsakis commented 6 years ago

Update: @tschottdorf has this almost working!

nikomatsakis commented 6 years ago

@tschottdorf As I was saying on Gitter, one thing missing from your branch is that you need to update the mem-categorization code, which sadly has not yet been ported to operate on HAIR but rather works directly on HIR (that would actually be a nice refactoring, but anyway).

What that code tries to do is to generate a "cmt", which is a representation of the path that is being referenced when something is borrowed or moved. We need to adjust the code to take the new inferred & patterns into account.

As this comment tries to explain, the relationship between a pattern and a cmt is somewhat inverted. That is, if you have a match like this:

let foo: &&(u32,);
match foo {
  &&(x,) => ...
}

The memory from which x is being extracted is reachable by the (fully explicit) path: (**foo).0, which corresponds to a cmt like:

field0 { deref { deref { foo } } }

though I think in the comment I use a distinct, somewhat weirder notation: foo->&->&.0, where ->& means "deref a & reference" (actually, in the comment I wrote ->@, which dates back from the days when we had Rc<T> built-in and called @T, but anyway).

The key point here is that the pattern nesting looks the opposite, with the derefs from the outside:

deref { deref { field0 { foo } } }

This is why the code that "categorizes" ref patterns (and box patterns) works the way it does:

So, to insert "false" derefs, I think we need to do something rather the opposite of the HAIR lowering -- that is, the HAIR lowering wrapped the pattern in extra derefs, but we need to go before the main code to insert our extra derefs. Roughly here. Also, this code is a bit weird in that it doesn't produce a result, it invokes a callback with the result as it goes. So it'll take a touch of tweaking. I have to run now but can try to sketch out the pseudo-code later on.
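The inversion can be seen in a toy model. Everything below is hypothetical (the `Cmt` and `Pat` enums, `categorize`, `apply_implicit_derefs`) and far simpler than the real mem-categorization code; in particular it collects results into a vector instead of invoking a callback. It only shows that walking a pattern from the outside in builds the cmt from the inside out, and that implicit derefs can be applied to the incoming cmt before the main walk.

```rust
/// Toy "cmt": the path to the memory a binding refers to.
#[derive(Debug, Clone, PartialEq)]
enum Cmt {
    Local(&'static str),
    Deref(Box<Cmt>),
    Field(usize, Box<Cmt>),
}

/// Toy pattern tree: bindings, `&pat`, and tuples.
enum Pat {
    Binding(&'static str),
    Ref(Box<Pat>),
    Tuple(Vec<Pat>),
}

/// Walks the pattern from the outside in while wrapping the cmt, so the
/// outermost `&` in the pattern becomes the innermost deref in the cmt.
fn categorize(pat: &Pat, cmt: Cmt, out: &mut Vec<(&'static str, Cmt)>) {
    match pat {
        Pat::Binding(name) => out.push((*name, cmt)),
        Pat::Ref(inner) => categorize(inner, Cmt::Deref(Box::new(cmt)), out),
        Pat::Tuple(fields) => {
            for (i, field) in fields.iter().enumerate() {
                categorize(field, Cmt::Field(i, Box::new(cmt.clone())), out);
            }
        }
    }
}

/// Applies `count` implicit derefs to the incoming cmt before the main walk.
fn apply_implicit_derefs(mut cmt: Cmt, count: usize) -> Cmt {
    for _ in 0..count {
        cmt = Cmt::Deref(Box::new(cmt));
    }
    cmt
}
```

In this model, categorizing the explicit pattern `&&(x,)` against `foo` and categorizing `(x,)` after two implicit derefs of `foo` both map `x` to `field0 { deref { deref { foo } } }`.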

Michael-F-Bryan commented 6 years ago

Just out of curiosity, is it possible to generalise this behaviour to all patterns and not just match statements?

I originally posted on the internal forum about a case where I encountered a problem similar to the one default binding modes is trying to solve, but in a for loop.

Excerpt from the forum post:

My particular example is that I'm iterating over a Vec<(usize, Bot)> and want to update each Bot. Currently I've got to write a big mess of &mut and ref to make the compiler happy:

for &mut (tok, ref mut bot) in &mut self.bots {
  let actions = bot.tick(&self.game_state);
   debug!("Bot {} executed {:?}", tok, actions);
}

Whereas it'd be much easier (and arguably just as understandable) if pattern matching could elide away the boilerplate so I can write something like this:

for (tok, mut bot) in &mut self.bots {
    let actions = bot.tick(&self.game_state);
    debug!("Bot {} executed {:?}", tok, actions);
}

In theory the compiler should be able to automatically dereference the tuple reference, then see that, because mut bot is trying to use a mutably borrowed thing, it should automatically insert ref mut bot.

tbg commented 6 years ago

This feature applies to all bindings, so I think the impl PR should help you already!

See the below snippet, which doesn't compile today but does on #44614:

pub fn main() {
    let tuples = vec![(0u8, 1u8)];
    for (m, n) in &tuples { // (m, n) are bound like &(ref m, ref n)
        let _: &u8 = m;
        println!("{} {}", m, n);
    }
}

Michael-F-Bryan commented 6 years ago

That's awesome! I'm assuming it won't matter if the "inner" things (in this case m and n) are referenced mutably, or if one is borrowed mutably but the other is borrowed immutably, will it?

tbg commented 6 years ago

The mutability translates as outlined in the RFC. For this example, (m, mut n) would bind as &(ref m, ref mut n).

phaylon commented 6 years ago

@tschottdorf Wouldn't it be (m, n) with both being &mut? Or was there some change in strategy?

tbg commented 6 years ago

@phaylon are you thinking about (m, n) in &mut tuples? I'm not aware that the RFC suggests the behavior you mention, but as always I could be wrong.

What's definitely correct is that my PR currently messes up the binding when I try it with (m, mut n) = &mut tuples (it binds n as a value). I'll figure out what's going on there (though not now, probably EOW).

phaylon commented 6 years ago

@tschottdorf For example:

let (l, r) = &mut (23, 42);

From my reading of the RFC I'd expect l and r to be &mut i32.

I might just be looking at the wrong place. Could you point me towards where mut var bindings implying &mut is specified?

tbg commented 6 years ago

Oh, now I see where you're coming from. I didn't consider that you would have to have the &mut on the right hand side. Yes, you're correct that then the above example would desugar from

pub fn main() {
    let mut tuples = vec![(0u8, 1u8)];
    for (m, n) in &mut tuples {
        *m = 6;
        *n = 7;
        println!("{} {}", m, n);
    }
}

to

pub fn main() {
    let mut tuples = vec![(0u8, 1u8)];
    for &mut (ref mut m, ref mut n) in &mut tuples {
        *m = 6;
        *n = 7;
        println!("{} {}", m, n);
    }
}

and yes, that's the case. I confused myself above because the error message that example was giving me was unhelpful.

arielb1 commented 6 years ago

Coercion chicken

@nikomatsakis

Now that I'm reading this again: In HAIR/MIR, the soundness of the situation is quite simple: coercions and subtyping occur only on vexprs. Subtyping can operate on references within a vexpr, but it's still at heart a vexpr operation. Match expressions do their subtyping when they bind the fields of the scrutinee (or, potentially, references to them) into the bindings.

If we just wanted to be consistent with that, then the #23116 example:

#![allow(dead_code)]
use std::fmt::Debug;
struct S(Box<Debug + 'static>);
impl S {
    fn bar<'a>(&'a mut self)->&'a mut Box<Debug + 'a> {
        match self.0 { ref mut x => x } // should not compile, but does
    }
}
fn main() {}

Would perform a lexpr->vexpr->lexpr conversion, and therefore a temporary, creating MIR as follows:

temp = self.0; // lexpr -> vexpr
scrutinee_temp = &mut temp; // vexpr -> lexpr + coercion
x = &mut (*scrutinee_temp); // match binding
ret_ptr = x; // return value assignment
return

Which would of course cause a "borrow does not live long enough" when it catches you trying to return a borrow of the local temp.

Obviously, creating such coercion temporaries by default will annoy anyone who tries to use ref patterns to actually match parts of the scrutinee value by reference, so we don't. If we see a ref pattern, we prevent coercions and the resulting round-trips.

However, if all the ref bindings occur behind a reference, creating the temporary can't actually be annoying in this way - you don't take references to the value, so the temporary is invisible and we don't need to avoid it. Therefore, avoiding coercions exactly when there are explicit ref patterns is an implementable and non-annoying strategy.

An aside

The above description is not actually implementation-coherent with typeck - to satisfy closure inference, match bindings can't use subtyping, so rustc actually sometimes performs subtyping on immutable lexprs - see this comment: https://github.com/rust-lang/rust/blob/dcb4378e18571fa01e20ef63820d960f1c2cc865/src/librustc_typeck/check/_match.rs#L331-L379

Note that typeck's strategy is imperfect and leads to spurious errors in some situations:

fn foo<'x>(mut x: (&'x isize, ())) {
    let a = 1;
    let (mut _z, ref _y) = x;
    _z = &a; //~ ERROR no subtyping for you!
}

fn main() {}

We'll also have to solve the closure inference wonkiness when we get to MIR-based regionck, but hopefully the MIR-based regionck won't be wonky anyway.

Pure trouble

While this argument works today, it will put us in a somewhat sticky situation if we want to allow DerefPure impls for newtypes:

mod some_newtype {
    pub struct Newtype<T>(pub T);
    impl<T> Deref for Newtype<T> {
        type Target = T;
        fn deref(&self) -> &T { &self.0 }
    }
    impl<T> DerefMut for Newtype<T> {
        fn deref_mut(&mut self) -> &mut T { &mut self.0 }
    }
    unsafe impl<T> DerefPure for Newtype<T> { /* ... */ }
}

use some_newtype::Newtype;
fn main() {
    let n = Newtype(("hello",));
    {
        let s = "hi there".to_string();
        let (mut d,) = n;
        *d = &s; // this can't successfully modify `n`, so, create a
                 // temporary? fail? both options are unsatisfying
    }
}

nikomatsakis commented 6 years ago

I've opened https://github.com/rust-lang/rust/issues/44848 to track the so-called "coercion chicken" question.

nikomatsakis commented 6 years ago

I've also opened https://github.com/rust-lang/rust/issues/44849 to track another interesting question:

Another interesting question that @tschottdorf encountered when implementing default binding modes: What do we do with constants? The RFC specifies that we ought to treat a FOO binding that resolves to a constant as something which can skip through &T types -- however, that runs into trouble if the type of the constant itself is &str or &[T]. The current logic at least skips through all &T or &mut T types when it skips through any, but handling &str correctly would require skipping through "only the right number". @tschottdorf implemented various rules but we should at minimum update the RFC to match.

tbg commented 6 years ago

I can't be the only one who wonders: why 'coercion chicken'?

tbg commented 6 years ago

@nikomatsakis you can check the first box now.

nikomatsakis commented 6 years ago

Just filed https://github.com/rust-lang/rust/issues/46688, regarding a curious interaction that took me a bit to figure out. Probably a bug in the RFC. It has to do with what happens when you match a pattern like (a, &b) against a value of type &(u32, &u32). When we skip the first &, we get into "ref by default" mode, but when we explicitly acknowledge the second one, we do not get back into "by value" mode. That's kind of annoying.

sgrif commented 6 years ago

This feature can cause some pretty nonsensical error messages right now. For example, if you accidentally shadow a unit struct, you get an error about this feature, even though this feature has nothing to do with it and the suggestion certainly wouldn't work:

error[E0658]: non-reference pattern used to match a reference (see issue #42640)
  --> src/query_builder/delete_statement/mod.rs:107:13
   |
18 | if let Some(name) = params.get("name") {
   |             ^^^^ help: consider using a reference: `&name`
   |
   = help: add #![feature(match_default_bindings)] to the crate attributes to enable

error[E0308]: mismatched types
  --> src/query_builder/delete_statement/mod.rs:107:13
   |
18 | if let Some(name) = params.get("name") {
   |             ^^^^ expected str, found struct `schema::users::columns::name`
   |
   = note: expected type `str`
              found type `schema::users::columns::name`

nikomatsakis commented 6 years ago

@rfcbot fcp merge

I propose that we stabilize this feature as currently implemented, though with one pending change. Here is a summary.

Tests that document current semantics

The tests for this feature can be found in the following directories:

Changed from the RFC

The one change from the RFC was to "reset" the binding mode to "by value" whenever you encounter an explicit & or &mut pattern. This is to resolve the confusion described in https://github.com/rust-lang/rust/issues/46688. The PR has not yet landed, but it is open.

Questions encountered

We encountered two curious cases:

Constants. The first was how to treat constants (https://github.com/rust-lang/rust/issues/44849): what should the semantics be when matching a constant of reference type? The current solution (also what is described in the RFC, I believe) is to be conservative: such a pattern does not attempt to insert "auto-derefs" (or "auto-ref-patterns", as you prefer). This does imply that the following does not compile:

#![feature(match_default_bindings)]

fn main() {
    let x = "123";
    match &x {
        "123" => println!("Yes!"),
        _ => println!("No!"),
    }
}

Note that &"123" will work. This doesn't seem that important, though, and moreover it would be backwards compatible to fix it in some way (since it is now an error).

Coercions. The other problem was the interaction with coercions (#44848). Re-reading this comment I wrote, and @arielb1's replies, it seems like the current rule -- using the presence of a syntactic ref as a marker for when to disable coercion -- suffices. And anyway the worst that could happen is that we get less than ideal errors, basically.

rfcbot commented 6 years ago

Team member @nikomatsakis has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

withoutboats commented 6 years ago

I turned this feature on recently when I was destructing a syn AST by reference, and it was an absolute joy to use. :-)

rfcbot commented 6 years ago

:bell: This is now entering its final comment period, as per the review above. :bell:

nikomatsakis commented 6 years ago

@withoutboats

I turned this feature on recently when I was destructing a syn AST by reference, and it was an absolute joy to use. :-)

Heck yes. I can't live w/o it anymore.

nox commented 6 years ago

It seems that the RFC says that rustc may end up using a ref mut binding mode for values on which &mut methods are called, am I reading this wrong? Suggestions about making let foo = … be the same as let mut foo = … were met with much criticism, so I'm really confused now if I'm correct about my understanding of this RFC.

petrochenkov commented 6 years ago

I turned this feature on recently when I was destructing a syn AST by reference, and it was an absolute joy to use. :-)

I used them in rustc where possible since they were implemented and found the result slightly harder to read and to figure out what happens with ownership.

petrochenkov commented 6 years ago

At least it certainly requires some time and "rewiring" in the brain to happen to read the code written in the new style.

petrochenkov commented 6 years ago

Some minor annoying detail: if the binding mode is inferred to be by-reference, then "small" values like integers are bound by reference as well and you have to add dereferences. Example:

// Previously
Variant { ref my_vector_of_things, my_int } =>
    MyStruct { len: my_vector_of_things.len(), my_int }

// With default binding modes
// Ugly `*` and can't use field shortcut anymore
Variant { my_vector_of_things, my_int } =>
    MyStruct { len: my_vector_of_things.len(), my_int: *my_int }

https://internals.rust-lang.org/t/suggestion-references-in-struct-literal/6789/10 may help here though.
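A self-contained version of the "with default binding modes" arm, for anyone who wants to try it. The type definitions (`Thing`, `MyStruct`) and the function `summarize` are assumptions filled in around the fragment above.

```rust
/// Hypothetical types standing in for the elided definitions.
enum Thing {
    Variant { my_vector_of_things: Vec<u8>, my_int: u32 },
}

struct MyStruct {
    len: usize,
    my_int: u32,
}

fn summarize(thing: &Thing) -> MyStruct {
    match thing {
        // Matching through `&Thing` binds both fields by reference,
        // so the integer has to be dereferenced explicitly.
        Thing::Variant { my_vector_of_things, my_int } => MyStruct {
            len: my_vector_of_things.len(),
            my_int: *my_int,
        },
    }
}
```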