rust-lang / rfcs

RFCs for changes to Rust
https://rust-lang.github.io/rfcs/
Apache License 2.0

Destructuring assignment #372

Closed glaebhoerl closed 4 years ago

glaebhoerl commented 10 years ago

Given

struct Point { x: int, y: int }
fn returns_point(...) -> Point { ... }
fn returns_tuple(...) -> (int, int) { ... }

it would be nice to be able to do things like

let a; let b;
(a, b) = returns_tuple(...);
let c; let d;
Point { x: c, y: d } = returns_point(...);

and not just in lets, as we currently allow.

Perhaps even:

let my_point: Point;
(my_point.x, my_point.y) = returns_tuple(...);

(Most use cases would likely involve mut variables; but those examples would be longer.)

Related issues from the rust repo: https://github.com/rust-lang/rust/issues/10174 https://github.com/rust-lang/rust/issues/12138
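For readers on a recent toolchain, here is a minimal sketch of the forms requested above. It assumes Rust 1.59 or later, where destructuring assignment is available, and it swaps the pre-1.0 `int` for `i32`. The function bodies are placeholders invented for illustration. The field example uses an initialized `mut` struct, because assigning to the fields of an uninitialized binding is rejected by the compiler.

```rust
struct Point {
    x: i32,
    y: i32,
}

// Placeholder bodies; the original post elides them.
fn returns_point() -> Point {
    Point { x: 3, y: 4 }
}

fn returns_tuple() -> (i32, i32) {
    (1, 2)
}

fn assign_all() -> (i32, i32, i32, i32) {
    // Declared first, assigned later through a tuple on the left-hand side.
    let (a, b);
    (a, b) = returns_tuple();

    // The same with a struct on the left-hand side.
    let (c, d);
    Point { x: c, y: d } = returns_point();

    (a, b, c, d)
}

fn assign_fields() -> Point {
    // The struct must already be initialized before its fields are assigned.
    let mut my_point = Point { x: 0, y: 0 };
    (my_point.x, my_point.y) = returns_tuple();
    my_point
}

fn main() {
    println!("{:?}", assign_all());
    let p = assign_fields();
    println!("({}, {})", p.x, p.y);
}
```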

bombless commented 8 years ago

And I doubt it works when type projection is involved.

bombless commented 8 years ago

I'm thinking about a totally different syntax for this.

let x;
match (1, 3) {
  (!x, y) => () // you can use `y` here, as well as `x`, which is from the outer scope
}
// at this site `y` is out of scope but `x` is available here

This way we can mark some of the bindings in a pattern as mixed bindings (that is, the bound variable actually comes from an enclosing scope).

Kimundi commented 8 years ago

@bombless: The technique I described would very much not mean analysing types before the AST. On the contrary, it would parse the same way as today; the only difference is that what it parsed would be interpreted differently in some cases where you always get an error message today.

bombless commented 8 years ago

If I understand correctly, it bails out and type-checks the lvalue-like expression as a pattern, as if we treated a pattern as a "primary type" during type inference?

What I wanted to say is that a pattern is not like an lvalue at all, so it may make the AST ugly or inconvenient.

bombless commented 8 years ago

Anyway, I guess when MIR lands the design of the AST becomes less important.

bombless commented 8 years ago

Also, do you intend to make the patterns in destructuring assignment less powerful, just as patterns in if-let and while-let are second-class compared to match, @Kimundi? Otherwise I think we have to treat all expressions as potential patterns when we parse under your solution.

ticki commented 8 years ago

No, @bombless, it requires no such thing.

Kimundi commented 8 years ago

@bombless: I'll repeat it one more time before giving up this particular thread of discussion. ;)

bombless commented 8 years ago

let ref mut x;
(x,) = (1,);
*x = 2;

Now I get it, so the pattern is actually in the let?

bombless commented 8 years ago

fn main() {
    let x = ref 1;
}
<anon>:2:13: 2:16 error: expected identifier, found keyword `ref`
<anon>:2     let x = ref 1;
                     ^~~
<anon>:2:17: 2:18 error: expected one of `!`, `.`, `::`, `;`, `{`, or an operator, found `1`
<anon>:2     let x = ref 1;
                         ^
playpen: application terminated with error code 101

I just don't see why a pattern would already be parsed as an expression.

What I described above involves not parsing a destructuring assignment as a pattern, but as an expression - like it is today already.

Can you help me figure it out, @ticki ?

taralx commented 8 years ago

@ticki @bombless is correct -- not every destructuring assignment uses a valid expression:

(ref l[0], _) = a

It will be necessary to unify the pattern and expression grammars.

eddyb commented 8 years ago

@taralx ref is a binding modifier which binds a reference instead of the value, do you think it's actually useful in the LHS of an assignment? Your example would mean l[0] = &a.0;, AFAICT. I'm more worried about slice patterns TBH.
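As a small illustration of what `ref` does in an ordinary `let` today, here is a sketch showing that the binding refers to the field in place and does not move it. The function name and values are made up for the example.

```rust
fn ref_binding_demo() -> (bool, usize) {
    let a = (String::from("hello"), 5);

    // `ref` binds `s` as a reference to `a.0`; nothing is moved out of `a`.
    let (ref s, _) = a;

    // The same reference, spelled with an explicit borrow.
    let t = &a.0;

    // Both references point at the same String, and `a` is still usable.
    (std::ptr::eq(s, t), a.0.len())
}

fn main() {
    println!("{:?}", ref_binding_demo());
}
```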

taralx commented 8 years ago

@eddyb Maybe not useful in most cases, but you never know. People will expect it to work.

Veedrac commented 8 years ago

@bombless' suggestion to just reuse normal let expressions but mark variables to refer to preexisting mut variables seems like a smaller change with fewer complications that's nonetheless more powerful.

For example, it means we don't have to deal with unifying expressions with patterns, it means things like match will work "for free" and it means we can do partial assignment.

My syntax suggestion would be to reuse @ syntax with a different symbol (= is promising), so one could write

let (x = _, y = _) = { ... };

// same as the existing syntax
let (x @ _, y @ _) = { ... };
// but assigns to preexisting variables

It would have to be banned at top-level in patterns in if let and while let, but I can't imagine anyone using them there on purpose. It can do some nice things like

let Tree { left, right, head: minimum = _ } = ...;

match dispatch() {
    last_op = Read => ...,
    last_op = Write => ...,
    Skip => continue,
}

The main problem is having to write = _ more frequently than otherwise, but IMO that's actually a benefit when I tried it. @tstorch's example

let (left = _, right = _, offset = _, period = _) =
    if a < b {
        // Suffix is smaller, period is entire prefix so far.
        (left, right + offset, 1, right - left)
    } else if a == b {
        // Advance through repetition of the current period.
        if offset == period {
            (left, right + offset, 1, period)
        } else {
            (left, right, offset + 1, period)
        }
    } else {
        // Suffix is larger, start over from current location.
        (right, right + 1, 1, 1)
    };

still looks OK to me, and it's quite clear that there are four assignments happening, which can sometimes be less apparent with the a, b, c, d = syntax in Python (at least in my experience).

Thoughts?

ticki commented 8 years ago

I would really, really avoid reusing let. Declaration and assignment are two entirely different things. Reusing let would be a breaking change.

Veedrac commented 8 years ago

Why would it be a breaking change?

ticki commented 8 years ago

Because of scoping:

let mut a = 2;

{
    let a = 3;
}

// a would be 2 using the current rustc.
// but by your proposal it would be 3.
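The current behaviour mentioned above can be checked with a short sketch. It only contrasts today's shadowing `let` with a plain assignment and makes no claim about how any proposal in this thread would behave. The function names are invented for the example.

```rust
fn shadow_in_inner_block() -> i32 {
    let a = 2;
    {
        // A new binding that shadows the outer `a` inside this block only.
        let a = 3;
        let _ = a;
    }
    // The outer `a` was never touched.
    a
}

fn assign_in_inner_block() -> i32 {
    let mut a = 2;
    {
        // A plain assignment mutates the outer variable.
        a = 3;
    }
    a
}

fn main() {
    println!("{} {}", shadow_in_inner_block(), assign_in_inner_block());
}
```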

BlacklightShining commented 8 years ago

@Veedrac Simpler example to see if I have this syntax right.

let (foo, bar);
// Stuff happens here.
let (foo = _, bar = _) = get_some_two_tuple();

This seems pretty confusing to me. First, there's the let, even though it's assignment, not declaration. Or…does that also let you declare new variables (let (foo = _, bar = _, quux) = get_some_three_tuple();?) Second, the = _. I don't understand the motivation for this, and it looks like an assignment of _ (which makes no sense). @ might be a better choice, but then the _ just seems unnecessary; there's no ambiguity as to what is being assigned.

Even so, I would prefer (foo, bar) = get_some_two_tuple(). I don't see anything wrong with that syntax, and Python and Swift already use it.

Veedrac commented 8 years ago

let a = 2 will still bind, not assign. You'd have to do let (a = _) = 2, but that would be equivalent to let (a @ _) = 2 which doesn't compile.

let a @ _ = 2 does compile, but I explicitly banned top-level assignments in let, if let and while let (eg. let a = _ = 2) because they're stupid.


With your example,

let (foo, bar);
// Stuff happens here.
let (foo = _, bar = _) = get_some_two_tuple();

this is just like

let (foo @ _, bar @ _) = get_some_two_tuple();

except that foo and bar are assigned to the preexisting variables. Ergo these two are both valid:

let (foo = _, bar = _, quux) = get_some_three_tuple();
let (foo @ _, bar @ _, quux) = get_some_three_tuple();

because the latter is valid. Conceptually, you take a let (_, _) pattern and you just add assignments inside: let (foo = _, bar = _). @ is already used, so using that would be a breaking change.


The reason I prefer this to (foo, bar) = ... is mostly because it avoids introducing a new pattern syntax which is nontrivially different to the current one. My proposal requires only a very minor addition to the grammar (mostly just '@' → '@' | '=', though you also need to exclude top-level bindings where it's problematic). You also don't have LL(infinity) problems.

But I also like the ability to both bind and assign, which looking back I think would be quite useful, and for it to work with refutable patterns. It'd be a pain if you couldn't use if let when you upgrade to an Option for whatever reason.


I think it's also worth noting that there's already precedent for using let without binding. Consider things like

while let None = next() { ... }

IMO let in Rust means "pattern match".

ticki commented 8 years ago

IMO let in Rust means "pattern match".

Not at all. Both if let and while let bind variables rather than assigning to them.

BlacklightShining commented 8 years ago

@Veedrac

Conceptually, you take a let (_, _) pattern and you just add assignments inside

This still doesn't make sense to me, particularly because nothing is actually being assigned there (_ cannot be assigned!), and even if something was, it would immediately be clobbered by the overall assignment (and thus be optimized out). I don't see the value of let (foo = _, bar = _) over let (foo, bar).

Also, I thought that the proposition wasn't to add a new pattern syntax, but rather to extend certain expressions only to be usable as lvalues. I think someone already pointed out that refutable patterns are useless here: what would Some(foo) = bar do? It works in if and while because branching happens based on whether it's refuted, but assignments aren't conditional like that.

Veedrac commented 8 years ago

@BlacklightShining

I think I may have misunderstood you before. To clarify, let (foo = _, bar = _) does not ever declare new variables. You wouldn't do let (foo = _, bar = _); on its own any more than you'd do let (foo @ _, bar @ _); on its own (although that is actually valid). You'd only do this when actually assigning, like let (foo = _, bar = _) = (x, y);.

If there isn't an assignment (so the _ is actually unbound), either it can just be outright illegal (error: variable not bound) or it would "reset" the variable to the "uninitialized" state (aka. the same state it would be in if it was a fresh variable bound with @). The precise behaviour here is unimportant because nobody would ever actually do that and either suggestion is coherent.

If foo is not declared, you'd get the same unresolved name you currently do when you write foo = bar; without declaring it.

Refutable patterns are "useless" when one only has basic bindings, but my suggestion is an alternative to extending expressions. Extending expressions has been shown to be problematic for the grammar, and it introduces a sorta-pattern pseudo-syntax. I'm suggesting that instead we extend patterns, which automatically means it works with if let, while let and match expressions and doesn't have these complications.


Fundamentally, I do understand the objection to writing = _. The expression form does look a little cleaner. But it's a little cleaner, not a lot, and IMO the other points I've made far outweigh that.

glaebhoerl commented 8 years ago

(foo, bar) = (x, y); is way way way more obvious than let (foo = _, bar = _) = (x, y);. If I saw the latter without foreknowledge, there's a good chance I'd have to ask on SO or IRC to figure out what the heck it means.

ticki commented 8 years ago

The other syntax proposed is very far away from the motivation, namely simple, obvious, and unverbose destructuring assignment.

BlacklightShining commented 8 years ago

@Veedrac if let and while let are declarations, though. They create a new scope. let foo = ...; if let (foo, bar) = get_some_two_tuple() { ... } already works in stable Rust, and will continue to work if this lands. Ditto for match arms. This issue is about assignment, without creating a new scope.

I think I get this now. Despite being written as let (foo = _, ...) = ...; rather than let (foo @ _, ...) = ...;, the _ is meant not as an rvalue, but rather as an indication that whatever is being assigned to foo should come from the right side of the overall statement, right? Kind of like match?

Veedrac commented 8 years ago

@glaebhoerl

I'm tempted to say that Rust already acts this way; @ in patterns is certainly not obvious on first sight for instance. And I think the trivial examples make it seem less clear than it is. Consider some very fake code:

let mut state = State::Begin;
while let Some(bytes, state = _) = self.receive(state) {
    ... // use bytes
}
match state {
    ...
}

I find this somewhat self-explanatory. But I do appreciate that =, like @, is less discoverable and sometimes that matters most.

@BlacklightShining

Obviously I'm worse at explaining things than I thought. Perhaps this simple translation procedure will help. Basically, replace something like

let (foo = _, bar = (_, _)) = ...;

with

let (temp1 @ _, temp2 @ (_, _)) = ...;
foo = temp1;
bar = temp2;

As such, _ is indeed not an rvalue but part of the pattern, just like with @. I'm not suggesting changing what if let (foo, bar) = get_some_two_tuple() does.

Some more examples:

if let Ok(thing = Some(_)) = get() {
    use(thing);
}

changes to

if let Ok(temp @ Some(_)) = get() {
    thing = temp;
    use(thing);
}

And

match get() {
    Some(arg = 1) => { use(arg); },
    Some(arg = 2) => { use(arg); },
    Some(arg = _) => { use(arg); },
    None => { continue; },
}

changes to

match get() {
    Some(temp @ 1) => { arg = temp; use(arg); },
    Some(temp @ 2) => { arg = temp; use(arg); },
    Some(temp @ _) => { arg = temp; use(arg); },
    None => { continue; },
}
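The translated forms above already compile today, so they can be sketched as runnable code. The `get` function, its argument, and the initial values below are hypothetical stand-ins chosen for illustration; `use` from the thread is dropped because it is a keyword.

```rust
// Hypothetical stand-in for the thread's `get()`.
fn get(present: bool) -> Result<Option<i32>, ()> {
    if present { Ok(Some(7)) } else { Ok(None) }
}

fn assign_if_some(present: bool) -> Option<i32> {
    // Preexisting variable that the pattern should assign to.
    let mut thing = Some(0);
    // Translation of the proposed `if let Ok(thing = Some(_)) = get()`.
    if let Ok(temp @ Some(_)) = get(present) {
        thing = temp;
    }
    thing
}

fn assign_in_match(input: Option<i32>) -> i32 {
    let mut arg = -1;
    // Translation of the proposed `Some(arg = ...)` match arms.
    match input {
        Some(temp @ 1) => { arg = temp; }
        Some(temp @ 2) => { arg = temp; }
        Some(temp @ _) => { arg = temp; }
        None => {}
    }
    arg
}

fn main() {
    println!("{:?} {:?}", assign_if_some(true), assign_if_some(false));
    println!("{} {}", assign_in_match(Some(2)), assign_in_match(None));
}
```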

taralx commented 8 years ago

I don't think this extended pattern syntax is any better. You still have to merge the expression and pattern syntaxes or limit the assignable lvalues. So it's an unfamiliar syntax instead of a familiar one.

Veedrac commented 8 years ago

@taralx Now that you mention it, I think so too. I forgot just how much can go on the LHS of an assignment. Nvm then, I retract my claim.

ticki commented 8 years ago

Again, let is for declaring, not assigning. Reusing it for assigning will lead to confusion.

philippkeller commented 8 years ago

Coming from Python, it was surprising to me that this isn't possible. However, my use case was to avoid a tmp variable, and I solved it with tuples instead:

let mut fib = (1,2);
while cond {
    fib = (fib.1, fib.0+fib.1);
}
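A runnable version of this tuple workaround might look like the sketch below. The undefined `cond` is replaced by a fixed step count, which is an assumption made so the loop terminates; the function name is also invented.

```rust
fn fib_pair(steps: u32) -> (u64, u64) {
    // Keep both values in one tuple so a whole-tuple assignment updates them together.
    let mut fib = (1, 2);
    for _ in 0..steps {
        fib = (fib.1, fib.0 + fib.1);
    }
    fib
}

fn main() {
    println!("{:?}", fib_pair(3));
}
```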

louy2 commented 8 years ago

So it's a two part problem:

  1. a friendly destructuring assignment syntax
  2. an implementation of the syntax that preserves LL(k) grammar

@Kimundi's implementation seems best if @taralx's concern is solved. I think it is worth it to abandon ref on LHS for this to work, but discussion welcome.

If a keyword is needed, I like @ldpl's mut best. A bit tricky as "mutable" becomes "mutate", but it's still easy to understand.

If we insist an existing keyword is reused, match seems the closest to me. Involving ref in here risks confusion.

glaebhoerl commented 8 years ago

(I don't think ref on an assignment LHS even makes sense.)

amosonn commented 7 years ago

match definitely seems to be a good keyword, making the destructuring more implicit, and also shorthanding another current possible workaround: match f(a, b) { (x, y) => { a = x; b = y; } }. IMO this is cleaner than let (x, y) = f(a, b); a = x; b = y; which contaminates the scope more.
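The two workarounds compared above can be written out as follows. This is only a sketch: the body of `f` is a made-up placeholder, and the wrapper function names are invented.

```rust
// Placeholder body; the thread does not say what `f` computes.
fn f(a: i32, b: i32) -> (i32, i32) {
    (b, a + b)
}

fn step_with_match(mut a: i32, mut b: i32) -> (i32, i32) {
    // `x` and `y` exist only inside the match arm.
    match f(a, b) { (x, y) => { a = x; b = y; } }
    (a, b)
}

fn step_with_let(mut a: i32, mut b: i32) -> (i32, i32) {
    // `x` and `y` stay in scope for the rest of the function.
    let (x, y) = f(a, b);
    a = x;
    b = y;
    (a, b)
}

fn main() {
    println!("{:?} {:?}", step_with_match(2, 3), step_with_let(2, 3));
}
```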

dgrunwald commented 7 years ago

An extra keyword is only needed if we want to treat the left-hand-side as a pattern instead of an expression. But I don't think there's a good reason to do that.

In fact, the last example in the original post doesn't even use a valid pattern on the left-hand-side:

let my_point: Point;
(my_point.x, my_point.y) = returns_tuple(...);

All that's really necessary to make tuple/struct unpacking work are two rules for syntax desugaring (without any parser changes at all!):

  1. If the left-hand-side of an assignment is a tuple expression: ($expr1, $expr2) = $rhs; then desugar the assignment to a block:

    {
        let (tmp1, tmp2) = $rhs;
        $expr1 = tmp1;
        $expr2 = tmp2;
    }

  2. If the left-hand-side of an assignment is a struct literal: Struct { field1: $expr1, field2: $expr2 } = $rhs; then desugar the assignment to a block:

    {
        let Struct { field1: tmp1, field2: tmp2 } = $rhs;
        $expr1 = tmp1;
        $expr2 = tmp2;
    }

I believe this is the approach suggested by @Kimundi; but I thought it's worth mentioning that this is basically just simple syntax sugar that can be desugared prior to type analysis. In fact this seems like it might almost be doable with macro_rules!.
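To make the macro_rules! remark concrete, here is a rough sketch of the two-element tuple rule as a macro. The macro name `assign_tuple2` and the demo function are invented for illustration, and the macro covers only that single rule, not nesting or struct literals.

```rust
// Desugars `(lhs1, lhs2) = rhs` into temporaries followed by two assignments.
macro_rules! assign_tuple2 {
    (($e1:expr, $e2:expr) = $rhs:expr) => {{
        // The right-hand side is fully evaluated before anything is assigned.
        let (tmp1, tmp2) = $rhs;
        $e1 = tmp1;
        $e2 = tmp2;
    }};
}

fn demo() -> (i32, i32, [i32; 2]) {
    let (mut a, mut b) = (1, 1);
    // Reads the old `a` and `b` on the right, then assigns both.
    assign_tuple2!((a, b) = (b, a + b));

    // Any assignable expression works on the left, not just plain variables.
    let mut p = [0, 0];
    assign_tuple2!((p[0], p[1]) = (a, b));

    (a, b, p)
}

fn main() {
    println!("{:?}", demo());
}
```

Macro hygiene keeps `tmp1` and `tmp2` from clashing with variables at the call site, which matches the "simple syntax sugar" framing above.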

So there's really no reason to introduce new syntax unless you want to use different patterns in destructuring assignments than just tuples/struct literals. If so, which patterns do you want to use?

The only really useful pattern I can think of is _. This could be done by extending the expression grammar with a _ expression, and rejecting any remaining use of _ after desugaring. In general, we could probably support most patterns by extending the expression grammar in this way; but I doubt many types of patterns are worth it. I imagine tuples alone cover the vast majority of the use cases of this feature.

burdges commented 7 years ago

I'd agree tuples and newtype structs cover almost all use cases, but you'll want fixed length arrays along with regular and tuple structs too, and nested types work fine. I think ref gets replaced by * which sounds fine. And no need for @ bindings, guards, etc. No enum variants. I don't know the Rust grammar, but this sort of lvalue expression looks context free at first blush.

What about arrays more generally? https://github.com/rust-lang/rfcs/pull/495

torpak commented 7 years ago

What about using something like tie from c++?

let (a, b) = (3, 4);
...
tie (a, b) = (5, 6);

That is just as easy to write and needs no complex extension of the parser.

eddyb commented 7 years ago

@torpak I thought the parser problem is a solved one: don't try to have full pattern syntax on the LHS, handle only the intersection with expressions. Since tie is not already a keyword, what you wrote parses. In fact... tie(a, b).x = (5, 6); could even pass all checks and run. The issue here from what I can tell is people don't seem to like the "intersection of expressions and patterns" approach even when it could work well for most cases.

louy2 commented 7 years ago

Is there a properly formatted RFC for this feature yet?

steveklabnik commented 7 years ago

I am not aware of any.

c0b commented 7 years ago

So is it only because there is no RFC for this yet, and no committers have reviewed it? Can we call a few of them by mentioning them?

Coming from the Python world, it is very natural to calculate Fibonacci numbers by tuple assignment:

In [1]: a, b = 1, 1

In [2]: for i in range(10):
   ...:     a, b = b, a+b
   ...:     

In [3]: a, b
Out[3]: (89, 144)

JavaScript ES7 does not have tuples, but it has object construction and destructuring:

> let a = 1, b = 1;
undefined
> for (let i in Array.from({length: 10})) {
... ({ a, b } = { a: b, b: a+b });
... }
{ a: 89, b: 144 }

and JavaScript array destructuring assignment, which has been valid since ES6 (ES2015):

> for (let i = 0; i < 10; i++) [ a, b ] = [ b, a+b ];
[ 89, 144 ]
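For comparison, the same loop in Rust reads as below. This sketch assumes a toolchain with destructuring assignment (Rust 1.59 or later); the function name is invented.

```rust
fn fib_after(steps: u32) -> (u64, u64) {
    let (mut a, mut b) = (1, 1);
    for _ in 0..steps {
        // The right-hand tuple is built from the old values before either is overwritten.
        (a, b) = (b, a + b);
    }
    (a, b)
}

fn main() {
    println!("{:?}", fib_after(10));
}
```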

burdges commented 7 years ago

I dislike that this encourages unneeded mutability.

Instead of new assignment syntax for mutable variables, what about some assign/mutate/unlet syntax that locally altered the behavior of patterns from declaration to mutating assignment?

let mut field2;
let x = foo_mut();
loop {
   ...
   let Struct {
       field0,  mut field1,  // ordinary let declarations using destructuring with field puns
       field3: assign!(*x), // assignment to mutate the referent of x
       assign!(field3),   // assignment to existing mutable variable field3 ala field puns
   } = rhs();
   ...
}

It'd be obnoxious to write (assign!(a), assign!(b), assign!(c)) = rhs(); of course, but you should never do that anyways since invariably some elements should always be new immutable declarations instead of assignments to mutable variables. Also, this approach might work inside match or while/if let, and in more complex destructurings.

I picked a macro syntax here because it's only sugar declaring another binding and assigning to the mutable variable. Also, the () help delineate a full lhs term when dereferencing. I think unlet!(), mutate!(), or even mut!() could all make reasonable names as well. Alternatively new names like assign, unlet, mutate could work as keywords too, ala (a, unlet b, c) = rhs();. I could imagine sigil based syntaxes or even using =, ala (a, (b=_), c) = rhs();, although field puns might be problematic.

MichaelBell commented 6 years ago

As a rust newbie I'd just like to add my +1 that it's definitely confusing that you can do let (a, b) = fn_that_returns_tuple(); but not (a, b) = fn_that_returns_tuple();

I understand there are issues with implementation, but @dgrunwald seems to be saying there's a simple option that would work in the majority of cases that people actually care about - I think that would be worth doing!

louy2 commented 6 years ago

@burdges In pure languages this is unnecessary at least partly because loops like @tstorch's can (or must) be easily turned into recursive helper functions. But in Rust, with tail call optimization not guaranteed, as well as its goal of attracting audience from dynamic languages, I think destructuring assignment is a reasonable compromise.

phaux commented 6 years ago

Would it be possible to allow tuples/structs as actual lvalues, so that we don't need to differentiate between pattern/expr at all?

eddyb commented 6 years ago

@phaux I would hope so (syntactically), preferably in a way that (*x, y.field) = (a, b) works.
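A sketch of that shape, assuming a toolchain with destructuring assignment (Rust 1.59 or later). The `Holder` struct and the values are invented for the example.

```rust
struct Holder {
    field: i32,
}

fn assign_through_places() -> (i32, i32) {
    let mut target = 0;
    let x = &mut target;
    let mut y = Holder { field: 0 };
    let (a, b) = (5, 6);

    // A dereference and a field access as the assignment targets.
    (*x, y.field) = (a, b);

    (target, y.field)
}

fn main() {
    println!("{:?}", assign_through_places());
}
```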

Kimundi commented 6 years ago

Hm, now that's an idea, though I'm not sure how backwards compatible it would be:

let a = 5;
let b = "hello".to_string();

let (ref x, ref y) = (a, b); // this would move `a` and `b` today, but just reference them with constructors becoming lvalues.

That said, I think it wouldn't work anyway, as you would need a single canonical address for a constructor lvalue - which means constructing it from other lvalues would not really work (unless we special case it in the language, such that you get only an error if you attempt to take the address of a constructor lvalue, or magically let it behave as an rvalue in that case).

eddyb commented 6 years ago

@Kimundi I don't think it's plausible to have something like that, no. I interpreted @phaux's comment to refer to the alternative I like, which is to keep parsing expr = expr but interpret the LHS differently than the RHS, without involving patterns or creating some sort of "value" for the LHS.

phaux commented 6 years ago

@eddyb Exactly.

Kimundi commented 6 years ago

Ah, then I misunderstood. So basically one of the things which have been proposed already.

burdges commented 6 years ago

If Foo is Sized and big, then the ABI will handle fn foo() -> Foo by creating an uninitialized Foo and passing foo the pointer to it, yes? So Foo { whatever } = foo(); requires creating a temporary, right? You presumably meant that semantically, but I'm not sure I understand that either.

I'm still nervous about encouraging mutability, but could the syntactic issue be handled by "commuting" the let inside the "pattern"? So

let a;  // uninitialized here
let mut b = bar();  // mutable
let c = baz() : &mut u64;  // mutable reference
virtual Foo { a, b, *c, let d, let mut e } = foo(); // named field puns
// only a and d are immutable 

You could replace virtual with another keyword, or remove it entirely, except people worried about the grammar complexity up thread. You could not however do let r = virtual Foo { a, b, *c, let d, let mut e }; r = foo(); because r would be only partially initialized, which might be @eddyb's point.