rust-lang / rfcs

RFCs for changes to Rust
https://rust-lang.github.io/rfcs/
Apache License 2.0

Filtered Blocks #1702

Open leksak opened 8 years ago

leksak commented 8 years ago

In https://www.youtube.com/watch?v=QM1iUe6IofM the speaker describes an interesting idea that may be summarized as a block that explicitly states which variables from the enclosing scope will be used. Read/write is not explicit, only usage.

Today a plain-nested block that accesses the enclosing scope looks like this,

fn main() {
    let mut foo = 5;
    // Other variables

    {
        foo += 2;
    }

    assert_eq!(7, foo);
}

For lengthier blocks such as,

fn main() {
    let mut foo = 5;
    let bar = 6;
    // Many other variables

    {
        foo += 2;
        // Other lines of code, only by inspection
        // can we tell what was used. Maybe bar was
        // used, maybe not. We'd have to study the code
        // to know
    }

    assert_eq!(7, foo);
}

there may be several of the preceding variables in the enclosing scope used inside the block.

If we could specify which variables we'll access inside the nested block, we could write something like,

using foo, bar {
    // uses foo, bar. Linter could catch if one of them is unused
}

While semantically equivalent to a named function we do not have to come up with a descriptive name for the function, nor can we call the block again, making unintended use "impossible".

I'd humbly suggest that this would be a nice addition to the Rust language.
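For comparison, here is a minimal sketch of the named-function alternative mentioned above, written as a nested function in today's Rust. The names `run` and `bump` are hypothetical. Because an inner `fn` cannot capture locals from the enclosing function, its parameter list already acts as the explicit list of used variables, at the cost of introducing a name that could be called again.

```rust
// Sketch only: `run` and `bump` are illustrative names, not part of the proposal.
fn run() -> i32 {
    let mut foo = 5;
    let bar = 6;

    // A nested `fn` cannot see `foo` or `bar` unless they are passed in,
    // so the parameter list documents exactly what the body may touch.
    fn bump(foo: &mut i32, bar: i32) {
        *foo += 2;
        let _ = bar; // readable here only because it was passed in
    }
    bump(&mut foo, bar);

    foo
}
```

Calling `run()` returns 7, matching the `assert_eq!(7, foo)` in the examples above.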

Stebalien commented 8 years ago

Unfortunately, that would be one more special syntax we'd have to teach users and, IMO, it doesn't pull its weight (I've never personally wanted something like this). As you pointed out, you can just use a function. If you don't want to define a separate function, you could have a macro do it:

macro_rules! restricted {
    (using $($var:ident),*; $($code:tt)*) => { /* stuff */ },
}

restricted! {
    using foo, bar;
    println!("{}, {}", foo, bar);
}

Stebalien commented 8 years ago

Implementation (that, unfortunately, does require explicitly declaring mutability):

macro_rules! restricted {
    (@inner ($($refs:expr,)*) ($($vars:tt)*) , &mut $var:ident $($rest:tt)+) => {
        restricted!(@inner ($($refs,)* &mut $var,) ($($vars)* $var: &mut _,) $($rest)+)
    };
    (@inner ($($refs:expr,)*) ($($vars:tt)*) , &$var:ident $($rest:tt)+) => {
        restricted!(@inner ($($refs,)* &$var,) ($($vars)* $var: &_,) $($rest)+)
    };
    (@inner ($($refs:expr,)*) ($($vars:tt)*) , $var:ident $($rest:tt)+) => {
        restricted!(@inner ($($refs,)* $var,) ($($vars)* $var: _,) $($rest)+)
    };
    (@inner ($($refs:expr,)*) ($($vars:tt)*) ; $($code:tt)*) => {{
        let inner = |$($vars)* _: ()| {
            $($code)*
        };
        {
            fn assert_static<T: 'static>(_: &T) {}
            assert_static(&inner);
        }
        inner($($refs,)* ())
    }};
    (using $($rest:tt)*) => {{
        restricted!(@inner () (), $($rest)*)
    }};
}

fn main() {
    let foo = 1u32;
    let bar = 2u32;
    let mut sum = 0u32;
    restricted! {
        using &foo, &bar, &mut sum;
        *sum = foo + bar;
    };
    println!("{}", sum);
}
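The `assert_static` step is what enforces the restriction in the macro above. A closure that borrows a local from the enclosing function has a type that is not `'static`, so the call is rejected unless everything reaches the body through parameters. A minimal sketch of the trick in isolation follows; `checked_sum` is a hypothetical name. Note that this check is not airtight: a closure that captures an owned value by move would still satisfy the bound.

```rust
// Sketch only: `assert_static` mirrors the helper inside the macro above;
// `checked_sum` is a hypothetical name used for illustration.
fn assert_static<T: 'static>(_: &T) {}

fn checked_sum(foo: u32, bar: u32) -> u32 {
    let mut sum = 0u32;

    // Everything the body needs arrives through parameters, so the closure
    // captures nothing and its type satisfies the `'static` bound.
    let inner = |foo: &u32, bar: &u32, sum: &mut u32| {
        *sum = *foo + *bar;
    };
    assert_static(&inner);
    inner(&foo, &bar, &mut sum);

    // By contrast, a closure that borrows a local does not satisfy the bound:
    //     let leaky = || foo + bar;   // borrows `foo` and `bar`
    //     assert_static(&leaky);      // rejected by the compiler
    sum
}
```

With these definitions, `checked_sum(1, 2)` returns 3.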

ticki commented 8 years ago

This is not worth the price of complexity.

nagisa commented 8 years ago

I do not understand what’s the point of this proposal if the using peach, banana { ... } isn’t a sugar for { let peach = peach; let banana = banana; ... }, but even if it was I’m not really keen on the idea.
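To make the desugaring in this comment concrete, here is a sketch of what rebinding inside a block does in today's Rust. The function and variable names are hypothetical. Rebinding moves the named values into the block, but it does not hide any other variable, which appears to be the gap the proposal is aiming at.

```rust
// Sketch only: `describe` and the variable names are hypothetical.
fn describe() -> String {
    let peach = String::from("peach");
    let banana = String::from("banana");
    let cherry = String::from("cherry");

    let summary = {
        // Rebinding moves both values into the block's scope.
        let peach = peach;
        let banana = banana;
        // Nothing stops the block from also reading `cherry`, so rebinding
        // alone does not restrict which outer variables are used.
        format!("{} {} {}", peach, banana, cherry.len())
    };

    // `peach` and `banana` were moved and dropped at the end of the block;
    // using them here would be a compile error. `cherry` is still available.
    format!("{} / {}", summary, cherry)
}
```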

Stebalien commented 8 years ago

@nagisa Basically, he doesn't want to accidentally use/mutate the wrong variable. IMO, the best way to deal with this is to just use descriptive variable names (or define a new function).

leksak commented 8 years ago

@nagisa You hit it square on the 'noggin. Unlike descriptive variable names, a feature like this lets us declare intent that can be verified at compile-time. As for new functions, they fill the same need but they create a named entity that can be called anywhere within its enclosing scope.

sullyj3 commented 8 years ago

Programming is all about managing complexity effectively. As your codebase grows, you want to be able to keep your head around what exactly is going on. Keeping cognitive load to a minimum is vital in this regard.

There are a couple of reasons to extract a block of code into a function: The first is abstraction. Being only human, there's only so much we can keep in our head at once. We want to be able to reason at a higher level without implementation details clogging up our cognitive resources. We want to think about what is being done, not how. So we break our programs up into smaller, logical, modular chunks, and our brains don't overheat as a result.

The second is reuse. Don't Repeat Yourself. We don't want to be writing code with the same functionality over and over again.

The problem with extracting code into a function is that it conflates these two goals, and comes with a few downsides. If you have a single logical unit of behaviour, you don't necessarily want to be able to reuse it. If not, extracting to a function leads to unnecessary indirection. You can't just read through the outer function to understand its behaviour, now you have to firstly find the implementation of the extracted function, read it, and then come back and continue reading. Unnecessary extra complexity, poor use of limited cognitive resources.

Not only that, having a new name floating around gives you another thing to worry about. Is this function being called anywhere else? Who knows? It takes a few extra steps to find out. Another thing to worry about. Another potential way to complect the call graph.

The way I see it, the main benefit of a construct like using is to get the abstraction benefit of modular chunks of code, without the complexity cost of having to look elsewhere for the implementation. It also avoids the complexity of having to think about whether this chunk of modular code is called anywhere else (it isn't).

This seems to me to be an incredibly useful thing to have.

As mentioned in the video, it should be easy to add editor/IDE support for extracting a using block to a function, if you do end up finding another use for it.

(Side note, I'm not super keen on the keyword using; too similar to use, and I'd prefer a shorter one. I can't think of anything better for now, though.)

sullyj3 commented 8 years ago

@Stebalien

Basically, he doesn't want to accidentally use/mutate the wrong variable.

I feel like this doesn't quite capture the benefit. It's not about writing (it's not that hard to just not use a variable you don't want to use), it's about reading. Signalling to the reader that they don't have to worry about any effects of the block on the surrounding state, except those caused by mutating the "using-ed" variables. Modularity.

leksak commented 8 years ago

@sullyj3 You described it as I wish I had. It is for precisely the reasons you outlined that I think this language construct would be worthwhile. I can't come up with a better keyword myself either.

DanielKeep commented 8 years ago

As an aside: using a function isn't always a good solution, because they interfere with flow control, the variables being captured have to be specified twice, you have to manually ascribe all the types, and you can't pass by "owned pointer". All this means it's not just a question of slapping some tokens around a block and having it work.
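The flow-control point can be illustrated with a small sketch in today's Rust. All names are hypothetical. Inside a plain block, `continue` and `break` act on the enclosing loop directly; once the body is lifted into a function, those decisions have to be returned as data and dispatched by the caller, and the parameter types have to be written out.

```rust
// Sketch only: all names here are hypothetical.

// Inline block: `continue` and `break` act directly on the enclosing loop.
fn sum_inline(values: &[i32]) -> i32 {
    let mut total = 0;
    for &v in values {
        {
            if v < 0 {
                continue; // skip negatives
            }
            if v == 0 {
                break; // stop at the first zero
            }
            total += v;
        }
    }
    total
}

// Extracted helper: the same decisions must be encoded in a return value.
enum Step {
    Skip,
    Stop,
    Add(i32),
}

fn classify(v: i32) -> Step {
    if v < 0 {
        Step::Skip
    } else if v == 0 {
        Step::Stop
    } else {
        Step::Add(v)
    }
}

fn sum_extracted(values: &[i32]) -> i32 {
    let mut total = 0;
    for &v in values {
        // The caller re-dispatches what the helper decided.
        match classify(v) {
            Step::Skip => continue,
            Step::Stop => break,
            Step::Add(n) => total += n,
        }
    }
    total
}
```

Both versions return 7 for the input `[3, -1, 4, 0, 5]`, but the extracted one needs an extra type and a `match` at the call site.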

Jon Blow also gave a pretty good justification for this in his own language, Jai: it exists as an intermediate step when refactoring code into a function: specify the variables you expect the block is capturing, fix the code to actually conform to those expectations, then you can easily lift into a function.

This syntax, if introduced, should also be applicable to closures.

leksak commented 8 years ago

@DanielKeep I disagree that the syntax should be applicable to closures. Whatever keyword is chosen a closure declaration might end up being even more verbose. Consider,

let plus_one = |x: i32| -> i32 { x + 1 };

Adding another keyword into that mix will, in my opinion, have a negative effect on legibility. Furthermore, a closure is already named, and if you want to explicitly declare which variables are used you may as well define a function (possibly nested). Move semantics and borrowing are already well-defined with respect to closures.

leksak commented 8 years ago

@sullyj3 Would with make for a nice keyword? It is shorter than using and not at all similar to use.

Then, we'd see code such as

with foo, bar {
    // uses foo, bar. Linter could catch if one of them is unused
}

which I read as "with foo and bar do all of these things".

sullyj3 commented 8 years ago

I like that. One potential issue is that for Python programmers, with means invoke context management. I don't think it'd be too big a deal though.

leksak commented 8 years ago

I think that the usage is enough to distinguish Python with from a Rust with.

In my opinion this Python code

with open("x.txt") as f:

reads very differently from any Rust with statement I imagine myself writing.

DanielKeep commented 8 years ago

@leksak Well, it's not like you have much choice in the matter. The body of a closure is an expression, so it would be really weird if you couldn't use this construct for a closure. Actually, now that I think about it, you could also just have the body of the block be a closure.

I happen to think that since one of the purposes of a closure is to abstract out repeated code, it might be wise to consider how to most ergonomically integrate the new syntax with them, that's all. Just to be clear, since you might have misconstrued what I wrote: I'm not saying you have to specify captures on closures, I'm saying it should be possible.

DanielKeep commented 8 years ago

Also, here's a description of how Jon Blow saw this being used for code refactoring purposes. In particular, this little snippet showing the progression from block to named function:

                                 { ... } // Anonymous code block
                       [capture] { ... } // Captured code block
     (i: int) -> float [capture] { ... } // Anonymous function
f := (i: int) -> float [capture] { ... } // Named local function
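As a rough analogue, the same progression can be sketched in today's Rust. This is an approximation rather than a translation: Rust has no explicit capture list, so the second Jai stage has no direct counterpart, and the names and values below are hypothetical.

```rust
// Sketch only: `progression`, `scale`, and the values are hypothetical.
fn progression() -> (f32, f32, f32, f32) {
    let scale = 2.0_f32;
    let i = 3_i32;

    // 1. Anonymous code block: may touch anything in scope.
    let a = {
        let x = i as f32;
        x * scale
    };

    // 2. Closure called once: still captures `scale` implicitly.
    let b = (|i: i32| -> f32 { i as f32 * scale })(i);

    // 3. Named closure: reusable, capture still implicit.
    let f = |i: i32| -> f32 { i as f32 * scale };
    let c = f(i);

    // 4. Nested function: no capture at all, so `scale` becomes a parameter.
    fn g(i: i32, scale: f32) -> f32 {
        i as f32 * scale
    }
    let d = g(i, scale);

    (a, b, c, d)
}
```

All four stages compute the same value (6.0 here); only the last one forces the dependency on `scale` to be written down.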

leksak commented 8 years ago

@DanielKeep:

I'm not saying you have to specify captures on closures, I'm saying it should be possible.

My interpretation was exactly that, I might not have a choice in the matter but I don't have to be keen on how it would look. But I'd rather have this feature, and extend the docs for closures, than not have this feature for the reasons outlined by @sullyj3.

@DanielKeep, I am trying to dream up some consistent syntax for this language construct, that aligns well with the existing syntax. Maybe you can weigh in?

For plain with blocks, I think it is reasonable to state the used variables up-front, like so

with foo, bar {
    // uses foo, bar. Linter could catch if one of them is unused
}

We have multiple ways of writing out a closure today,

let num = 5;
let plus_num_v1 = |x| x + num; // No type annotation
let plus_num_v2 = |x: i32| x + num; // Type annotation for input argument

// Type annotation for input argument and return value
let plus_num_v3 = |x: i32| -> i32 { x + num };

As mentioned in the Rust documentation, this is consistent with function declarations,

fn  plus_num_v1   (x: i32) -> i32 { x + num }
let plus_num_v2 = |x: i32| -> i32 { x + num };
let plus_num_v3 = |x: i32|             x + num;  

We have to solve the problem of adding in the keyword (we will use with for now), as well as the arguments to the with keyword, without making the resulting code look entirely too muddy.

The noisiest closure, in my opinion, is the one that uses type annotations. Hence, I want to "solve" adding the language construct to let plus_num_v2 = |x: i32| -> i32 { x + num };. Personally, I cannot figure out an appealing look; for instance, including the keyword inside the || makes the semantics unclear,

let plus_num_v2 = |x: i32 with num| -> i32 { x + num };

Adding with at the end does not impact the current closure syntax, but is inconsistent with how I imagine a with block would look, i.e. having

let plus_num_v2 = |x: i32| -> i32 { x + num } with num;

should arguably change

with foo, bar {
    // uses foo, bar. Linter could catch if one of them is unused
}

to

{
    // uses foo, bar. Linter could catch if one of them is unused
} with foo, bar;

which I think is a poor trade-off. Also, we have to imagine what it would look like using type annotations in both cases. I think

with foo: i32, bar: i32 {
    // uses foo, bar. Linter could catch if one of them is unused
}

plays out fine, but the closure examples I have offered suffer even further in those cases. Any suggestions?

DanielKeep commented 8 years ago

I'd probably just go with |x| -> i32 with num { x + num }. We already have where coming after the return type on functions. That way, the rule can just be "it goes before braces".

As an aside, there should probably be an explicit syntax for "captures nothing"; perhaps with () { ... }.

Actually, one of the problems with global variables is that you can't tell where they might be used. As such, it might be nice to extend this to functions as well: fn f() -> i32 with some_global { ... }. In that case, I can imagine a clippy lint that lets you require all captures be explicit, which I bet would make some people happy.
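On the "captures nothing" aside: today's Rust already offers one way to have the compiler check this for a closure, since only non-capturing closures coerce to `fn` pointers. A minimal sketch, with `apply_offset` as a hypothetical name; this is an existing idiom, not the proposed `with ()` syntax.

```rust
// Sketch only: `apply_offset` is a hypothetical name.
fn apply_offset(x: i32) -> i32 {
    let num = 5;

    // Coercing to a plain `fn` pointer only compiles when the closure
    // captures nothing, so `num` has to be passed as an argument.
    let plus: fn(i32, i32) -> i32 = |x, num| x + num;

    // This variant captures `num` and would be rejected by the compiler:
    //     let plus_num: fn(i32) -> i32 = |x| x + num;
    plus(x, num)
}
```

Here `apply_offset(1)` returns 6.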

leksak commented 8 years ago

@DanielKeep that looks nice, and I agree with both of your additions.

yigal100 commented 8 years ago

IMO this adds complexity for a marginal gain at best and does not pull its weight.

With regard to @sullyj3 's comment:

The problem with extracting code into a function is that it conflates these two goals, and comes with a few downsides. If you have a single logical unit of behaviour, you don't necessarily want to be able to reuse it. If not, extracting to a function leads to unnecessary indirection. You can't just read through the outer function to understand its behaviour, now you have to firstly find the implementation of the extracted function, read it, and then come back and continue reading. Unnecessary extra complexity, poor use of limited cognitive resources.

Not only that, having a new name floating around gives you another thing to worry about. Is this function being called anywhere else? Who knows? It takes a few extra steps to find out. Another thing to worry about. Another potential way to complect the call graph.

I would definitely agree with the above for C code but not for Rust. Rust allows nesting functions and that alleviates most of the above concerns. Not to mention the availability of closures.

I vote nay for this proposal.

cheery commented 8 years ago

This would be an extremely simple construct to implement in the Lever programming language, but I don't think it provides enough of an advantage to be worth the cost of documenting the added semantics, because function boundaries already solve the problem.

When your function grows large enough that the relationships between variables start to confuse people, that is a legitimate reason to split the function in two. In your example, when you need two functions, you would do it like this in Lever:

was_way_too_big_function = ():
    # other things, plus 'foo' and 'bar' defined.
    neater_function(foo, bar)
    # other things after
neater_function = (foo, bar):
    # too big section that would have required 'using' syntax described above.

The standard convention in my language is to put the split-out part of a function below the place it was split off from. This is the logical order in which most people would prefer to read the code.

Stebalien commented 8 years ago

Note on using with as a keyword: JavaScript, VB, and Kotlin (and probably more) use with to bring an object's members into scope.

That is, instead of writing:

receiver.foo();
receiver.bar();

You can write:

with receiver {
    foo();
    bar();
}

I'd be very careful before using the with keyword for anything in Rust.

Havvy commented 8 years ago

I'd rather see a procedural macro or lint plugin for this.

fn foo() {
  let mut a = something();
  let mut b = something_else();

  #[with(a)] {
    a = a.derived();
  }

  b.associate_with(a)
}

There's no need for this to be a part of the core language. Especially given its optionality.

Centril commented 6 years ago

Triage ping @leksak -- what's the status of this issue?