Fix implicit loop closure, support `do (x) ->`

TrevorBurnham commented 13 years ago

This is an attempt to tie together a few recent issues and come to a consensus on the current feature of closing-over variables in a loop.

The purpose of the feature

Originally proposed in issue 423, the feature is nicely illustrated by this test case:

for i in [1, 2]
  setTimeout (-> console.log i), 0

If converted straightforwardly to JavaScript, this will generate the output

2
2

Why? Because the closure doesn't get invoked until after the loop has finished executing, by which point i is 2. The closures always references the same i.

But because of this feature, you get the output

1
2

Now the closure references the loop's indices (and other variables with "loop scope"—see issue 948) as they are when the closure is defined. This is implemented using a function called from each loop step:

var i, _fn, _i, _len, _ref;
_ref = [1, 2];
_fn = function(i) {
  setTimeout((function() {
    return console.log(i);
  }), 0);
};
for (_i = 0, _len = _ref.length; _i < _len; _i++) {
  i = _ref[_i];
  _fn(i);
}

Inconsistencies resulting from this feature

Unfortunately, there are several problems with this implementation, most notably issue 954: The transformation doesn't work if there's a break, continue or return anywhere in the loop. So the code

for i in [1, 2]
  setTimeout (-> console.log i), 0
  return if false

gives you

2
2

This is highly unintuitive: Why would the addition of a control statement affect the code's output?

Another inconsistency, issue 950, is that loop indices are immutable when this transformation occurs (that is, when there's a closure and no control statements in the loop), but are mutable otherwise.

Yet another inconsistency, pointed out by satyr at issue 728, is that arguments won't work within the for loop if the transformation occurs: It will reference the arguments of _fn rather than the arguments of whatever function the for loop exists in.

A more sophisticated implementation could correct these problems, but it would not be simple or elegant, and it's certainly not going to happen in the next few days.

Philosophical problems with this feature

Even if it weren't for the inconsistencies pointed out above, would we want this feature? It's a pretty dramatic transformation on the CoffeeScript compiler's part, and makes an unintuitive leap from JavaScript's semantics.

While it may be the case that you usually want a closure in a for loop to reference surrounding variables as they were at the time the closure was declined, this isn't the way that closures behave anywhere else in the language.

One of my favorite things about JavaScript is that only functions have scope. It's a nice, easy, consistent rule, one that lets you easily determine scope in CoffeeScript by looking for -> and => (and class). There have been several attempts at giving loops special scope in CoffeeScript, but all have proven problematic. I think loops should be kept simple.

Alternatives

Let's look at that original example again. How do we say "let's preserve the value of i when the closure is invoked"? The simplest approach is to write our own explicit closure,

for i in [1, 2]
  ((i) ->
    setTimeout (-> console.log i), 0
  )(i)

Of course, the syntax here is terrible. There was a better syntax that was proposed at issue 788 and had fairly wide support, not to mention a working implementation courtesy of satyr. If a proposed variation on this syntax were added to master, then you could write

for i in [1, 2]
  do (i) ->
    setTimeout (-> console.log i), 0

where do (i) -> ... is shorthand for do (i = i) -> ..., which in turn is shorthand for the current ((i) -> ...)(i).

This, I believe, is the perfect solution: A simple, succinct way of explicitly stating which variables you want to capture. This is a much better fit with CoffeeScript's philosophy of letting you write syntactically elegant code with JavaScript's semantics.

Conclusion

The implicit loop closure that's been in CoffeeScript since issue 423 should be removed due to the unpleasant surprises listed above. Even a perfect implementation would not, in my view, be worth the increased distance between the CoffeeScript and the underlying JavaScript. And the do keyword, rejected as insufficiently practical, should be reconsidered as an elegant way of capturing variables.

geraldalewis commented 13 years ago

Nice writeup, Trevor. I was having trouble teasing apart the issue before this.

I'm in favor of the alternative approach you outlined. I'm comfortable with my control structures not closing over my variables, as I'm sure many other JS devs are, and I'd be more confused by the existing behavior. There being a syntactic way to semi-simulate block level scope seems like the best of both worlds.

I'm curious, though: why overload the do keyword? It's a pretty, two-letter word, but maybe doesn't communicate its intent clearly.

TrevorBurnham commented 13 years ago

@geraldalewis I don't want to get too deep into the issue 788 discussion, but in my opinion, do (which is reserved but unused by CoffeeScript) is a pretty readable way of saying "run the following function, passing in these arguments." Maybe run or call would better communicate the intent than do, but that would mean reserving new keywords.

The full form is do (x = value) ->, but do (x) -> is shorthand for do (x = x) ->; that's how do (i) -> above, both declares a function argument named i and passes in i to the function.

geraldalewis commented 13 years ago

@TrevorBurnham - ahh -- I'd missed that do was unused in CoffeeScript; I don't use do regularly. Makes total sense now. Thanks for the explanation :)

satyr commented 13 years ago

passing in these arguments

This isn't a part of the proposal (nor the current implementation in Coco).

TrevorBurnham commented 13 years ago

@satyr Any particular reason? I'd suggested the shorthand at issue 788.

satyr commented 13 years ago

Any particular reason?

The purpose of do in that proposal is to write less parentheses--nothing more.

TrevorBurnham commented 13 years ago

@satyr Thanks, I've clarified the use of do in the original post and raised a Coco issue. (Update: And now that Jeremy has added your implementation of do to master, I've copied that issue over to CoffeeScript as issue 960.)

odf commented 13 years ago

I don't have any philosophical problems whatsoever with introducing block scope. In my eyes, it's less of a deviation from Javascript then the compulsive list comprehensions. But since obviously it's really hard to get this right, I agree that it would be better to drop the feature in favour of simplicity.

I think I like the idea of the do statement, which reminds me of the let statement in Lisp, ML and probably a bunch of other languages. It clearly communicates the idea that a new scope is created with some bindings and then a block of code is executed within that scope.

I'm wondering about the best syntax, though. Should the arrow be mandatory? What about the argument list. For example, one could imagine for i in [1..3] do setTimeout (-> console.log i), 0 to be shorthand for for i in [1..3] do (i = i) -> setTimeout (-> console.log i), 0 which in turn would expand to for i in [1..3](%28i%29 -> setTimeout %28-> console.log i%29, 0)(i) Is this too confusing, going too far? Maybe the arrow should be mandatory in order to indicate that a function is created. But the special form for i in [1..3] do ->, in which do would expand to do(i = i) as a more explicit way of getting the current behaviour back might be useful.

Another question I'm asking myself is whether to use let as the keyword instead of do, which would immediately be familiar to Lispers. I don't see why do has to be tagged indefinitely as a keyword if it's not used, so the argument about not wanting to introduce another keyword does not quite convince me.

TrevorBurnham commented 13 years ago

@odf Good points. I think that the proposed for...do, while appealingly succinct, suffers because it would make break, continue and return fail to act as you'd expect them to in a loop setting (not to mention arguments). By making the variable capture construct an explicit function, you can continue to interact with the loop in ordinary ways outside of it.

As to let, I'm not a Lisper myself, but do f as a synonym for f() makes more sense to me than let f... besides, do has to be tagged as a keyword because it's not a valid variable name in JavaScript.

odf commented 13 years ago

@TrevorBurnham Yes, I think I agree that the for...do form would be too confusing. But what about for...do ->? I'm not sure myself, honestly, but it does have some appeal.

I didn't realise that about do in JavaScript, but I still don't feel the fact that it's already tagged makes for a strong argument. I think it's a good word for expressing what it would do in this proposal, so I'm all for it. On the other hand, it might confuse people coming from languages where do is just syntactic glue introducing a loop body.

TrevorBurnham commented 13 years ago

Yes, I suppose for ... do -> would be fine, with do -> being shorthand for do (i) or do (key, val) or whatever.

It's true that do is almost always paired with until, but I don't see that as a big problem. I guess I wouldn't really care if let were used instead.

jashkenas commented 13 years ago

Thanks for detailing this issue, Trevor. On master, the current situation is this:

The implicit-closure magic is gone, and loops are just loops, regardless of if they're generating functions or not.
There's a new syntax for "scoped functions", that act like the closure-wrapped functions, or a forEach loop. You write it like for variables in object -> For example, one of the test cases:
```
obj = {}

for method in ['one', 'two', 'three'] ->
obj[method] = ->
  "I'm " + method

ok obj.one()   is "I'm one"
ok obj.two()   is "I'm two"
ok obj.three() is "I'm three"
```

TrevorBurnham commented 13 years ago

I'm fine with the scoped for syntax, as an extra sugar for a common use case (and I'm glad it uses an explicit ->), but I really, really want the enhanced do syntax. Soooo badly. Because sometimes, you want to explicitly enclose some variables but not other.

The do syntax I suggested at issue 960 is so much more readable than the mess of parentheses currently required to define and call a closure in-place. It's simple, versatile, and free of pitfalls.

odf commented 13 years ago

I like the for...-> very much. Explicit is better than implicit, as they say in Python land.

Like TrevorBurnham, I'm still very much in favour of the proposed do statement, tough. It's a clean syntactical encapsulation of a common use-case that is otherwise tedious to express. Having the do would make code that needs to close over certain values but not others DRYer and more readable.

jashkenas commented 13 years ago

I've talked about this a bit with Trevor out-of-band, but it's currently my opinion that loops "that need to close over certain values but not others" are an anti-pattern. Those loops can always be written in a clearer, more structured way, and it's pretty nasty to have to check a declaration to determine why some variables are being shared across iterations, and some variables aren't. If we can keep you from having to bother dealing with that whole can of worms, in CoffeeScript, I think we'd be better off in the end.

odf commented 13 years ago

The do proposal is not just about loops. The new for...-> syntax solves an important special case, but it's not the whole story. Ad-hoc functions (defined once and called immediately) that return inner functions closing over local variables are an invaluable data-encapsulation tool in JavaScript. But it's a tedious and not very readable idiom, which is why I think special syntax for it would be a great idea.

Anyway, here's what we can do right now, without implicit loop scoping or the new for...-> syntax: scope = (args, f) -> f(args...)

for i in ['one', 'two', 'three']
  scope [i], (i) ->
    setTimeout (->
      stateWinner =
        if i == final_i then "the winner" else "but #{final_i} won"
      console.log "I am #{i}, #{stateWinner}"
    ), 0
final_i = i

Note that I introduce a new scope only where I want it. I can still do anything I want in the rest of the loop, including calling pure statements and accessing arguments. This is already pretty sweet. But I'd like it even more if instead of scope [a,b,c] (a, b, d) -> I could simply write do (a, b, d = c) ->. It doesn't save a lot of typing, but it is much clearer, avoids repetition (which I consider an anti-pattern in itself), and turns a somewhat obscure-looking idiom into a first-class language feature.

(Edit: Moved the final_i assignment out of the loop body.)

michaelficarra commented 13 years ago

So, running with the wrapped return value proposal, I present a more formal proposal, one that I believe is actually the "perfect" solution Trevor mentions above. The solution currently on master makes the user think about when they need to closure-wrap their loops, and we shouldn't make them have to do that.

Take the following contrived example showing all proposed features: a function definition, a return, a continue, and a break all within the loop body.

fns = []
for k, v of obj
  continue if k is 0
  break if arguments.length is 1
  fns.push (a) ->
    console.log k, v, arguments
    return k
  return k if k is 2

This example should compile to

var fns, k, v, __fn, __result,
    __return = {}, __break = {};
fns = [];
__fn = function(__args, k, v) {
  if (k === 0) return;
  if (__args.length === 1) return __break;
  fns.push(function(a) {
    console.log(k, v, arguments);
    return k;
  });
  if (k === 2) return [__return, k];
};
for (k in obj) {
  __result = __fn.call(this, arguments, k, obj[k]);
  if(!__result) continue;
  if(__result === __break) break;
  if(__result[0] === __return) return __result[1];
}

Remember that this is the most complex example. Simpler function definitions will produce simpler JS output. See the next examples below.

fns = []
for k, v of obj
  continue if k is 0
  fns.push (a) ->
    console.log k, v

This example should compile to

var fns, k, v, __fn;
fns = [];
__fn = function(k, v) {
  if (k === 0) return;
  fns.push(function(a) {
    console.log(k, v);
  });
}
for (k in obj) {
  __fn.call(this, k, obj[k]);
}

Features/optimizations that you may have already noticed:

the loop only checks the return value for pure statements that were actually mentioned (including no checks at all if no pure statements are used)
continues don't even have to be handled at all, ever
__return and __break only have to be defined if they are being used in a closure-wrapped loop body
arguments doesn't have to be passed (as __args) if arguments is not mentioned
only have to pass-through the loop variables that are mentioned in the loop body

This proposed transformation process is completely transparent to the user, and will provide them with consistent, expected values in the closure-wrapped variables. Some reasoning as to why this is the best choice for semantics, taken from #coffeescript:

<jashkenas> #423 is a good read.
<jashkenas> To quote sethaurus:
<jashkenas> "I think closing over loop variables is a great idea, and testing for function literals in the loop body would be a solid way to decide whether or not to capture them.
<jashkenas> Is there any situation where one might define a function inside a loop, but reasonably rely on the loop variables not being captured?"
<michaelficarra> That's a good point, is there? Probably not.
<jashkenas> For the record, reading those tickets make me think that being inconsistent here is the lesser of two evils.
<michaelficarra> I don't know if we fully explored the special return types approach
<michaelficarra> maybe I'll come up with a more formal proposal tonight
<jashkenas> firing ajax callbacks from an array of modified items, reading an array of files with Node.js async callbacks, mixing in method implementations from a third-party library.
<jashkenas> Really, most every time you put a function definition in a loop, it's because you want to close over something.
<jashkenas> Otherwise, the function can be written outside, and then you get your desired plain for loop behavior.
<jashkenas> Perhaps this just needs to be well documented, instead of papered over.
<michaelficarra> Yes, exactly. If it's defined inside, it must be the developer's intention to close over the loop variables, otherwise it would be defined outside

PS: For the inconsistency with mutability of the loop variables, see my still-valid solution on #950.

odf commented 13 years ago

Render me impressed.

Just one question, though: if this is implemented, would not users expect the bodies of if..then...else, switch...when...then statements etc. to implicitly create their own scopes in a similar way?

I think the consistent way to go about this would be to either make the language fully block-scoped or else leave it function-scoped the way JavaScript is, and introduce a convenient, readable way of introducing a new scope. The latter approach is obviously much easier to implement, and stays closer to the target language. If the generic do doesn't happen, I think the new for...-> loops are a good compromise.

michaelficarra commented 13 years ago

@odf: Can you give me an example of how a user would see different scope-related results when using an if...then...else than they would expect with knowledge of how the for loops work in my proposal? Not attacking your argument, just want to know exactly what you mean.

odf commented 13 years ago

That's a fair question. Here is some silly example code: f1 = (n) -> g = if n > 0 m = n / 2 -> m else -> 0 m = 2 g

scope = (f) -> f()

f2 = (n) ->
  g = if n > 0 then scope ->
    m = n / 2
    -> m
  else scope ->
    -> 0
  m = 2
  g

console.log f1(3)()
console.log f2(3)()

Because of the default function-scoping, the first console.log statement prints 2, which is the value of m when the function returned by f1 is executed. But with the implicitly wrapped loops, if a user wrote m = n / 2 -> m within a loop value, they would see the function in the second line close over the local value of m, and probably expect the same behaviour here. In other words, they would expect f1 to behave like f2, which returns a function that returns 1.5.

TrevorBurnham commented 13 years ago

Michael, I'd be OK with an implementation of for that always closes over loop variables without introducing inconsistencies, and with making loop indices consistently immutable. But neither of those changes is going to be implemented in the next 2 days. I think you should maka a separate issue for post-1.0 discussion of your excellent proposals.

Going back to do, I still say it isn't just about for loop variables, that there needs to be a succinct syntax for explicitly capturing variables anywhere.

Suppose that every loop iteration I get a new x; the loop ends when the sum of the x's goes over a certain limit. I need to asynchronously get an encryption key to save each x along with the sum to that point. With current structures, this would have to be written as

while sum < limit
  sum += x = nextNum()
  saveXAndSum = (x, sum) ->
    getEncryptionKey (key) ->
      saveEncrypted key, x, sum
  saveXAndSum x, sum

There are three problems here: Non-linearity (defining a function right before the only time it's called), repetition, and having to name a pretty self-explanatory 2-line function. With do, the code can be made much nicer:

while sum < limit
  sum += x = nextNum()
  do (x, sum) ->
    getEncryptionKey (key) ->
      saveEncrypted key, x, sum

Surely this is both more readable and more writeable?

michaelficarra commented 13 years ago

@TrevorBurnham: A few things:

Not all loops would be closed over. Just those with function definitions. See the conversation from #coffeescript that I included in my post for why this is (at least one of) the correct trigger(s).
I wouldn't be so quick to say this can't get done in the next 2 days. I'm planning on working on this all day today, showing the results to jashkenas tonight (as he requested on #coffeescript), and possibly merging it in before tomorrow.
In your example above, you can just do the following when you're only using the function in one place.

while sum < limit sum += x = nextNum() ((x, sum) -> getEncryptionKey (key) -> saveEncrypted key, x, sum )(x, sum)

I agree it is a little ugly, but do really is unnecessary, as jashkenas keeps saying.

TrevorBurnham commented 13 years ago

Michael, if you agree that people might have reason to write

((x, sum) ->
  getEncryptionKey (key) ->
    saveEncrypted key, x, sum
)(x, sum)

then surely you can see why allowing

do (x, sum) ->
  getEncryptionKey (key) ->
    saveEncrypted key, x, sum

would be an improvement? Imagine if you had two or three such nested structures!

michaelficarra commented 13 years ago

Trevor, it's an alright feature, and I'm not vehemently opposed to it, but I do think it's a little unnecessary. I'd lean slightly toward not including it if given the option. But only slightly. I'd like to see some cases where it really cleans up some real-world code. I think it'll be hard to find such an instance.

satyr commented 13 years ago

The do proposal is about beautifying common code pieces, in the same way that we allow braceless objects. If we can pay great effort to strip 2 bytes from {k:v}, why not paying a little more to do the same for (->)()?

jashkenas commented 13 years ago

I think that Trevor's example is a great place that demonstrates how using more structured code helps avoid having to worry about variable scoping bugs:

# Save a sum to an encrypted account.
saveSum = (x, sum) ->
  getEncryptionKey (key) ->
    saveEncrypted key, x, sum

# Save sums in increments, up to the limit.
while sum < limit
  sum += x = nextNum()
  saveSum x, sum

odf commented 13 years ago

@jashkenas I think this is a great demonstration of how more 'structured' code can actually be harder to read. The name saveSum is not self-explanatory, whereas the actual code, which is not that much longer than the function call, is. I don't think programmers should be forced to either use an unreadable idiom or else tear their code apart into tiny pieces with arbitrary names stuck onto them. It should be up to them to decide when a function body is getting too long or complicated, so that it's better to decompose it into more elementary parts.

TrevorBurnham commented 13 years ago

Jeremy, while I appreciate your efforts to promote a clear, consistent coding style, I think that the form you propose—even if better in this case (and me and odf would disagree)—becomes untenable in heavily asynchronous code where there are several layers of nested callbacks.

Let me extend my example slightly, by saying that we need to save x and sum to a set of servers that come from an opaque iteration object, dependent on x. Then with do, the code could be written as

while sum < limit
  sum += x = nextNum()
  do (x, sum) ->
    while s = nextServer(x)
      do (s) ->
        getEncryptionKey s, (key) ->
          saveToRemote s, key, x, sum

By organizing the code in this way, it's not only nicely succinct (which is a great benefit to writeability—I always feel slowed down by having to name functions whose code is self-explanatory), but also makes the scope of everything quite obvious. From the saveToRemote line, you need only look up to the nearest (x) -> to see the scope of x.

Now here's the "proper" way to write this without do:

# Save a sum to an encrypted account
saveToServer (s, x, sum) ->
  getEncryptionKey s, (key) ->
    saveToRemote s, key, x, sum

# Invoke saveToServer for all relevant accounts
saveToAllServers = (x, sum) ->
  while s = nextServer(x)
    saveToServer s, x, sum

# Save sums in increments, up to the limit
while sum < limit
  sum += x = nextNum()
  saveToAllServers x, sum

We've doubled the line count, and in my view, we've made the code much more daunting. We've defined things in the opposite order of when they happen, which makes sense when a function is called from several places, but not when it's particular to a single loop. Also, with each level of nesting, we get more and more arguments to each function. I've had functions with 5-6 levels of callback nesting, each adding more variables, which makes for some very messy function declarations.

Granted, all the structure added by defining functions outside of the loop makes scope easy to keep track of, but at a very heavy price in all the other virtues that make CoffeeScript such a grand language.

In closing, Jeremy, all I want for Christmas is do.

odf commented 13 years ago

Nice example, Trevor! I was talking more in terms of generalities, by the way. How I'd write a piece of particular code depends a lot on the context, so I wouldn't dare to decide what's more readable just based on a tiny snipped. The point was simply that with a small piece of code like that, it is often not the best option to tear it apart artificially. But I think your new example demonstrates that much better.

By the way, I've noticed that there appears to be a let statement quite similar to what you've proposed for do in more recent versions of Javascript. But the various implementations and proposed standards confuse me, so I have no idea if this has chances of being in the mainstream five years from now or not.

TrevorBurnham commented 13 years ago

Yes, the proposed let is very similar, and I think its popularity among the ECMAScript Harmony crowd is further evidence for the usefulness of do. Sure, we could wait for let to become a part of every target environment—but it's going to be a long, long wait before everyone's on IE10+...

odf commented 13 years ago

No, I wasn't proposing to wait for that. More to suggest it as evidence that this construct could be useful, after all.

jashkenas commented 13 years ago

Trevor, looks like you get your wish. SHA: 094b876a384bed5504f513c472e4c9cfebe56aea

I'm not entirely happy with the verbosity do requires ... but it really is the more general solution. It makes properly scoped comprehensions an orthogonal issue that we can work out at our leisure. So, now on master, we have:

for fileName in list
  do (fileName) ->
    fs.readFile fileName, (err, contents) ->
      compile fileName, contents.toString()

And:

do walk

... the reason for removing scoped loop literals is that they didn't really cover all of the possible cases, and do does.

TrevorBurnham commented 13 years ago

Thanks, Jeremy, I'm glad to hear it. One small tweak suggestion: How about making a special case in the parser to allow

for fileName in list do (fileName) ->

rather than requiring then or extra indentation there? That'll slightly mitigate the verbosity for a common use case.

Also, for the record, what cases did the auto-scoped for proposed by Michael fail to cover? Or were you referring to the while/until examples I suggested? The only inconsistency in the proposed super-for described above that I could spot was that

lastNonNull = (list) ->
  result ?= x for x in list
  return result

would continue to work, but it would fail if the loop contained a closure, since result would have loop scope.

jashkenas commented 13 years ago

The cases that the auto-scoped for failed to cover were while, until and loop, naturally, and it didn't look like it would be easy to add those to the grammar. The bigger issue is that do is more general, and more orthogonal to if we decide to properly block scope loops later on ... whereas scoped loops would have been made obsolete immediately.

We may make a special case in the parser later, but for now, let's stick with then.

marcandre commented 13 years ago

At last I'm starting to use coffeescript (and really like it). As a Rubyist, I was naively surprised that coffeescript does not create a closure on for (anymore).

Just add my "+1" for the simpler notation "for... ->". I read the whole thread and understand why the "do" was introduced (good idea), but I would really like an even shorter notation, in particular one that doesn't require me to repeat my parameters... Personally, I would close all my fors.

Thanks!

Thanks

nickretallack commented 13 years ago

When I noticed that the current version of coffeescript doesn't make a new scope for the loop variable, I wanted to make an issue about it, but it looks like this has been discussed to death already. It didn't occur to me that this would alter the behavior of return/break/continue, and that alone is a good reason my intuition is wrong here.

I think the real solution, if you want a closure for your loop variables, is to use map or each, which are provided by underscore.js in case your environment doesn't already have them. If you need to exit early (break) you may be better off using detect, or you could use an exception.

Also, I was shocked when I found that Python has the same problem here. I thought Python was usually a step above Ruby in terms of technical correctness, but Ruby 1.9 actually does allow loop variables to shadow outer scopes. Then again, Ruby loop bodies really are functions, so perhaps it's not a fair comparison.

jashkenas / coffeescript