puffnfresh / roy

Small functional language that compiles to JavaScript.
http://roy.brianmckenna.org/
MIT License
834 stars 74 forks source link

Tail Call Optimization with Brushtail #187

Open rtfeldman opened 10 years ago

rtfeldman commented 10 years ago

This is not ready to be merged yet; I hit a snare and would like to discuss.

Consider the following code, generated from fixtures/good/tco.roy

var gcd = function (a, b) {
    return function () {
        if (b == 0) {
            return a;
        } else {
            return gcd(b, a % b);
        }
    }();
};

Brushtail won't optimize this because the anonymous function wrapper means the recursive call to gcd is not actually in the tail position.

Naturally, without that wrapper...

var gcd = function (a, b) {
        if (b == 0) {
            return a;
        } else {
            return gcd(b, a % b);
        }
};

...it gets optimized as expected.

I haven't taken a look at where this anonymous function wrapper is coming from, but I assume it is being added for scoping purposes. Obviously there are plenty of cases where wrapping for scoping makes sense, but I can't think of any where wrapping the entirety of a function body like this could impact scoping.

Am I missing something, or is this wrapper just something superfluous we should do away with in order to enable TCO?

joneshf commented 10 years ago

It's being wrapped because it's a conditional statement. https://github.com/puffnfresh/roy/blob/master/src/compile.js#L175-L220

I forget the actual reason why conditionals are wrapped, but it allows you to use them as expressions, as you did there. Unless that is the whole reason.

rtfeldman commented 10 years ago

Ah, makes sense. So my assumption that it was for scoping was incorrect.

In this particular case, the return statements inside the if and the else happen to make it work as an expression with or without the wrapper function. But I can easily see where that wouldn't generalize; if I'd been assigning the result to a variable, the wrapper function would be a clean way to do that:

var result = function () {
        if (a > 0) {
            return a;
        } else {
            return b;
        }
    }();

Two alternative ways to generate JS for an if expression that don't involve a wrapper function are:

// Alternative #1: Reassignment
var result;

if (a > 0) {
    result = a;
} else {
    result = b;
}

// Alternative #2: Ternary
var result = (a > 0) ? a : b;

Based on some quick experimentation, CoffeeScript uses the ternary approach for cases like these.

Benefits of the ternary over the function-wrapping approach include that it's better for performance because it's lower-overhead, it's TCO-friendly, and it's arguably clearer - the ternary is an actual conditional expression, whereas wrapping in a function and using return is a roundabout way to achieve the same.

Drawbacks include that it can generate JS that's difficult to read when the conditionals have a lot going on in them (statements are comma-separated, etc.), and that it doesn't offer any more fine-grained control over scope. (Which, to be fair, I'm not sure we're actually using anyway.)

As an aside, it appears Brushtail doesn't TCO ternaries at present, so even assuming they are what we want, switching wouldn't be a drop-in fix just yet.

puffnfresh commented 10 years ago

Yeah, the wrapping is awful. The wrapping is to ensure new scoping for things like:

let x = if 2 > 0 then
  let x = "ok"
  x
else
  let x = "fail"
  x

Which becomes:

var x = function () {
        if (2 > 0) {
            var x = 'ok';
            return x;
        } else {
            var x = 'fail';
            return x;
        }
    }();

It'd always be better to emit a ternary when there's only a single statement. That was an optimisation I've always wanted but didn't get around to.

But it might even be better to always emit ternary expressions:

var x = 2 > 0 ? function() {
        var x = 'ok';
        return x;
    }() : function() {
        var x = 'fail';
        return x;
    }();

Undecided, but I do like:

let x = if 2 > 0 then
  "ok"
else
  let x = "fail"
  x

Becoming:

var x = 2 > 0 ? 'ok' : function() {
        var x = 'fail';
        return x;
    }();

So maybe we should just run with that. I'll be able to add ternary support to Brushtail fairly easily.

What do you think?

joneshf commented 10 years ago

I think either way is fine. In my mind it doesn't matter what the js looks like when it comes out, but wasn't one of the original ideas with roy to emit js that a normal person would write? I'm not sure how many people would write a ternary like that.

EDIT: just saw this, so do we favor ternary in this case?

cassiemeharry commented 10 years ago

Undecided, but I do like:

let x = if 2 > 0 then
  "ok"
else
  let x = "fail"
  x

Becoming:

var x = 2 > 0 ? 'ok' : function() {
        var x = 'fail';
        return x;
    }();

I've implemented this in 01cd6b3.

puffnfresh commented 10 years ago

@nickmeharry brilliant. Thanks heaps.

puffnfresh commented 10 years ago

@joneshf the main rationales for the "readable JS" is:

I don't think the ternary causes much of a problem.

rtfeldman commented 10 years ago

Nice! Thanks for all the great updates. I also think the ternary will be fine in terms of readable output, especially considering CoffeeScript uses it. Certainly many people are happy with the readability level of CoffeeScript's output.

One last thing to note:

var x = 2 > 0 ? 'ok' : function() {
        var x = 'fail';
        return x;
    }();

This ternary approach permits TCO in Roy when there is no let involved in a conditional, but supposing the else branch in this snippet returned a recursive call instead of x, that function wrapper would still prevent Brushtail from recognizing the call as being in the tail position.

I think the best way to handle this case would be on the Brushtail side. If Brushtail could consider zero-arg anonymous functions that are immediately invoked as "harmless" (not impacting whether a call is in the tail position), and then migrate them during rewriting (to preserve the scoping properties), then I believe we'd have our bases covered on TCO for Roy.

non commented 10 years ago

(Disclaimer: please feel free to ignore me on this)

Given that you have your own parser/compiler, I wonder if changing inner names to avoid collisions isn't something you want to be able to support? And if it is, then I could imagine wanting to avoid allocating closures, in which case Richard's first suggestion seems a bit nicer. For instance:

let x = if 2 > 0 then
  let x = "ok"
  x
else
  let x = "fail"
  x

just becomes:

var x;
if (2 > 0) {
  var x_ = "ok"
  x = x_
} else {
  var x_ = "fail"
  x = x_
}

Of course you can substitute whatever name-mangling scheme you want... adding trailing underscores just seemed easy. To me it seems like this code will hew closer to what a casual user might expect. The x = x_ line looks a bit weird, but then again so does the let in the earlier example.

rtfeldman commented 10 years ago

Good points!

If doing a gensym-like approach to naming shadowed variables is on the table, we can still use ternaries as long as we put the var declarations up front, which is how CoffeeScript handles it:

var x, x_;

(2 > 0) ?
  (x_ = "ok", x_) :
  (x_ = "fail", x_)

I don't know if there's been much discussion on whether name collisions should be okay. I tentatively like the idea of prohibiting them, especially if let foo = is going to be replaced with the more concise foo =. For example:

x = if 2 > 0 then
  x = "ok"
  x
else
  x = "fail"
  x

This looks very much like reassignment, even though it's really (intended to be) shadowing. Prohibiting this would be less flexible, but would simplify reading Roy code.

Either way, I like the idea of sticking with ternaries while putting the necessary var declarations before the conditional in order to avoid anonymous function wrappers.

This seems like the best solution to the TCO roadblock proposed so far.