wren-lang / wren

The Wren Programming Language. Wren is a small, fast, class-based concurrent scripting language.
http://wren.io
MIT License
6.86k stars 550 forks

Proposal: Top level names #106

Closed munificent closed 9 years ago

munificent commented 9 years ago

(Note: I'm writing this in future tense not to imply that this will happen—it's still a proposal—but just to keep the text simpler.)

OK, here's my proposal for rationalizing the scope issues in #101. The short summary is that we'll borrow Ruby's "constant" syntax.

Right now, Wren has three flavors of identifiers:

  1. Identifiers that start with _ are instance fields.
  2. Identifiers that start with __ are static fields.
  3. Everything else is a "normal" name.

We'll add one more category: identifiers that start with a capital letter are "top level" names. I think the naming rule works pretty naturally for classes, and also for constants if we use an ALL_CAPS convention to name them.
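The four categories can be told apart from the spelling of the identifier alone. Here is a rough sketch of that classification in Python (not Wren's actual lexer or compiler code; the function name `classify` and the category strings are made up for illustration):

```python
def classify(name: str) -> str:
    """Return the proposed category of an identifier, judged by its spelling."""
    # Check the double underscore first, since "__x" also starts with "_".
    if name.startswith("__"):
        return "static field"
    if name.startswith("_"):
        return "instance field"
    # A leading capital letter marks a "top level" name (classes, ALL_CAPS constants).
    if name[:1].isupper():
        return "top level"
    return "normal"


for example in ["_r", "__count", "Foo", "ALL_CAPS", "foo"]:
    print(example, "->", classify(example))
```

Since the category depends only on the first characters, the compiler would know which scoping rule applies before it has seen any declaration of the name.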

The scoping rules for normal and top level names are different:

Normal names

To resolve a normal identifier, one that starts with a lowercase letter, we start with the current innermost lexical scope. If it's a local variable there, we're done. Otherwise, we walk up the enclosing scopes looking for it in them.

If we hit the edge of a method boundary, we stop there. If it hasn't been found by then, then the name is interpreted to be a getter on this. This is different from the current behavior where a name is only an implicit getter if it isn't defined in any lexically enclosing scope, including ones outside the method. What that means is that this program:

var foo = "top level"

class Bar {
  foo { "instance" }
  method {
    IO.print(foo)
  }
}

(new Bar).method

Will print "instance". This is what Ruby does, and is the behavior I prefer. For what it's worth, Gilad Bracha disagrees (pdf). (In Wren today, this program prints "top level".)
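The lookup order for normal names can be modeled roughly like this. It is a Python sketch, not the real compiler: `Scope`, its `is_method_body` flag, and `resolve_normal` are hypothetical names, and returning "undefined" for a name that isn't found outside of any method is an assumption the proposal doesn't spell out.

```python
class Scope:
    """One lexical scope; is_method_body marks the outermost scope of a method."""

    def __init__(self, parent=None, is_method_body=False):
        self.parent = parent
        self.is_method_body = is_method_body
        self.names = set()

    def declare(self, name):
        self.names.add(name)


def resolve_normal(name, scope):
    """Resolve a lowercase name starting from the innermost scope."""
    while scope is not None:
        if name in scope.names:
            return "local"
        if scope.is_method_body:
            # Method boundary reached: stop and treat the name as a getter
            # on this, without looking at scopes outside the method.
            return "getter on this"
        scope = scope.parent
    # Assumption: outside any method there is no receiver to fall back on.
    return "undefined"


# Mirrors the program above: foo is declared at the top level,
# then used from a block inside a method.
top = Scope()
top.declare("foo")
method = Scope(parent=top, is_method_body=True)
block = Scope(parent=method)

print(resolve_normal("foo", block))  # getter on this
print(resolve_normal("foo", top))    # local
```

The top-level `foo` is never consulted from inside the method, which is why the program above would print "instance" under this rule.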

This does mean that you can't access an outer variable with a normal name from inside a method. I think that's an acceptable limitation. Instead of a variable, you can usually just move it to a static field. If this is really a problem, we could add some syntax to express "don't look up this name on this".

Top level names

To resolve a top-level name, we walk the enclosing scopes all the way to the top, ignoring method boundaries. If we find the name in any of those scopes, we're done.

Otherwise, we speculatively assume it will be defined at the top level and resolve it as such. The compiler will keep track of which top-level names have been implicitly declared by this. After it reaches the end of the file, any implicitly declared names that never got a matching definition generate compile errors.
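The compile-time bookkeeping for top-level names could look something like the following Python sketch. `TopLevelResolver` and its methods are hypothetical names, and representing the enclosing scopes as a list of sets is a simplification made for illustration. It only models the compiler's side and says nothing about what happens when the code runs.

```python
class TopLevelResolver:
    """Tracks top-level names and the ones used before any definition was seen."""

    def __init__(self):
        self.top_level = set()  # names with a definition at the top level
        self.implicit = set()   # names assumed to be top level, not yet defined

    def define(self, name):
        self.top_level.add(name)
        # A later definition satisfies any earlier speculative use.
        self.implicit.discard(name)

    def resolve(self, name, enclosing_scopes):
        """enclosing_scopes: sets of names, innermost first, top level excluded."""
        # Walk every enclosing scope; method boundaries are ignored here.
        for names in enclosing_scopes:
            if name in names:
                return "local"
        if name not in self.top_level:
            self.implicit.add(name)
        return "top level"

    def finish(self):
        """Names still undefined at end of file; each would be a compile error."""
        return sorted(self.implicit)


resolver = TopLevelResolver()
resolver.define("Foo")
resolver.resolve("Bar", [set()])  # used inside a method of Foo, not yet defined
print(resolver.finish())          # ['Bar']
resolver.define("Bar")
print(resolver.finish())          # []
```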

As long as the name gets defined before it's used, everything is fine. This makes mutual recursion like this work:

class Foo {
  bar { new Bar }
}

class Bar {
  foo { new Foo }
}

(new Bar).foo
(new Foo).bar

Since the references to Bar and Foo are both inside methods that aren't called until after both classes are defined, no error occurs. If you do try to use a top level variable before it's defined, it's a runtime error:

IO.print(Foo) // runtime error

var Foo = "defined"

Note that "top level" names can still be defined in local scopes and used there:

{
  var TopLevel = "ok"
}

In that case, they go out of scope like any other local variable. We may need a better name for these kinds of identifiers than "top level".

Likewise, you can use capitalized names as getter names:

class Foo {
  Bar { "ok" }
}

It's just that you have to have an explicit receiver to invoke it. A bare Bar will never call that getter, but someFoo.Bar is fine. This means this rule still works if we use class objects as modules that want to contain multiple classes (#79).

I've mulled this over for a while, and I'm pretty happy with it. The fact that it lines up with Ruby gives me some confidence that it's usable in practice, even in large programs.

Thoughts?

kmarekspartz commented 9 years ago

:+1:

edsrzf commented 9 years ago

This seems a little weird to me from a design standpoint, but it also seems like idiomatic code would Just Work with these rules. Overall, I like it.

I wouldn't mind seeing a further distinction by saying that "top level" names cannot be reassigned. (In other words, they're like Java's final.) I don't like that class names can be reassigned. It would help people decide when to use what type of name and could be useful in certain compiler optimizations. And finally, it would resolve the naming issue. We'd just call them "constants."

It would also become difficult to have useful mutable top-level variables, which could be both good and bad. You could always work around it by putting static variables in a top-level class or similar.

munificent commented 9 years ago

I wouldn't mind seeing a further distinction by saying that "top level" names cannot be reassigned.

I thought about that too. I'm not deeply opposed to doing this, but I figured I'd leave this out at first in the name of minimalism. I'm trying to cut as many non-essential features as I can.

In particular, Wren doesn't have many features that exist purely on philosophical "it's easier to write maintainable code because of this" grounds. It's not that I don't like those kinds of features, it's just that they don't increase the number of things a user can do, so I can sacrifice them without loss of expressiveness.

It would help people decide when to use what type of name and could be useful in certain compiler optimizations.

I don't think we'd get much optimization out of it. Before we could take advantage of this, we'd need a much more complex compiler pipeline in general. If we had that pipeline, we could automatically tell which variables never get assigned to, so it wouldn't need to be indicated in the source to begin with.

I think single-assignability is basically just for the end user.

And finally, it would resolve the naming issue. We'd just call them "constants."

There is that! :) I was thinking maybe "local" and "nonlocal" names. What do you think?

It would also become difficult to have useful mutable top-level variables, which could be both good and bad.

You can still declare lowercase names at the top level, and I think doing so will be common. It's definitely useful in the REPL and it's handy for scripts that have some top level code. The only limitation is that those lowercase top level names aren't visible inside classes. They're usable inside other imperative code at the top level.

edsrzf commented 9 years ago

I figured I'd leave this out at first in the name of minimalism.

That's fair.

I don't think we'd get much optimization out of it.

There are some optimizations available even in a single-pass compiler. For example, zero-overhead constants.

var PI = 3.14159

class Circle {
  circumference {
    // This line can compile exactly like:
    // return 2*3.14159*_r
    return 2*PI*_r
  }
}

This also makes it possible to constant fold the 2*PI. Although if the class is defined before the constant, none of this can happen easily, so that's a little weird.
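As a rough illustration of the folding step, here is a Python sketch that collapses the compile-time-known factors of a product. It assumes top-level names cannot be reassigned and that the constant was defined before the use site. `fold_product` and the `constants` table are hypothetical and not part of Wren's compiler.

```python
def fold_product(factors, constants):
    """Fold the factors of a product whose values are known at compile time.

    factors: numbers or identifier names, in source order.
    constants: maps names of known constants to their numeric values.
    Returns (folded_value, names_left_for_runtime).
    """
    value = 1
    remaining = []
    for factor in factors:
        if isinstance(factor, (int, float)):
            value *= factor
        elif factor in constants:
            # Safe only if the name can never be reassigned.
            value *= constants[factor]
        else:
            remaining.append(factor)
    return value, remaining


# 2*PI*_r: the 2*PI part folds, _r must still be read at runtime.
folded, rest = fold_product([2, "PI", "_r"], {"PI": 3.14159})
print(folded, rest)
```

If PI is not in the table yet, because the class is compiled before the constant is defined, the sketch leaves PI in the runtime list, which matches the caveat about definition order.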

Another example is the expression x is Num. Since the compiler knows that Num always refers to the built-in Num class (unless it's been shadowed, which the compiler would know about), it could have a special bytecode just for this type of check. (It might not be worth doing, but it becomes an option.)

munificent commented 9 years ago

Yeah, I guess we might be able to do constant folding. I wouldn't rule that out at some point, though I'm not sure if it's worth the effort. We could only constant fold trivial expressions involving known operations on primitive types.

In the rare places where that is a bottleneck, it's easy for the user to just do that manually:

var TWO_PI = PI * 2

Not that I'm advocating making the user manually do microoptimizations like this. But if it's a reasonably simple thing that the user can handle at their end, I think it's reasonable to keep the language simpler. :)