masak / alma

ALgoloid with MAcros -- a language with Algol-family syntax where macros take center stage
Artistic License 2.0
137 stars 14 forks source link

Consider switching to Dylan's generic functions #490

Open masak opened 5 years ago

masak commented 5 years ago

I took a closer look at Dylan last week, especially getting all the way through this tutorial (which I found great, especially as it relates Dylan to other languages; although I also noticed the complete lack of macros in that tutorial).

Dylan has many attractive, interesting features that we're not going to emulate in 007. One of them is that type names are written like <string> (where 007 writes it Str currently). Yes, a whole lot more characters are allowed in Dylan identifiers (alphanumerics but also ! & * < = > | ^ $ % @ _ - + ~ ? /). I ascribe this to Dylan starting out as a "sexpr language", which also allows a whole range of characters in symbol names. In 007, we're not going to do that, because it'll interfere too much with operator syntax and syntax for more exotic user-defined things. Besides, Str (upper camel case) for type names is some sort of entrenched convention, and 007 tries to not break those.

But one place where I'm considering following Dylan's example is this: Dylan class declarations do not declare methods. Instead, methods are defined outside of a class, either directly in connection with it, or (more interestingly) by anyone.

Basically, MRO-style dispatch doesn't exist in Dylan's world. Instead, something very much like Perl 6's multisub dispatch is used. Perhaps one way to describe it is that "everything is funcs".

More precisely, in this world, several declarations of the same function name of are allowed in the same scope. Whenever more than one function of the same name is present, a coordinating entity (a "generic function") forms implicitly, whose job it is to dispatch to the right "actual" function. (Dylan calls these "methods", arguably not with the same connotations as the rest of the PL world. But maybe it's a generic methods thing. I should probably study CLOS more.) Not just function declarations introduce actual functions into a scope, but also imports. (As an aside, it strikes me that importing things in nested blocks as opposed to at the top of a file might create interactions with this feature that are too weird and/or confusing. But I'm not 100% sure either way.)

The whole thing leads to a more "democratic", lexically scoped notion of code ownership. In Java, Smalltalk, JavaScript, and other languages with single-method dispatch, it's pretty clear that method code "belongs" to the declaring class. In Dylan, method code belongs to you, the author (whether you declared the class yourself or just imported it) or to first-party or third-party imports you make.

I might be wrong, but since this is all lexically-scoped, it should also lead to the optimizer being able to "see further" into method calls that would otherwise be too late-bound and opaque.

I like the simplicity of make, the object initializer method. Many languages make a big deal of object construction, sometimes with special syntax (new), sometimes by making the type value callable (like Python). There's always magic somewhere in the middle, what with allocation and low-level initialization... but Dylan makes it look disarmingly normal.

Objects feel more "pure", because they are truly opaque objects in the Dylan world. The job of class declarations is to declare "slots" (private attributes) and their "accessors" (very similar to the dot twigil in Perl 6). Accessors, being methods, are just exposed as lexical names that can do function calls. Dylan provides syntactic sugar for getters and setters, which makes the syntax feel much more palatable and "OO-y". But note that this only goes for accessors, not for methods in general. (I think...?)

Again, I'm not charging into this one because I don't feel I understand the consequences fully. I'm pretty sure I'd need to try things out in a branch to see all the consequences. So what I'm calling for is some kind of investigation. But I am intrigued.

One objection off the bat (or at least something giving the whole idea a negative starting score) is that generic functions are less common and less mainstream than the traditional single-method-dispatch class-owned methods. This drawback has to be offset by enough visible benefit gained from switching.

Another objection is that it ties us into #33 as a non-optional thing. You'd still be able to not care about type declarations on most things, but for those functions that would previously have been methods of your class, you would have to declare the type on the "invocant" parameter, just so that 007 will know that it "belongs" to your class. Basically, them's the breaks — it falls out, waterbed-like, from everything else. I don't know if it's a bad thing, but it's a factor.

masak commented 5 years ago

I should probably study CLOS more.

I'm now reading through The Common Lisp Object System: An Overview; it feels very relevant and well-written.

In the generic function approach, objects and functions are autonomous entities, and neither is a property of the other.

Right! Nice.

A further problem with message-passing systems is that there must be some way within a method of naming the object to which a message was sent. Usually there is a pseudo variable named something like “self.” With generic functions, parameters are named exactly the same way that Common Lisp parameters are named—there is conceptual simplicity in using the fewest constructs in a programming language.

Gosh, yes. 007 used to proudly not have a self; then it got one (and I wasn't sure whether to name it this). If we adopted generic functions instead, self would again go away, and the conceptual cost of self would be deducted from the conceptual weight of generic functions.

masak commented 5 years ago

And yes, on page 11 of that text, it's confirmed that CLOS uses the same terminology as Dylan: methods belong to generic functions, rather than classes.

I'm a bit divided. On the one hand, it'd make sense to conform to CLOS/Dylan nomenclature, and call them methods in 007 as well. On the other hand, I kinda like the func keyword.

Gonna let that one simmer a bit.

masak commented 5 years ago

Definitely need to try this in a branch.

My feeling right now is that generic functions would be a good fit for 007. I do enjoy the idea of making classes be about slots only; that feels sane to me. I'm even fine with letting the whole type system stick pretty close to CLOS and Dylan. Moreover, operators (which already work fine in 007) would definitely thrive under the generic function regime.

My one reservation so far is that it makes functions a slightly more complicated story. There's something delightfully simple about func foo putting a Func value into the current lexical scope. In a world with generic functions, the first time we did func foo, it would put a GenericFunction into the current lexical scope (and a Method somehow registered "inside" it); subsequent func foo declarations in the same scope and in nested scopes would just declare more Methods inside the same GenericFunction.

The visibility gets more complicated also. In the old world, individual functions either are in scope/visible, or they're not. Here, all the methods get registered once and for all, but which ones are visible depends on whether a declaration is visible from the call site. I'm not 100% sure how to implement that.

So yeah, hope and confusion.

masak commented 5 years ago

This post is interesting, and signals that we'll want to be very careful with implicit behavior around candidate matching.

CLOS and Dylan seem to be fine with "do some best-effort ordering, and call the first one on the list". If two things tie, then Last One Defined Wins (?). The blog post seems to indicate that this is not good enough.

Perl 6's MMD is stricter (at least for multisubs without nameds), and requires an unambiguous winner:

$ perl6 -e'multi foo(Int $x, Any $y) {}; multi foo(Any $x, Int $y) {}; foo(0, 0)'
Ambiguous call to 'foo(Int, Int)'; these signatures all match:
:(Int $x, $y)
:($x, Int $y)
  in block <unit> at -e line 1

We should probably do something like that, for starters.

masak commented 5 years ago

Also worth pointing out. If we switch to generic functions, we'll largely back out of the changes made in #111.

masak commented 5 years ago

I started writing something here about methods and closure semantics, but before I got to the end I had contradicted myself. It's... weird. I need to think about it more.

masak commented 5 years ago

Ok. I think the only sane model is for all methods lexically nested under a generic function to be visible/callable via the generic function. Regardless of whether a particular method is lexically visible from the callsite or not.

Methods (re-)register a freshly closed function value on every surrounding-block entry, just like today. It's just that they register against the generic function, not against the lexical pad.

masak commented 5 years ago

Or maybe not...

My falling-asleep thought yesterday was "what about when a generic function is imported from a module?" We can't just destructively update that module's generic function with new methods — it works for us but not for that module. We need a new one, which inherits all the methods from the old one, protypically. (Copying works too, but see below.)

Then I thought, the exact same is true of nested blocks as well. So the nested block gets a new generic function, which inherits from the generic function on the outside. When it goes out of scope, we lose all those candidates.

The reason prototypical inheritance is important, at least in the nested-block case, the outer generic function might still get methods after we parse the block. The inner generic function needs to see those too.

This last point also means we cannot cache narrowness posets until CHECK time. Or we could, but some of them would be wrong and need re-computing.

vendethiel commented 5 years ago

.oO( Many more things turn out to be lexical that what is bare to the eyes )

masak commented 5 years ago

In other words, the generic function of a method is always declared in the same scope. It in turn has prototypical links to same-named generic functions in outer blocks or imported modules.

masak commented 5 years ago

In the fullness of time, these become highly inlineable. Once we've computed the poset cache for a generic function, we can inline it as a sequence of boring if statements with calls to individual methods. These can be further specialized, dead-code-eliminated, and inlined. In some cases, with enough type information, generic functions can be "zero-cost".

masak commented 5 years ago

Here's a mouth-watering implementation of rock-paper-scissors:

import syntax.stmt.repeat;
import syntax.op.conditional;

enum Choice { rock, paper, scissors }

func infix:<beats>(p1, p2) { false }
func infix:<beats>(Choice.rock, Choice.scissors) { true }
func infix:<beats>(Choice.scissors, Choice.paper) { true }
func infix:<beats>(Choice.paper, Choice.rock) { true }

my computerChoice = pick(Choice);
repeat until my humanChoice {
    my humanInput = prompt("Enter your choice: ");
    try humanChoice = asEnum(Choice, humanInput);
}

say("I chose ", computerChoice, ".");

say(humanChoice beats computerChoice
    ?? "You win!"
    !! computerChoice beats humanChoice
        ?? "I win!"
        !! "It's a tie!");

Not much to comment on, except:

masak commented 5 years ago

Oh, and perhaps most importantly:

masak commented 5 years ago

Another thing: I wrote func without much hesitation, but these are now actually called "methods", at least with Dylan/CLOS terminology. I find it hard to decide between keeping func as-is or switching to a method keyword — feels like either choice will surprise some group of people.

masak commented 5 years ago

Heh. Also known as "extension methods for everything". 😛

masak commented 5 years ago

Implicit protos are generated behind the scenes whenever there's a new method in a scope without an explicit proto. The implicit proto is "partially transparent", it contains the appropriate methods from outer blocks and imports, aside from the new methods that made it be generated. It's signature is computed as roughly the union of all its methods.

Explicit protos are declared by the programmer. They are "opaque"; any methods from outer scopes or modules are shadowed, exactly like an inner my shadows an outer one of the same name. The signature is explicitly provided, and doesn't have to be compatible with whatever's outside. (The explicit proto acts like a guarantee that outer scopes can't accidentally inject methods into our dispatch; it guarantees that we alone control the methods/dispatch, and we aren't vulnerable to outside renames which accidentally capture some dispatch. Kind of the lexical version of the fragile base class problem.) The explicit proto has to be declared before any of its methods. The signatures of the methods have to be compatible with the explicit proto. Proto declarations have no body.

There might be some rare situation where we want to explicitly declare a proto, but still get the partial transparency of implicit protos. (For example, in order to explicitly limit the signature of the methods in the block.) An annotation could help achieve that. (@overload?)

masak commented 5 years ago

Define a longname to be the combination of the method's lexical name and its signature (excluding, I think, its return type). Perl 6 makes this distinction too.

Methods with the same longname as outer methods will also lexically shadow those outer methods. I don't think that's a problem as such, and maybe it'll even fall out automatically if we consider the inner methods first. But it's something to be aware of.

If we ever support a nextsame-like mechanism, it'll start to matter in which order we consider shadowed outer methods versus inner-but-wider signatures.

masak commented 1 year ago

I think the most interesting thing I learned since I started this issue comes from the paper Polymorphic Symmetric Multiple Dispatch with Variance, about dispatch in the Fortress language.

Trying to summarize here, to the best of my understanding:

I dunno. I like it. It sacrifices some of the most dynamic aspects of multiple dispatch, but it also pays back handsomely in terms of a static guarantee that we always have a unique method to call at runtime. That is no small thing.