Closed: kjx closed this issue 1 year ago.
I don't understand what you mean when you write
Requests to methods that don't appear lexically but are defined in superclasses resolve through self; requests to methods that don't appear lexically but are defined in subclasses resolve to outer. It's hard to see how to fix this unless every method resolution is re-computed on every subclass...
Specifically, what is the problem that you want to "fix"? My interpretation of "ambiguous" is "lexically ambiguous in the current compilation unit". I don't see anything ambiguous about your example. When module mid is compiled, mAmbig has exactly one definition, so the request of mAmbig is interpreted to mean outer.mAmbig. It couldn't possibly mean anything else. If the programmer wanted class mid to be an abstract superclass, then they could have declared a method mAmbig { required } in class mid, and then they should get an ambiguity warning, which they can get rid of by writing outer.mAmbig or self.mAmbig, depending on what they meant. Obviously, self.mAmbig can be overridden by sub-objects of mid, whereas (in this example) outer.mAmbig can't be overridden, since it's defined in a module, and we don't allow inheritance from modules.
I agree with Andrew. When method mMid is defined, the calls to mTop and mAmbig are resolved to self.mTop & outer.mAmbig, and those expanded expressions are what is inherited. When mMid is overridden, the new definition can use self for both -- unambiguously.
1 & 2. there's a difference between permitting manifest checks as an optimisation or option for e.g. static checking, and requiring manifestness as part of the semantics. As I read the current spec, the reason it requires manifestness are to give these ambiguity errors and to check override annotations.
Where this is odd is in the interaction between our hopes to talk about traits/Eiffel-style flattening to describe inheritance, and the idea that an object is just a single-level collection of methods, without e.g. the separate "part-objects" that you need if you are doing super-style inheritance. If you think about flattening a slightly different bot2,
// module Bot
import "mid" as m

class bot2 {
    inherit m.mid
    method mAmbig { ... }
}
you can get something like this:
class flatBot2 {
    method mTop { ... }
    method mMid {
        mTop    // resolves through self
        mAmbig  // resolves through outer to module m.mid --- even though at runtime self may be an instance of bot
    }
    method mAmbig { ... }
}
so how can we explain that to our nominal novices? What we have to do is explain that you have to resolve implicit requests before you flatten, so the flattened version is something like this:
class flatBot2 {
    method mTop { ... }
    method mMid {
        self.mTop     // resolves through self
        outer.mAmbig  // resolves through outer to module m.mid --- even though at runtime self may be an instance of bot
    }
    method mAmbig { ... }
}
Now I know that's not quite right, because the method objects etc. are in different lexical scopes, and you can't really flatten in general because of that --- but in fact the modules and different scopes in this example are a MacGuffin --- the same problem arises if all the classes are in exactly the same lexical scope. Which is a bit ugly if we want to talk about flattening in any sense...
I should add: going "out" then "up", as in Newspeak, doesn't have this problem, because your lexical scope is always manifest. If you can see something going out, you make it an outer request; otherwise you leave it as a self request and hope.
Note also this transformation gets nastier if you have to write the right number of "outer"s, rather than just having an outer that works like "super" used to: you only need one, and it just disambiguates the lookup direction.
Flattening doesn't make any sense until after all of the outers and self's have been replaced by bound references. If you move a line of source code that says outer.foo from one place to another, the outer is obviously going to bind to something else. The same holds true for implicit outers. We have to do alpha-conversion first.
So what I'm trying to do is find a way of describing all this to novices: how x turns into either self.x or outer.x (or an error, I guess). outer.outer.outer.x is of course super-ugly.
The simplest thing would be to ditch outer and allow every object to have a name. Then one could use the name.
object wombat {
    method eyeColor ...
}
That syntax doesn't work for classes and modules, though.
One can write
object {
    def wombat = self
    method eyeColor ...
}
but that doesn't help, because to get to the name wombat we have to write outer.wombat, and getting rid of that was our goal.
def wombat = object wombat {
    method eyeColor ...
}
wombat doesn't work either; at least, not when we require the object to be fresh, as in a class.
Our rule, which says that a receiverless message means either outer. ... .outer.message or self.message, whichever makes sense, seems like something that is easy to explain to novices.
If they both make sense, then the programmer has to say which they mean. A rule that attempts to pick one over the other when they are both defined would become Grace's equivalent of the "sender-path-tiebreaker-rule".
The simplest thing would be to ditch outer and allow every object to have a name. Then one could use the name.
OCaml does that.
For objects: object (nameOfSelf) ... end;;
For classes: class name = object (nameOfSelf) ... end;
Scala too, I think it's object { nameOfSelf => ...}
That syntax doesn't work for classes and modules, though.
Could have a module keyword to get the module object. In Newspeak you write outer C where C is the name of an enclosing class (and you don't have enclosing objects as such in Newspeak).
def wombat = object wombat {
this is of course why I don't like that idea.
not when we require the object to be fresh, as in a class.
depends what we mean by fresh. If fresh actually means "is the direct result of a call returning an object constructor" then we're fine.
A rule that attempts to pick one over the other when they are both defined would become Grace's equivalent of the "sender-path-tiebreaker-rule".
I don't see why. Newspeak, Scala, GBETA and Java have such a rule and it doesn't have that effect. The advantage of the Newspeak Out then Up rule (lexical binding takes precedence; lexical binding to self is a self send; lexical binding does not consider inheritance at all) is that the question of how things get resolved is purely lexical - you don't have to look any further out than the module...
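To pin the rule down, here is a minimal Python model of out-then-up. The Scope class is invented for illustration; this models the rule itself, not any real implementation.

class Scope:
    def __init__(self, local_names, outer=None):
        self.local_names = set(local_names)  # names declared lexically here
        self.outer = outer                   # enclosing lexical scope, or None

def resolve_out_then_up(name, scope):
    if name in scope.local_names:
        return "self." + name       # truly lexically present: a self send
    s = scope.outer
    while s is not None:
        if name in s.local_names:   # visible going out: an outer request
            return "outer." + name
        s = s.outer
    return "self." + name           # not visible lexically: leave it as a
                                    # self send and hope

# In class mid from the example, mAmbig is not lexically present in mid
# itself but is visible in the enclosing module, so it always resolves to
# outer.mAmbig, whatever any subclass or superclass happens to define.
mid = Scope({"mMid"}, outer=Scope({"mAmbig"}))
assert resolve_out_then_up("mAmbig", mid) == "outer.mAmbig"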
@kjx, I found that this was still open while searching for something else. Do you want to close it? Or do you want to advocate for switching to out before up?
I'm happy with the current implementation, which requires the programmer to resolve the conflict.
It's not about the implementation, it's about the spec. (although I can see why wiring that restriction into the spec makes statically compiled implementations easier.)
I still stand by what I've written here. It's not the worst part of the current inheritance design (that's aliased defs and vars).
I think a dialect could enforce the ambiguity error, assuming the ancestor traits are manifest, so in that sense this is related to issues of static vs dynamic structure checks (e.g. when do self requests to missing methods (methods that should be declared required/abstract) cause errors: in the type checker, or in the "base" language?).
I think it's a pity if we have 90% of an inheritance story ("flattening in the same lexical context") that is then broken by this detail and a couple of others.
Ditching "outer" and naming objects doesn't solve the problem about resolving implicit requests at all.
To answer @apblack's question back in Feb, I fear this will become part of my "broadening campaign" (as Andrew put it somewhere). If we really believe in the unambiguous definition rule, then every implicit request will be unambiguous, and so we should never need to resolve them: thus we could do away with self and outer requests as special kinds of requests (#140).
Note that this is orthogonal to whether a superclass is manifest or not: if a superclass isn't manifest we could require the ambiguity check at "runtime". I think that the ability to declare a method on self as required/abstract helps manage things here, pruning some cases.
I think the only remaining problem is a subtle version of "fragile base classes": a new version of some parent class could declare a method that we implicitly request (expecting that implicit request to be resolved to some lexically enclosing declaration), making the subclass's request ambiguous. Out then Up avoids this problem, because lexical requests are always resolved lexically.
In the spec, we say:
When interpreting an implicit request of a method named m, the usual rules of lexical scoping apply, so a definition of m in the current scope will take precedence over any definitions in enclosing scopes. However, if m is defined in the current scope by inheritance or trait use, rather than directly, and also defined directly in an enclosing scope, then an implicit request of m is ambiguous, and is an error.
When we say "directly in an enclosing scope", did we mean to include dialects?
If we did, then it matters whether a definition appears directly in a dialect, or is reused by the dialect. This means, for example, that a refactoring of the standard dialect to move everything into a trait to enable dialect combination, like the one that I performed in the fall, would cause an ambiguity to disappear (not a big problem!), but that the inverse refactoring (moving from a trait to a direct definition) will create an ambiguity when there didn't use to be one. This seems to me to be Bad.
Practically, it also means that the external representation of a compiled dialect has to distinguish between reused and direct definitions, just because of the disambiguation rule.
Hence, I'm inclined to think that when we wrote "directly in an enclosing scope", we meant "visibly in the text that you can see surrounding the request that you are trying to disambiguate", and did not intend to include the dialect at all. By this interpretation, a reused definition will always override a definition from a dialect, independent of whether the dialect declares the conflicting name directly, or through reuse.
I'm raising this issue now because my new symbol table and identifier resolution mechanism — which implements the above clause, interprets "directly" to include the dialect, but does not distinguish between reused and direct definitions — is detecting conflicts that minigrace never used to detect. I need to know which fix to deploy; one option is simply adding self to the ambiguous requests.
Here's a version that uses outer objects instead of dialects. I think that dialects should behave the same way as this example does.
def fakeDialectDirect = object {
    method ambiguous { print "defined in fakeDialectDirect" }
    def objectDirect is public = object {
        inherit definesAmbiguous "objectDirect"
        method test { ambiguous }
    }
}

def fakeDialectIndirect = object {
    inherit definesAmbiguous "fakeDialectIndirect"
    def objectIndirect is public = object {
        inherit definesAmbiguous "objectIndirect"
        method test { ambiguous }
    }
}

class definesAmbiguous(s) {
    method ambiguous { print "inherited by {s}" }
}

fakeDialectDirect.objectDirect.test
fakeDialectIndirect.objectIndirect.test
Different implementations print, in order for each of the two requests:

inherited by objectDirect
inherited by objectIndirect

defined in fakeDialectDirect
inherited by objectIndirect

inherited by objectIndirect
Well we've at least agreed (it seems) what the second case should do, if it runs, so that's something. I don't think minigrace's behaviour here is consistent. If we can agree on this case, I think dialects should do the same thing.
James' example is very helpful and I agree that it makes sense to have dialects work like explicit containing scopes.
My philosophy on Grace has always been that ambiguities should be errors. If a method is defined directly in the current scope, that should be what is chosen -- it is hard to argue that either an enclosing scope or inherited method would be chosen instead. However, if a method is not defined in the current scope but can be gotten both by inheritance and via "outer"s, then we have an ambiguity, so it should be flagged as an error. I believe this is consistent with the "flattening" or "copy down" interpretation of inheritance and outer that we have talked about in the past.
Thus I believe both examples should give an "ambiguous" definition error, easily resolved by adding an outer or self prefix.
Let me know if I've missed something here.
Thanks, @kjx, for including the examples: they help to sharpen the discussion.
I don't see why fakeDialectIndirect should be treated as ambiguous. In this example, there are two definitions for method ambiguous, both inherited. Clearly, the one in self should have priority over the one in outer — this is the normal meaning of lexical scope. I can't imagine a universe in which the outer definition would have priority over the inner one, and I can't understand why Kernan or Moth would want to run two methods in response to a single request.
In contrast, in the mis-named fakeDialectDirect, the method ambiguous is still inherited in self, but is lexical in outer. In this case — but not in the case of fakeDialectIndirect — one can ask: "up or out"? Which definition should have priority, the inherited definition ("up") or the lexical definition ("out")? For once, minigrace does what the Grace language specification says. Kernan and Moth do not.
I disagree that this example sheds any light on what should happen in the case of a dialect. The point of a dialect is that it is a package of definitions that should be treated as a black box. The dialectical program should not care about the internal structure of the dialect, only about the methods that the dialect makes available.
This is obviously different from the case shown in the example, when definitions appear in the same module as the code that uses them. In this case, programmers can see the structure of the definitions—in particular, they can see whether a name is inherited or defined directly.
I don't recall why we decided that dialects should be treated as enclosing scopes, rather than inherited scopes. The 2014 ECOOP paper on dialects says that they are so treated, but does not say why.
@KimBruce asks what he is missing. What I think you are both missing is the point that restructuring the internals of a dialect should not affect the legality of a program that uses that dialect — provided that the definitions that the dialect makes available are not changed.
In this case, programmers can see the structure of the definitions—in particular, they can see whether a name is inherited or defined directly.
Yes in this case, but not (necessarily) when the definitions are in another module. I think we can distinguish between inherited and lexical definitions, and/or between dialects and other things, but I'm not sure crossing module boundaries in general is a distinction I'm happy with.
I don't recall why we decided that dialects should be treated as enclosing scopes, rather than inherited scopes.
I can't remember - you could ask the first author of the paper, he'd probably know? Apart from doing the Right Thing™, I think the argument may have had something to do with not having C++/Java-style lexical-declaration-only private attributes. Except that, on reflection, that doesn't make sense now. One can override inherited attributes, but we didn't want to allow overriding dialects?
What I think you are both missing is the point that restructuring the internals of a dialect should not affect the legality of a program that uses that dialect — provided that the definitions that the dialect makes available are not changed.
right, but how is that different from any other object?
I can't understand why Kernan or Moth would want to run two methods in response to a single request.
They don't: I mean each implementation runs both of the two test cases, with the results given in order for each case.
After looking at local definitions, it seems:
I've NO IDEA if my regex-like notation is correct. One could code it up and see.
What I think you are both missing is the point that restructuring the internals of a dialect should not affect the legality of a program that uses that dialect — provided that the definitions that the dialect makes available are not changed.
right, but how is that different from any other object?
It's completely different from other objects.
When I give you an object to use, or to reuse, you have access to all of its methods. It does not matter whether they are defined directly in that object or obtained by reusing another object. Consequently, I can do an "add superclass" refactoring without any of the clients needing to know.
If we apply the ambiguity rule to dialects, then it matters whether the dialect defines something directly, or by reusing a "bundle": a direct definition causes an ambiguity, while a reused one does not.
I don't understand James's regular expression notation, or the idea of "looking at all inherited definitions at each lexical level" — surely we are only ever interested in the first inherited definition, and the first lexically enclosing definition. The wording in the Grace spec seems entirely clear, and I have no interest in changing our resolution rule. I'm just interested in clarifying whether or not we intend it to apply uniformly to dialects.
If everyone follows @kjx's "two-part dialect convention", which is to put all of the definitions into a trait in one module, and then to define the dialect to do nothing but reuse that trait, then it doesn't matter a jot. That's because nothing will be defined directly in the dialect, so the dialect will never be home to an ambiguating definition.
But the language lawyer in me would like to clarify the language in the Spec.
This morning I sat down to actually implement what @kjx seems to want, which is that names defined by a dialect through reuse don't cause ambiguity, whereas names defined directly in a dialect do cause an ambiguity. (At least, I think that's what @kjx wants.)
After half an hour in the debugger trying to see why it didn't work, I hit myself on the head and realized: it can't possibly work.
Why not? Because not only do programmers not know which parts of a dialect are inherited and which are local to the dialect module, the compiler also does not know. The dialect module is separately compiled. It's a black box that provides definitions. The compiler of a module that uses a dialect has no idea how that dialect was constructed.
Of course, with enough effort I could expose that information as part of the compiled dialect module. But I don't plan to spend that effort on something that I believe is fundamentally wrong. Modules (unlike classes) are intended to be black boxes, and breaching that encapsulation boundary just to make more programs illegal is not worth the candle.
Sometimes code talks to you; it's trying to tell you something. This was one of those times. Treating a dialect's inherited and direct definitions identically required me to delete 15 lines of code from the compiler — and make no other changes except to the comments.
It's the right thing to do.
First let's discuss how dialects come into scope. I claim dialects should be interpreted as enclosing scopes (as in the paper) because blind inheritance is dangerous. Suppose the dialect includes a method m that is used in many other places in the dialect. Now imagine a program using the dialect redefines method m. If the dialect is treated as though it is inherited in the program, then the new definition will override the one in the dialect, and all uses of m elsewhere in the dialect will invoke the new definition, likely screwing up lots of code that the programmer does not have available (it seems unlikely our programmers would be reading much code from dialects!).
On the other hand, if the dialect is considered an enclosing scope, the redefinition has no impact on the requests of the original in the rest of the dialect. Moreover, the original definition is still available via outer.m. This is yet another example of the rule that you shouldn't use inheritance unless you completely understand all the superclasses you are inheriting from.
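Concretely, the difference can be sketched in Python (everything here is invented for illustration: Python's method dispatch stands in for the inheritance reading, and its closures for the enclosing-scope reading):

# Inheritance semantics: self requests dispatch to the most overriding
# definition, so a program that redefines m hijacks the dialect's own
# internal uses of m.
class Dialect:
    def m(self):
        return "dialect m"
    def helper(self):                  # a dialect method that requests m
        return "helper saw " + self.m()

class Program(Dialect):                # the dialect treated as a superclass
    def m(self):
        return "program m"

assert Program().helper() == "helper saw program m"   # hijacked

# Enclosing-scope semantics: the dialect's internal uses of m are bound
# lexically, so the program's redefinition has no effect on them (and in
# Grace the original would still be reachable as outer.m).
def make_dialect():
    def m():
        return "dialect m"
    def helper():
        return "helper saw " + m()     # lexically bound to the dialect's m
    return helper

helper = make_dialect()
def m():                               # the program's redefinition
    return "program m"

assert helper() == "helper saw dialect m"             # unaffected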
I agree with Andrew's post from above, that the dialect is a black box, and you shouldn't (and can't) know whether a definition/method arose from writing it directly or inheriting it into the dialect. Thus you should treat both in the same way. As I stated before, I believe the correct answer is to flag the possible ambiguity in both cases and make the programmer specify which is intended when you have a definition with the same name (signature) being brought in via both inheritance and from an outer scope (no matter how it got into that outer scope!). It's a simple error message:
"The request of method m on line xxx is ambigous as m is inherited from yyy as well as being defined in an outer scope/dialect. If you wish to request the inherited m, write self.m. If you wish the version from the outer scope/dialect then write outer.m."
We spent a long time discussing why the Newspeak model of method lookup (and similar alternatives) was wrong because of possible ambiguities. It is much simpler to just flag them and let the programmer say what they mean.
(At least, I think that's what @kjx wants.)
No, it's not; I'm not even sure I have a preference. I was describing what Newspeak does: Newspeak's rationale is that lexical scope is truly lexical. Attributes inherited into enclosing scopes aren't lexical; they can't be called by implicit requests, so they don't cause ambiguity because they can't be called.
the compiler also does not know.
that's an implementation issue. Kernan & Moth & whatever just don't have this problem.
Treating a dialect's inherited and direct definitions identically required me to delete 15 lines of code from the compiler
because those semantics align with the design assumptions already underlying minigrace. My self-interp has to handle this explicitly, doing both an inheritance lookup and a lexical lookup - where lookupDeclaration recurses via lookupEnclosingDeclaration so that both inherited and immediate declarations are found in each enclosing lexical scope:
This handles the different "directions" explicitly, but it treats things inherited into all enclosing scopes (dialects, modules, nested scopes) in the same way. The only scope treated specially is the one actually containing the "current" code.
For what it's worth, these semantics seem to make sense to me, and I think they're also what Kim is suggesting above. The only catch is that, after staring at it, I'm not sure whether that code actually does that in all the corner cases...
the compiler also does not know.
that's an implementation issue.
Proximally, you are of course correct. But I think that it's also a philosophical issue. What should a compiler know about a separately-compiled module? Many interpreters have this problem too, because they respect module boundaries (ghci, for example, can import compiled modules as well as interpreting them).
The SmallGrace scanner and parser did treat external modules identically to internal ones, because it just parsed everything from scratch and kept the parse trees and symbol tables in memory. The question that I'm trying to address is whether this is philosophically the "right thing". I believe that @kim and @apblack are agreed that dialects ought to be black boxes, not white boxes.
Treating a dialect's inherited and direct definitions identically required me to delete 15 lines of code from the compiler
because those semantics align with the design assumptions already underlying minigrace.
No, not really; this discussion is happening because of a complete re-design of this part of name resolution, which actually happened while I was on sabbatical at INRIA working on SmallGrace. Dialects have to be treated specially because they have special lexical scope rules — private attributes of a dialect are inaccessible, while those of other scopes are not. So resolution in a dialect cannot just use the same code that is used for resolution in other scopes. And imported modules play no role, because they are accessed by nickname, and never implicitly.
Just like your interpreter, SmallGrace (and now minigrace) handles name resolution explicitly. First it does a local lookup — if one exists, a local definition always wins. Then it does both an inheritance lookup, and a lexical lookup. If they both find a definition, then there is an ambiguity.
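For concreteness, here is that resolution order as a minimal Python sketch. The Scope class and the count_reused flag are invented: this models the rule as described, not the minigrace code itself.

class Scope:
    def __init__(self, local=(), inherited=(), outer=None):
        self.local = set(local)          # defined directly in this scope
        self.inherited = set(inherited)  # obtained by inherit / use
        self.outer = outer               # lexically enclosing scope, or None

def found_lexically(name, scope, count_reused):
    # Whether definitions that an enclosing scope (such as a dialect)
    # obtains by reuse should count here is exactly what is in dispute.
    while scope is not None:
        if name in scope.local:
            return True
        if count_reused and name in scope.inherited:
            return True
        scope = scope.outer
    return False

def resolve(name, scope, count_reused=False):
    if name in scope.local:              # a local definition always wins
        return name + " (local)"
    up = name in scope.inherited         # the inheritance lookup
    out = found_lexically(name, scope.outer, count_reused)
    if up and out:                       # both lookups succeed: ambiguous
        raise NameError("ambiguous request of " + name)
    if up:
        return "self." + name
    if out:
        return "outer." + name
    raise NameError("no definition of " + name)

# With the fakeDialect example above: objectDirect inherits ambiguous while
# the enclosing fakeDialectDirect defines it directly, so the request is
# ambiguous; in objectIndirect both definitions are inherited, so self wins.
dialect_direct = Scope(local={"ambiguous"})
object_direct = Scope(inherited={"ambiguous"}, outer=dialect_direct)
dialect_indirect = Scope(inherited={"ambiguous"})
object_indirect = Scope(inherited={"ambiguous"}, outer=dialect_indirect)

assert resolve("ambiguous", object_indirect) == "self.ambiguous"
# resolve("ambiguous", object_direct) raises NameError: ambiguous request.
# With count_reused=True (Kim's reading), object_indirect is ambiguous too.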
The reason that I opened this discussion again is that, because I had implemented what @KimBruce is advocating, I was getting a lot of new conflicts. For example, almost every request of isMe or asDebugString now needs to be qualified, because every dialect defines these methods (either locally, or by inheritance from graceObject), and many, even most, user-defined objects inherit them.
That is why my inclination is to say that names defined in a dialect don't cause a conflict.
Nice clean code! But how can it ever be that inheritanceResult == lexicalResult?
If you are interested, the new minigrace code is in class variableResolver. Not as elegant as yours, but in addition to simply locating the right definition, minigrace needs to remember where it was located, so that it can generate the correct code.
Andrew wrote:
The reason that I opened this discussion again is that, because I had implemented what @KimBruce is advocating, I was getting a lot of new conflicts. For example, almost every request of isMe or asDebugString now needs to be qualified, because every dialect defines these methods (either locally, or by inheritance from graceObject), and many, even most, user-defined objects inherit them.
That is why my inclination is to say that names defined in a dialect don't cause a conflict.
Ah, I see the problem. A couple of possible solutions / responses.
1) Why should a dialect need either isMe or asDebugString? Maybe dialects should normally start without those methods. A disadvantage is that this means we have two “top” classes, but I vaguely recall “done” having a smaller set of methods (though maybe that has been changed). On the other hand, we don’t expect novices to ever define a dialect, so they don’t need to be told about it. We don’t say much to students about the mechanisms behind dialects as it is — they just bring some definitions into scope and run a checker. If conflicts other than isMe or asDebugString occur with dialects, students should be informed and told to choose one via self. or outer.
2) We could not add a special super class, but instead just make sure there are good error messages in these cases, telling the programmer that in most cases they want to use “self”, but warning there are definitions both in the dialect and in their code.
Andrew, can you give some indication of how often these errors occur in user code (as opposed to libraries you provide students)?
Kim
If we have to do something "special", I would much rather give dialect scopes special rules for ambiguous definitions. Dialects already have special rules for visibility, and I think that we have also concluded that they need a special rule to say that reused and locally-defined attributes are equivalent from their clients' point of view.
Doing something special for asDebugString won't solve the general problem.
Andrew, can you give some indication of how often these errors occur in user code (as opposed to libraries you provide students)?
Ah, there's the rub. We don't have any users right now. There are a small handful of ambiguity errors in the compiler. I think that perhaps I should put those 15 lines back in, and see how many show up over time.
Incidentally, the new symbol table and name resolution code is now sufficiently far forward that I have pushed it to the master branch. In other words, it runs all the tests (except for the one that tested the old gct format). Next is the task of cleaning up the parse tree classes, and making the symbol table information more complete.
I'm closing this issue because (having read it all again) I don't see any reason to change what we have had for the last 7 years, or any likelihood that we will all agree on a change.
I'm creating a new issue for the question that I raised on 18 June 2020 about whether "directly in an outer scope" was intended to include dialects scopes.
The question is: how to resolve implicit requests — lexically scoped ("out", via outer) vs inheritance ("up", on self).
Grace 0.7 changed from some rather complicated wording, which more-or-less gave Newspeak "out then up" semantics, to wording saying that an ambiguous request is an error (in the core language, not even in a dialect).
One consequence of this change is that superclasses absolutely have to be manifest - even if an interpreter or VM doesn't really require it. That's because you have to be able to check all the parents (at least) of any self to work out which way to resolve a request. So manifestness is now wired into the semantics.
A second consequence is, well, somewhat odd behaviour given our "flattened objects" story about Grace inheritance these days. The actual self object will have all the methods on it: those defined in its superclass, and also those defined in any potential subclasses (in the logical future, obviously). Requests to methods that don't appear lexically but are defined in superclasses resolve through self; requests to methods that don't appear lexically but are defined in subclasses resolve to outer. It's hard to see how to fix this unless every method resolution is re-computed on every subclass...
Newspeak's out-then-up rule solves this problem: in class mid, mAmbig always resolves to outer, even if there is a definition coming in from a subclass, or found in a superclass. Methods only resolve on self (if there is a choice) when there is a method definition that is really, truly, actually lexically present in the object or class or trait definition to which self is bound.
Notes: