Overloading constructors: who is right?

PaulKlint commented 10 years ago

Consider the overloading of int as constructor in Type and constructor in the syntax rule below. The type checker forbids this, the evaluator allows it. Who is right? @jurgenvinju @tvdstorm @mahills @Anastassija opinions?

[Note there are quite some syntax definitions that depend on this kind of overloading so it would be nice if we can allow it]:

module experiments::Compiler::Examples::Tst

import Type;

lexical Num = \int: [0-9]+;

gives [@mahills : I added a print statement and made the error message more informative]

addProduction: prod(label("int",lex("Num")),[iter(\char-class([range(48,57)]))],{})
|rascal://lang::rascal::types::CheckTypes|(44044,73,<873,58>,<873,131>): "Invalid addition: cannot add production RSimpleName(\"int\") into scope, it clashes with non-constructor variable or function names"

jurgenvinju commented 10 years ago

I believe it should be allowed, since we allow this overloading between two datatypes as well. If ambiguous at the call site, qualified names are required. — Jurgen J. Vinju CWI SWAT INRIA Lille UvA master software engineering http://jurgen.vinju.org

On Sun, Feb 16, 2014 at 9:47 PM, Paul Klint notifications@github.com wrote:

Consider the overloading of int as constructor in Type and constructor in the syntax rule below. The type checker forbids this, the evaluator allows it. Who is right? @jurgenvinju @tvdstorm @mahills @Anastassija opinions? [Note there are quite some syntax definitions that depend on this kind of overloading so it would be nice if we can allow it]:
module experiments::Compiler::Examples::Tst
import Type;
lexical Num = \int: [0-9]+;
gives [@mahills : I added a print statement and made the error message more informative]
addProduction: prod(label("int",lex("Num")),[iter(\char-class([range(48,57)]))],{})
|rascal://lang::rascal::types::CheckTypes|(44044,73,<873,58>,<873,131>): "Invalid addition: cannot add production RSimpleName(\"int\") into scope, it clashes with non-constructor variable or function names"
Reply to this email directly or view it on GitHub: https://github.com/cwi-swat/rascal/issues/502

mahills commented 10 years ago

The main challenge with this is the following: say we have the case given above. If we see a use of \int, we currently can always assign it a type. If we allow this, that will no longer be possible. At the same time, I guess we already have this problem with constructors (if you reuse a name between different types), so maybe this isn't a big deal, we just need the overload to include the various ADT or nonterminal types that could be assigned to that name. It looks like I'll need to make the name introduction rules more forgiving and the name use rules more involved for these types of cases.

mahills commented 10 years ago

I definitely need to address this now, since this pops up all over the place when checking the type checker itself (since we import the Rascal grammar). This is still the same issue, though -- if we just see something like \int(), we need to use the context to determine whether that particular use is a constructor or a production/concrete syntax type so we can assign the proper type to it. In some cases this will be straight-forward (e.g., we are returning it, and we know the return type), while in others it will be quite challenging (we assign it to a variable of inferred type).

PaulKlint commented 10 years ago

I find it hard to asess the pros and cons of the alternatives:

from a user perspective we probably want to allows this, but the price is added complexity in the type checker.
if we forbid this overloading, we restrict the user but simplify the typechecker.

So there are really two questions:

What are the complications in the type checker if we select the first option?
How bad would it be if we select the second option? How inconvenient will this be? Is it consistent?

mahills commented 10 years ago

I think the main issue right now is not with type names, but with constructors for abstract versus concrete types. The behavior right now is not to import a name if this will cause a clash -- so, if there is a constructor named int, and a production named int, both imported, the name isn't brought into scope (although it is still available in its fully qualified form). This is confusing, since you think "but I imported it!", and may not know why it was not imported.

What I can do is keep a set of these around to make better warning messages, which at least would help, and I guess we could think of adding a quick fix later that would allow users to just repair it rapidly. Another option would be to not import production names by default -- I think they are not used as often as constructor names, so this would eliminate many of the clashes unless you really needed the production names to be available.

To address your specific questions:

the complication is that we have to pass around an "undecided" type containing various options, which may never get properly resolved, in which case we would have terms in the program that do not have a defined type. We would also have to add support for "resolving" these in a number of instances, such as when passed to another function (even if we start saying #f instead of just f, as we discussed earlier, this is still an issue), when returned, when applied, etc...
I think this is consistent, but it is also inconvenient, especially in programs like the checker, custom imploders, etc that make use of concrete syntax definitions which may clash on certain constructors but not on others. The main problem here is that the current interpreter allows this, since it can just resolve it dynamically in most cases.

Another option would be to not check on names, but on the entire signature, so you could have an int with arity 0 and an int with arity 1. Assuming we don't add varargs constructors, this would help, but would still be an issue if we just use the name without giving arguments (and still wouldn't help in this specific case, since the version of int defined as a Symbol and the version of int defined as a BasicType both have arity 0).

PaulKlint commented 10 years ago

Great analysis! Maybe we should first try to answer the question: who needs these production names anyway?

The only answers I can come up with are (but maybe I am missing something?):

implode
meta programs that manipulate Rascal code.

I guess that for implode the production names need not be visible in the importing module since it gets them via the grammar (in the interpreter from the module environment in the compiler via the reified type).

For meta-programs the same holds. _This would imply that we don't need to import the production names at all._ Can anybody confirm/refute this reasoning?

DavyLandman commented 10 years ago

Are you talking specifically for the Rascal grammar's production names?

Else, I think production names are usefull: if you want to match on a parse tree without using concrete syntax. Or if you would want to use a non terminal from another grammar and use the Statement!jump!goto feature to remove some productions from it?

jurgenvinju commented 10 years ago

@PaulKlint we do need these names to avoid writing complex concrete patterns when a simple test will do as in myStat is if and if(_,_,_) := myStat as opposed to (Statement) if { <Expression e> } <Statement* _> else { <Statement* _> } := myStat. I don't think Rascal meta programs are a special case in that sense. We needed them first for Rascal meta programs to get ourselves out of a nasty bootstrap situation, but they are needed in all programs manipulating syntax trees.

Two solutions to this problem:

solving: have a different notation for constructor names from parse trees: If(_,_) instead of if. We could let parse tree constructor names start with capital letters and abstract constructor names with lowercase. Problem solved for the constructor names, which leaves a solution wanting for the type names; something like syntax[Expression] instead of Expression could be used to disambiguate type names. We might make this optional.
mitigating: do not have a mirror AST definition for a grammar, instead have full pattern matching and construction capabilities with abstract syntax on concrete trees (already a feature request for a long time)

I am for having both solutions.

jurgenvinju commented 10 years ago

@DavyLandman that Statement!jump notation is definitely also undoable without production names indeed.

PaulKlint commented 10 years ago

Ok indeed use cases I missed, thanks for mentioning them. The next question is then: do we really need to bring them in scope? Could a library function getProductionNames(Statement) be a solution?

PaulKlint commented 10 years ago

@jurgenvinju case distinction is a solution but maybe it is a bit too subtle?

Indeed we should aim for completely transparent mixing of concrete and abstract matching.

jurgenvinju commented 10 years ago

No the library function solution looks ugly to me.

Listen we have this same issue for a much simpler case, namely two functions in two modules who happen to have the same name, same amount of arguments, but different return type:

module A
  int f();
module B
  Exp f();
module C
import A;
import B;

void main() {
  println(f());
}

There we ask to disambiguate the function by qualifying it. Why can't we do this for this case as well? At the moment we use f anywhere, the type checker can either resolve it uniquely or not, and if not the programmer should disambiguate. Is a similar thing not doable for constructor/production names?

Sorry for asking stupid questions; its pretty hard to oversee all the implications.

jurgenvinju commented 10 years ago

@PaulKlint instead of case distinction pick an operator? ^if(_,_) := myStat, myStat is ^if

The case distinction trick is less space and "big letters = big trees, small letters is small trees"

mahills commented 10 years ago

Hi @jurgenvinju, qualification works fine. The main issue relates more to volume that capability -- with pattern-based dispatch we could have a large number of cases where type names need to be qualified, where this is less likely with function names (unless you just give every function the same name) or type names (which do not appear as often, since they can be left out in most cases other than top-level variables, aliases, function signatures, etc). Also, with production and constructor names, there are a lot more of them, so it is more likely that one will have an inadvertent clash between them.

PaulKlint commented 10 years ago

If I remember correctly, we do not syntactically allow qualified names in patterns.

mahills commented 10 years ago

From Slack:

Thinking on this more, it may be possible to add them both in at the same time. We do this now with functions, and we just say you need to qualify them if there is an ambiguity. So, in theory, we really should be able to do the same thing with constructors and productions.

mahills commented 10 years ago

@PaulKlint at least based on the pattern syntax, I think we can use them except as type names inside concrete syntax patterns, where the hole just contains a standard Name instead of a Type or QualifiedName.

PaulKlint commented 10 years ago

Ok I will turn this into a new issue.

jurgenvinju commented 10 years ago

I believe we've arrived to conclude that this issue is the same as #493 (see discussion in #638 ).

jurgenvinju commented 10 years ago

So @mahills, if I understand right, right now in case of conflict none of the names are brought in scope, which may result in lots of error messages? Could we still bring them in scope, but explicitly labeled ambiguous in order to make sure we can give good error messages (this is what you suggested in your comment from Slack). Sounds like a great solution and also more consistent with the way we deal with function names.

Even though in the future we may syntactically separate data and syntax types and constructors, it may still happen due to importing modules that we import ambiguously defined constructors, so a solution here will be necessary regardless.

mahills commented 10 years ago

That is what I will try to do -- earlier we had tentatively decided to just not import them, but still allow them to be accessed with their fully qualified names. I think it should be possible to import them anyway, but will need to put in more code to check for ambiguities. I still think we should disallow declarations of constructors and productions (and of abstract and concrete types) with the same name within the same module, though -- that gives no way to actually disambiguate them, since the fully qualified name could then refer to either.

PaulKlint commented 10 years ago

@mahills this essentially consolidates our current practice: put concrete and abstract syntax in separate modules and this is fine.

However, precisely for educational purposes it is heavy weight and there I would prefer a solution where both can live together in a single module. But let's postpone this until we separate these name spaces in some way.

jurgenvinju commented 10 years ago

Agreed, within the same module we do not allow ambiguous overloading, not for functions, not for constructors and not for syntax. The only exceptions being default functions can overload non-default functions and constructors or productions may overload functions.

mahills commented 10 years ago

@jurgenvinju agreed for constructors and syntax -- for functions it is a more complicated story, since we naturally allow a large number of overlapping overloads because of patterns in the signature. This makes it quite challenging to detect an actual overlap. Long term, I need to add pattern coverage checking to see if there are missing or overlapping patterns.

@PaulKlint agreed that it would be nice to have everything in one module, and I also agree that we should wait until the namespaces are separate to do this.

PaulKlint commented 9 years ago

My impression is that this issue is solved.

mahills commented 9 years ago

I believe so as well -- in general, it will look for a concrete version of the type in cases where a) no unqualified type of that name is in scope, and b) it can tell that we attempted to load the name but it would have caused a collision between concrete and abstract type versions. This should prevent "leaking" concrete type names, since we always keep them around in the background so the compiler can use them.

usethesource / rascal

Overloading constructors: who is right? #502