Closed PaulKlint closed 9 years ago
I believe it should be allowed, since we allow this overloading between two datatypes as well. If ambiguous at the call site, qualified names are required. — Jurgen J. Vinju CWI SWAT INRIA Lille UvA master software engineering http://jurgen.vinju.org
On Sun, Feb 16, 2014 at 9:47 PM, Paul Klint notifications@github.com wrote:
Consider the overloading of
int
as constructor inType
and constructor in the syntax rule below. The type checker forbids this, the evaluator allows it. Who is right? @jurgenvinju @tvdstorm @mahills @Anastassija opinions? [Note there are quite some syntax definitions that depend on this kind of overloading so it would be nice if we can allow it]:module experiments::Compiler::Examples::Tst import Type; lexical Num = \int: [0-9]+;
gives [@mahills : I added a print statement and made the error message more informative]
addProduction: prod(label("int",lex("Num")),[iter(\char-class([range(48,57)]))],{}) |rascal://lang::rascal::types::CheckTypes|(44044,73,<873,58>,<873,131>): "Invalid addition: cannot add production RSimpleName(\"int\") into scope, it clashes with non-constructor variable or function names"
Reply to this email directly or view it on GitHub: https://github.com/cwi-swat/rascal/issues/502
The main challenge with this is the following: say we have the case given above. If we see a use of \int
, we currently can always assign it a type. If we allow this, that will no longer be possible. At the same time, I guess we already have this problem with constructors (if you reuse a name between different types), so maybe this isn't a big deal, we just need the overload to include the various ADT or nonterminal types that could be assigned to that name. It looks like I'll need to make the name introduction rules more forgiving and the name use rules more involved for these types of cases.
I definitely need to address this now, since this pops up all over the place when checking the type checker itself (since we import the Rascal grammar). This is still the same issue, though -- if we just see something like \int()
, we need to use the context to determine whether that particular use is a constructor or a production/concrete syntax type so we can assign the proper type to it. In some cases this will be straight-forward (e.g., we are returning it, and we know the return type), while in others it will be quite challenging (we assign it to a variable of inferred type).
I find it hard to asess the pros and cons of the alternatives:
So there are really two questions:
I think the main issue right now is not with type names, but with constructors for abstract versus concrete types. The behavior right now is not to import a name if this will cause a clash -- so, if there is a constructor named int
, and a production named int
, both imported, the name isn't brought into scope (although it is still available in its fully qualified form). This is confusing, since you think "but I imported it!", and may not know why it was not imported.
What I can do is keep a set of these around to make better warning messages, which at least would help, and I guess we could think of adding a quick fix later that would allow users to just repair it rapidly. Another option would be to not import production names by default -- I think they are not used as often as constructor names, so this would eliminate many of the clashes unless you really needed the production names to be available.
To address your specific questions:
#f
instead of just f
, as we discussed earlier, this is still an issue), when returned, when applied, etc...Another option would be to not check on names, but on the entire signature, so you could have an int
with arity 0 and an int
with arity 1. Assuming we don't add varargs constructors, this would help, but would still be an issue if we just use the name without giving arguments (and still wouldn't help in this specific case, since the version of int
defined as a Symbol
and the version of int
defined as a BasicType
both have arity 0).
Great analysis! Maybe we should first try to answer the question: who needs these production names anyway?
The only answers I can come up with are (but maybe I am missing something?):
implode
I guess that for implode
the production names need not be visible in the importing module since it gets them via the grammar (in the interpreter from the module environment in the compiler via the reified type).
For meta-programs the same holds. _This would imply that we don't need to import the production names at all._ Can anybody confirm/refute this reasoning?
Are you talking specifically for the Rascal grammar's production names?
Else, I think production names are usefull: if you want to match on a parse tree without using concrete syntax. Or if you would want to use a non terminal from another grammar and use the Statement!jump!goto
feature to remove some productions from it?
@PaulKlint we do need these names to avoid writing complex concrete patterns when a simple test will do as in myStat is if
and if(_,_,_) := myStat
as opposed to (Statement) if { <Expression e> } <Statement* _> else { <Statement* _> } := myStat
. I don't think Rascal meta programs are a special case in that sense. We needed them first for Rascal meta programs to get ourselves out of a nasty bootstrap situation, but they are needed in all programs manipulating syntax trees.
Two solutions to this problem:
If(_,_)
instead of if
. We could let parse tree constructor names start with capital letters and abstract constructor names with lowercase. Problem solved for the constructor names, which leaves a solution wanting for the type names; something like syntax[Expression]
instead of Expression
could be used to disambiguate type names. We might make this optional.I am for having both solutions.
@DavyLandman that Statement!jump
notation is definitely also undoable without production names indeed.
Ok indeed use cases I missed, thanks for mentioning them. The next question is then: do we really need to bring them in scope? Could a library function getProductionNames(Statement)
be a solution?
@jurgenvinju case distinction is a solution but maybe it is a bit too subtle?
Indeed we should aim for completely transparent mixing of concrete and abstract matching.
No the library function solution looks ugly to me.
Listen we have this same issue for a much simpler case, namely two functions in two modules who happen to have the same name, same amount of arguments, but different return type:
module A
int f();
module B
Exp f();
module C
import A;
import B;
void main() {
println(f());
}
There we ask to disambiguate the function by qualifying it. Why can't we do this for this case as well? At the moment we use f
anywhere, the type checker can either resolve it uniquely or not, and if not the programmer should disambiguate. Is a similar thing not doable for constructor/production names?
Sorry for asking stupid questions; its pretty hard to oversee all the implications.
@PaulKlint instead of case distinction pick an operator? ^if(_,_) := myStat
, myStat is ^if
The case distinction trick is less space and "big letters = big trees, small letters is small trees"
Hi @jurgenvinju, qualification works fine. The main issue relates more to volume that capability -- with pattern-based dispatch we could have a large number of cases where type names need to be qualified, where this is less likely with function names (unless you just give every function the same name) or type names (which do not appear as often, since they can be left out in most cases other than top-level variables, aliases, function signatures, etc). Also, with production and constructor names, there are a lot more of them, so it is more likely that one will have an inadvertent clash between them.
If I remember correctly, we do not syntactically allow qualified names in patterns.
From Slack:
Thinking on this more, it may be possible to add them both in at the same time. We do this now with functions, and we just say you need to qualify them if there is an ambiguity. So, in theory, we really should be able to do the same thing with constructors and productions.
@PaulKlint at least based on the pattern syntax, I think we can use them except as type names inside concrete syntax patterns, where the hole just contains a standard Name
instead of a Type
or QualifiedName
.
Ok I will turn this into a new issue.
I believe we've arrived to conclude that this issue is the same as #493 (see discussion in #638 ).
So @mahills, if I understand right, right now in case of conflict none of the names are brought in scope, which may result in lots of error messages? Could we still bring them in scope, but explicitly labeled ambiguous in order to make sure we can give good error messages (this is what you suggested in your comment from Slack). Sounds like a great solution and also more consistent with the way we deal with function names.
Even though in the future we may syntactically separate data and syntax types and constructors, it may still happen due to importing modules that we import ambiguously defined constructors, so a solution here will be necessary regardless.
That is what I will try to do -- earlier we had tentatively decided to just not import them, but still allow them to be accessed with their fully qualified names. I think it should be possible to import them anyway, but will need to put in more code to check for ambiguities. I still think we should disallow declarations of constructors and productions (and of abstract and concrete types) with the same name within the same module, though -- that gives no way to actually disambiguate them, since the fully qualified name could then refer to either.
@mahills this essentially consolidates our current practice: put concrete and abstract syntax in separate modules and this is fine.
However, precisely for educational purposes it is heavy weight and there I would prefer a solution where both can live together in a single module. But let's postpone this until we separate these name spaces in some way.
Agreed, within the same module we do not allow ambiguous overloading, not for functions, not for constructors and not for syntax. The only exceptions being default functions can overload non-default functions and constructors or productions may overload functions.
@jurgenvinju agreed for constructors and syntax -- for functions it is a more complicated story, since we naturally allow a large number of overlapping overloads because of patterns in the signature. This makes it quite challenging to detect an actual overlap. Long term, I need to add pattern coverage checking to see if there are missing or overlapping patterns.
@PaulKlint agreed that it would be nice to have everything in one module, and I also agree that we should wait until the namespaces are separate to do this.
My impression is that this issue is solved.
I believe so as well -- in general, it will look for a concrete version of the type in cases where a) no unqualified type of that name is in scope, and b) it can tell that we attempted to load the name but it would have caused a collision between concrete and abstract type versions. This should prevent "leaking" concrete type names, since we always keep them around in the background so the compiler can use them.
Consider the overloading of
int
as constructor inType
and constructor in the syntax rule below. The type checker forbids this, the evaluator allows it. Who is right? @jurgenvinju @tvdstorm @mahills @Anastassija opinions?[Note there are quite some syntax definitions that depend on this kind of overloading so it would be nice if we can allow it]:
gives [@mahills : I added a print statement and made the error message more informative]