Can't derive a reified type value from a type parameter

usethesource / rascal

The implementation of the Rascal meta-programming language (including interpreter, type checker, parser generator, compiler and JVM based run-time system)

http://www.rascal-mpl.org

Other

399 stars 78 forks source link

Can't derive a reified type value from a type parameter #1212

Open eh-adblockplus-org opened 6 years ago

eh-adblockplus-org commented 6 years ago

I was trying to make a generic test jig to exercise implode on a grammar + ADT definition. What I wanted to do seemed to be an obvious use of type arguments: pass an ADT literal and verify that the result of parse+implode on a test string matches it. Here's the essence of the test jig I wanted, which doesn't work because there doesn't seem to be any way of implementing reifiedTypeOf in Rascal code.

bool testy( type[&T<:Tree] begin, str input, &A<:node adt )
{
    T t = parse( begin, input ) ;
    &A x = implode( reifiedTypeOf(adt), t ) ;
    return x == adt;
}

The reify operator # only works on literals, and typeOf returns type symbols, not reified types. My workaround, FYI, is to abandon type inference and just add a fourth parameter. That's a generic programming fail. The focus of this report is to identify the absence of reifiedTypeOf (or its moral equivalent) and the result deficiency in generic programming. It's a topic that doesn't want to stay isolated, apparently.

eh-adblockplus-org commented 6 years ago

From what I've been able to piece together, this deficiency is a sort-of by design. There's a clue in the documentation:

Note that the typeOf function does not produce definitions, like the unknown: Rascal:Reify operator does, since values may escape the scope in which they've been constructed leaving their contents possibly undefined.

This seems related to #1186, at root of which is what to do about imported syntax rules. The documentation note seems to indicate that the same concerns about undefined symbols resulting from import. So it seems that in order to implement reifiedTypeOf it's going to be necessary to allow it to fail on some types. It should not fail, however, on types with a particular property:

If a type is visible, then the entirety of the type is visible.

Let's call this a "clear type". There are alternatives to this policy, such as "opaque types" that have zero visibility and "obscure types" that are partially visible. This is manifestly possible; it's the principle upon which object-oriented programming is based. The core of Rascal, however, lies with clear types. For example, an opaque syntax rule would make an ambiguity caused by the interaction of such a rule and its importing grammar into one that could be impossible to diagnose. Similarly, it would be impossible to write a grammar whose parse trees could be imploded into an obscure abstract data type, since some constructor names would not be visible to write matching production labels.

Rascal has de facto obscure types now, but not in a principled way, that is, with an explicit declaration, but rather though the action of import, the main (only?) way to get them. Regardless of whether or not Rascal has obscure types either in principle or in practice, they have no place in parse and implode. So it would seem that a first step would be to have a proper definition of a clear type and a predicate function isClearType to check for it. parse and implode should simply throw on non-clear types. And likewise, reifiedTypeOf would only need to work on clear types.

The advantage of this approach is that it defers any kind of decision about how import behaves or about the existence of obscure types.

eh-adblockplus-org commented 6 years ago

It would also be useful to have static assertions (i.e. evaluated at compile-time) to use with isClearType. A module that's intending to declare a clear type should be able to know whether its imported dependencies can allow it.

It should also be said that aliases should enable a type to be considered clear, even if the original definition is out of scope. Not sure if the existing alias syntax is the right thing for that. It's apparent that this doesn't scale well by itself, but it should be available as one mechanism among possibly many.

jurgenvinju commented 6 years ago

Interesting discussion Eric. We will think about the opaqueness of types; it is related to another discussion about the openness of ADT types and syntax types which we are having as well.

For the current test case what you could do is pass in the reified type of &A as a parameter:

bool testy( type[&T<:Tree] begin, str input, type[&A <: node] aType, &A<:node adt )
{
    T t = parse( begin, input ) ;
    &A x = implode( aType, t ) ;
    return x == adt;
}

From my perspective, a reified type is a run-time value, of which the static type is generally type[value] and the dynamic type would be normally type[SomeSyntaxType]. In some very specific cases we can know a more specific static type of a reified type, namely when we use the # operator for which the result type is statically equal to its dynamic type, or when pattern matching in a function parameter, like type[&T <: node] aType will only match reified types which are sub-tupes of node and &A <: node adt will only match (at run-time) if the adt parameter is indeed a sub-type of node.
Inside function with type parameters, the static type system will never know a more concrete type then the literal types mentioned, so &A is a sub-type of node there, but that's it.
Indeed, "reified types" are combined in Rascal with their grammar definitions if the reification is done statically, and not combined if the reification is done dynamically. So #Type includes definitions of Type, but typeOf(var) does not include definitions of the type of var.

Note that the current interpreter simulates a static type system via abstract interpretation while executing the code at the same time. This influences the semantics of reified types as well of course. We did that in anticipation of the "actually static" type checker and not wanting to break too much code when we'd be done with that.

eh-adblockplus-org commented 6 years ago

Added a fourth argument with the reified type was exactly the workaround I used.

Rascal is unusual among programing languages in that it's possible to express a type definition as an ordinary datum within the language. Mostly type definitions are hidden inside the language implementation in some way. In Rascal they're out in the open; that's exactly what a #-reified type is. So while it's possible to create such a datum at run-time, it's not inherently a run-time thing. Analogously, type identifiers are also expressible as ordinary data.

The definition type and the identifier type, however, aren't the same. Both might be called reification in some sense, but it would be good not to overload that meaning too much, at least not without more precise language to refer to the distinctions.

jurgenvinju commented 6 years ago

Thinking out loud here.

run-time/static distinctions

Certainly, a reified type does not act as a normal type anymore, after it is reified. They are just a particular kind of values, not interacting much with the type system at all. Same for the grammar definitions, once they are reified as values, that's all they really are. Then there are certain operations/function which act on such values, such as the parser and parser generator, which take a "reified type" (a top symbol and a grammar), generate a parser from that and such. The build function is another such function which can construct an arbitrary ADT value given a dynamic grammar value.

Caveat emptor: all those 'type-value-processing' and 'grammar-processing' functions do not provide much static guarantees, they throw run-time exceptions in case of type errors. Same for the functions you write yourself based on these type values.

What I like very much about the type[&T] type, for reified type values, is that at that point is the small integration point where the static type system and the reified types "touch", allowing the programmer to reason about a typed return value (if an exception is not thrown). This was the leap for me at least, to integrate dynamic type/grammar with the Rascal static type system. It's very much akin to how class literals are used in Java and later in Scala.

Names for concepts

A better name for "reified type" and "reified grammar" would be great. In F# they talk about "type providers", which is too complex for my taste and not as powerful at the same time.

I don't want to introduce more type kinds for Rascal though, even though we should have more names. This is enough already, and I think type[&T] is ok for the type kind for reified types.

What could be possible is to introduce two more alternative constructors for reified types: type[&T] = type(Symbol sym) | grammar(Symbol sym, map[Symbol, Production] grammar) which would make the distinction between an opaque type and a transparantly defined type explicit.

jurgenvinju commented 6 years ago

@PaulKlint you have also thoughts on the names of reified types. Symbol was our traditional name ever since the 90's, but we should probably move to something else soon right?

eh-adblockplus-org commented 6 years ago

I have belatedly realized that there are indeed sufficient syntactic elements in Rascal to implement the generic function in the example code. Instead of the expression reifiedTypeOf(adt), the following expression has all the right semantic elements.

#type[typeOf(adt)]

It doesn't work (of course?), though. If the error message I've seen is accurate, typeOf, defined in the library Type, is not recognized as a type identifier for the structured type expression type[].

I should point out that it is possible to fully resolve the above expression at compile-time. That doesn't mean it could be at present. typeOf is defined in a library, not as an intrinsic. To my knowledge Rascal does not have a mechanism to require compile-time resolution of expressions (such as C++ constexpr), so my expectation is that typeOf executes at run-time, subverting the possibility of compile-time evaluation of the above expression.

jurgenvinju commented 6 years ago

I think you might be looking for this:

type[value] myType = type(\int(),());

or:

if (type[Expression] myType := type(adt("Expression,[]), ())) {
   ...
}

type(Symbol,map[Symbol,Production) is the builtin binary adt constructor for all values which have a type which is a type of type[value].
The first argument is the value representing the type
The second argument is the (possibly empty) grammar
This is also how the result of #int is represented, but you can construct your own reified types using this constructor
The dynamic type of type(S, _) is type[X] where X is the type represented by S
The static type of the expression type(S, _) is always type[value]

jurgenvinju commented 6 years ago

The # operator takes only closed type expressions as parameter (including type parameters), but no value expressions.
The type constructor is its value-based friend, taking a symbolic type value and constructing a "typed" reified type from it.

This stuff better end up in the documentation sometime :-)

eh-adblockplus-org commented 6 years ago

I'm not looking for anything other than what's in the opening comment of this issue. Specifically, that's a way of getting from a type parameter declared in a function to a reified type that I can pass to parse or implode or other library functions that require them. Whatever technique qualifies must not require any specific knowledge about the interior of its type, for example its grammar, because the purpose is to support generic programming. To put a fine point on it, I'd like to know how to implement type[&A] reifiedTypeOf( &A ) in Rascal code, where the return value of type type[&A] is sufficient for parse etc.

I don't believe this is possible with the current state of Rascal. This issue has been labelled "question", which isn't really right. I'm reporting what I consider a deficiency in the type system of Rascal. It's not a defect; there's nothing here behaving other than the way it's defined. Rather it's something that can't be done at present. In my opinion, this should be labelled "deficiency".

All the discussion about static/dynamic aspects is relevant to the current state of Rascal, but ultimately it's not part of the deficiency. To my eye it certainly seems possible to have a completely static (compile-time) version of reifiedTypeOf, but I don't consider that to be a part of the deficiency reported. It might count as a (second) deficiency that a run-time technique has lower performance than a compile-time one, but for this issue strictly construed, it's not necessary.

I do completely agree that the whole type system needs better documentation. There's mention in this thread of static type and dynamic type, but there's no proper definitions of these in the documentation. Indeed, the "Concepts" section has a page "StaticTyping", so it's not even clear from the documentation that there is such a thing as a dynamic type.

Some of the type system needs initial documentation. Researching all this, I saw that there's a typeCast function. I was already using its idiom, but the function itself is not documented.

jurgenvinju commented 6 years ago

I think we're stuck here indeed:

All values do know their run-time type for pattern matching purposes and making sure some rank-2 polymorphic functionality (like visit) can find out what the current type is,
but they just don't have access to their own declaration context at run-time (I wouldn't know how to do that anyway without breaking referential integrity or structural equality). So they do not know "their" grammar;
We might consider reifying the grammar of the current declaration context of the generic function, but that would be rather meaningless since that context is generic.
Values are supposed to travel freely between declaration contexts;
Ergo: stuckness on getting grammars from run-time values.

In my mind the current design walks the tight rope pretty well, you do get access to grammars (when reified in their static context), and you can build grammars dynamically yourself.

I consider the lesson learned here that conflating the reified type notion and the grammar notion can be confusing; it raises expectations which can not be met.

For later reference, the following function typechecks and does something "reasonable", but it does not generate a reified type including a grammar for use with parse:

type[&A] reifiedTypeOf(&A a) {
   if (type[&A] result := type(typeOf(a), ())) {
     return result;
   }
   else {
     throw "could not construct a type for the type of <a> (<typeOf(a)>)";
   }
}

jurgenvinju commented 6 years ago

Note to self: This sentence is important: "a way of getting from a type parameter declared in a function to a reified type that I can pass to parse", so: a dynamic conversion from a static type to a dynamic value representing the actual type bound to the type parameter".

It's like the typeOf function in Type.rsc, but it would also construct a grammar (somehow) for the type of the parameter next to the symbolic representation of its run-time type.

A workaround for not having this feature remains to always pass in an already reified type as a parameter. This is in principle always possible, but not always handy. Where we call the generic function we have a concrete idea of what &A is, and so also type[&A] can be computed using #MyType if &A would be bound to the ADT MyType.

Alas, this workaround bubbles up to the top most concrete function at the top of the call graph (in case we have generic functions which call other generic functions). We have to add reified type parameters to all intermediate levels.

eh-adblockplus-org commented 6 years ago

All values do know their run-time type for pattern matching purposes [...] but they just don't have access to their own declaration context at run-time [...] So they do not know "their" grammar [...]

An arbitrary (data-declared) value, that is, a value as such, doesn't even have a unique grammar of origin. A given value is de facto simultaneously a value of an infinity of possible Rascal types, a cofinite number of which will never be written down. Put more concretely, if a value matches all of the salient data definitions, it's a member of that type. For example, recall my comment in #1211 that the empty set is simultaneously a value within every set type.

And now more abstractly, the set of grammars that accept a particular concrete value form a semi-lattice. At the bottom of the lattice is the grammar that accepts only that value itself, a grammar defining a type consisting of a single value. (That is, up to isomorphism, depending on how you define grammar equivalence.) So in one sense every value does have a canonical grammar, but that grammar is canonically isomorphic to the value itself, so it yields no new information.

All this says is that it's the wrong approach to try to derive types from values. All is not lost, however, because what does have a type is the expression that generated the value passed to the function. In the simplest case this is simply a variable, and the type of that expression is the declared type of the variable. That's the type that needs to be used for type inference in a generic function. I haven't dug into how Rascal implements generic functions, but it seems as though the full type of the expression is getting truncated prematurely somewhere along the way.

I might also mention that the type of an expression might not be explicitly declared or easily derivable in explicit form from the expression itself. An example of what I'm thinking of is a heterogeneous list, where the type is a list of the least upper bound type of all the types of the list members. It would be adequate to fail in this circumstance for an initial implementation, where the maturity of the system is insufficient for full generality, unable to construct type expressions under all circumstances.

jurgenvinju commented 6 years ago

Well, Rascal's type system is not context-sensitive and an actual (not reified) type does not ever comprise a grammar during type checking or code generation, or interpreting. The constructors of grammars are used as constructor function declarations only, and otherwise only the names of ADTs are used for type checking expressions such as function calls.

What you called a "full type" only exist in the reified type world as the result of the # operator, so values which contain symbolic representations of types and a symbolic representation of grammars. Actual non-reified types, as used in the type checked and at run-time, do not have grammars ever. So grammars are not in play when implementing generic functions at all.

Side note: you should know when thinking about Rascal functions, that type compatibility for function calls in Rascal is different than in other languages. Parameter types must only be type comparable (formal <: actual || actual <: formal) and not per se sub-types (actual <: formal) due to the semantics of pattern matching (e.g. the value of an expression with static type value, may dynamically match against the int type). For higher order functions this implies to co- and contra-variance for parameter types. Type checking call sites is not complete due to these weaker constraints on parameter types; there may be type correct call sites which result in a run-time exceptions due to none of the alternatives matching at run-time. Rewrite rule completeness properties can be checked by other algorithms than simple type-checking, but we haven't included anything like that.

To change the type system such that types do carry grammars, would imply a complex change to the type system which I'm not willing to contemplate for now, to be honest. Too complex imo. It would imply context-sensitivity if I'm not mistaken, because different call sites of a generic function will lead to different "grammar-types", and both type checking and code generation would be context-sensitive as a result. Maybe I'm not thinking this through enough, but from here it does not seem to be worth the effort, especially considering we can pass in a reified type as a parameter to a generic function if so needed.

On the design of Rascal though, it could be worth to make sure that values of type type[&T] indeed represent nothing but reified types (no implicit grammars), and then introducing another grammar[&T] type kind for typing grammar values. That would remove the implicit conflation of reified types and grammars at the cost of introducing yet another type kind. grammar[&T] would behave as type[&T] does now, but type[value] typeOf(value x) would be the new profile of typeOf, cast would still be &T cast(type[&T] t, value x). We might consider grammar[&T] <: type[&T] such that #A would be of type grammar[A] <: type[A], and parse would change to &T parse(grammar[&T] gr, str x). This would clarify why typeOf can not be used to construct a value for the first argument of parse. But I haven't thought through all the consequences of these changes... definitely worth a different discussion with @PaulKlint as well, and possibly a Rascal Amendment Proposal.