Reified parameterized concrete syntax

tedinski commented 8 years ago

This is a longish-term feature request. I thought I'd written about it before, but I just had reason to look for it and couldn't find it.

It would be nice to support reifying parameterized nonterminals when generating copper specs. That is, we could have a library of things like:

nonterminal ConcreteMaybe<a> with ast<Maybe<Typeof_ast<a>>> where Concrete<a>, Has_ast<a>;
concrete productions top::ConcreteMaybe<a>
| val::a  { top.ast = just(val.ast); }
| { top.ast = nothing(); }

The general idea being, we can abstract out common patterns (such as EBNF and beyond features) into parameterized nonterminals in a library.

A few notes about this example:

I'm assuming some kind of typeclass-like system where I've specified that a has to be Concrete. And indicating that a has the attribute ast. And also using a type function to get the type of ast on a. So this is assuming some sophisticated type machinery (a solution to 'the attribute access problem')
We need to reify this for copper, to eliminate all the parameterized types. There's actually already a start on this! Sort of. Copper specs (inside Silver) already re-organize themselves so that a nonterminal's productions appear below the nonterminal in the tree. This makes it easier to generate a new nonterminal with a modified copy of those productions.
Note that this reification process is iterative, and there are no termination guarantees. We need to think about how to handle that.
There's more thought needed about allowing type parameters to concrete nonterminals. The parser can construct trees according only to the syntax, so we'd need to somehow constrain things so that the types aren't constraining valid trees in such a way that parsing yields type incorrect trees.
Another hard thing: dealing with layout. Issue #37 suggests what I still think is a superior way of doing layout compared to right now, but we'd have to figure out how to handle that as part of reification. (One possible approach would be to "pass down" the same layout from the production that spawned a reified nonterminal to that nonterminal, unless the nonterminal has explicit layout set on it.)

There's a ton of things that could end up in such a library. Maybe with and without an ast. Lists of 0 or 1 or more elements. Lists: separated, terminated, or separated with optional dangling termination. Possibly some combinators like concatenation (useful when passing in to other combinators.)

tedinski commented 8 years ago

melt-umn/copper#3 suggests another possible common pattern: wanting to eagerly consume input as long as it remains possible. (That is, supporting a combinator that does something like maximal-munch does for lexing.)

It's not entirely clear whether (or how) this would fit into this sort of library-based system. It's possible that we'd need some extra capabilities from Copper. (Such as an ability to set an operator on a production not to a particular terminal but "higher than" or "lower than" a particular terminal. Also not clear to me whether the implications of this are hazardous or not. For instance, this might no longer yield a total order of precedence but a lattice.)

schwerdf commented 8 years ago

Copper does not need, nor does it use, a total ordering for operator precedence. It currently supports operator precedence classes, similar to what it used to have for lexical precedence before we switched to the fully general relation in 2007. The only reason we did not switch the operator precedence over at the same time is that it was too much bother for too little gain.

For a machine-generated spec, one can exercise precise control over shift-reduce conflict resolution by simply declaring pseudo-terminals for use in operator precedence.

krame505 commented 2 years ago

I think we now have everything needed in Silver to make this fairly straightforward - attribute occurs-on type constraints cover having ast (and also unparse, probably?) on a, and fixes to layout in #347 should make this just work as expected, I think.

As Ted outlined, this would be implemented as a pass on the concrete syntax AST after normalization, that iteratively monomoprhizes all polymorphic nonterminals, duplicating their associated productions. We would also have to statically solve all production type constraints (which would be fully instantiated, e.g. attribute ast<Expr> {} occurs on Optional_Expr) for the translation.

Note that this reification process is iterative, and there are no termination guarantees. We need to think about how to handle that.

This can happen e.g. with a production Foo<a> ::= Foo<Foo<a>>. It would probably be sufficient to error out if monomorphization doesn't finish after a set number of iterations.

There's more thought needed about allowing type parameters to concrete nonterminals. The parser can construct trees according only to the syntax, so we'd need to somehow constrain things so that the types aren't constraining valid trees in such a way that parsing yields type incorrect trees.

I think for this it would be sufficient to just ban existential type vars (i.e. type vars that don't appear in the production's LHS) in concrete syntax. This would still allow for all of the variations that we might want in a library, as Ted described above.

We could even still allow GADT productions (where the LHS has non-var type parameters, e.g. Expr<Integer> ::= Int_t) by filtering out any prods where the LHS type doesn't match during monomorphization. This (might?) be enough to let one actually do limited a version of the "well typed interpreter" in Silver (although application would need to be manually monomorphisized for all permitted arg types, since production app :: (Expr<b> ::= Expr<(b ::= a)> Expr<a>) has a as existential. Implementing type checking in your parser is still probably not a good language design practice anyway, but this might be useful e.g. for a tool that always works on type-correct IR code.)

A couple more thoughts:

We should probably add a convenience extension to add Expr *, Expr + and Expr ? type expressions forwarding to ZeroOrMore<Expr>, OneOrMore<Expr> and ZeroOrOne<Expr>.
The "easy terminal" convenience extension currently allows quoted terminals as top-level production signature items. We should probably allow them as arbitrary type expressions instead, to allow things like ','? in a production signature.

krame505 commented 2 years ago

One final note - I would consider this a prerequisite to dividing Silver into separate abstract and concrete syntax, if we ever decide to indeed go that route in the future (not that I'm convinced that this is worth the trouble, though.) The Silver grammar has a lot of repeated modifiers, optional keywords and bits of declarations, etc. and being able to express that in concrete syntax without the need for lots of dedicated "list"/"optional" nonterminals would be make that refactor (slightly) less painful.

ericvanwyk commented 2 years ago

Is this something that can be done as a "proper" extension? I'm hesitant to add more to Silver right now.

melt-umn / silver

Reified parameterized concrete syntax #71