haskell / happy

The Happy parser generator for Haskell

Happy and template haskell #149

Open Ericson2314 opened 4 years ago

Ericson2314 commented 4 years ago

I hope, via https://github.com/ghc-proposals/ghc-proposals/pull/243, to make GHC able to use TH, but as GHC is not the sole user of happy, I figured I could get the ball rolling on brainstorming ideas now.

The first change, I think, would be to allow separating the grammar from the user-written code. Happy could turn a .y file without any Haskell between the braces into its representation of the grammar. Then, in another module, one writes all the production eliminators, named according to some convention. Finally, one imports happy as a library along with that grammar, and uses a splice to generate what comes out of .y today.
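A hypothetical sketch of what such a convention might look like, with all names invented for illustration: the .y file would contain only the productions, and a companion module would export one eliminator per alternative, named after the nonterminal it reduces.

```haskell
-- Invented convention (not something Happy defines today): one eliminator
-- per grammar alternative. The .y file would contain only productions such
-- as  expr : '(' expr ')'  with no code between braces.

data Expr = Par Expr | BLit Bool
  deriving (Eq, Show)

-- eliminator for: expr : '(' expr ')'
exprParen :: tok -> Expr -> tok -> Expr
exprParen _ e _ = Par e

-- eliminators for: lit : 'true' | 'false'
litTrue, litFalse :: tok -> Expr
litTrue  _ = BLit True
litFalse _ = BLit False
```

The splice generating the parser would then wire each production to the eliminator of the matching name.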

Now, while separating the grammar from its eliminators seems like a classic, clean separation of concerns, I'll wholly admit that using multiple completely separate ASTs with the same grammar is not likely to be useful in practice---it would be better to write a polymorphic AST and a single set of polymorphic production eliminators.

The real benefit is that we now have a chance to programmatically modify the grammar before parser codegen. Things like parameterized/macro rules and inlining (#148) could perhaps be done without as much, or any, support from Happy itself. User manipulation of the grammar could also allow it to be nicely split between multiple files, or, more broadly, to be constructed compositionally.
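To make the idea concrete, here is a minimal sketch with invented types: if the grammar is a first-class value, ordinary user code can transform it, e.g. inline a single-alternative nonterminal into its use sites (cf. #148), before handing it to the generator.

```haskell
-- Invented grammar representation, purely for illustration.
import Data.List (find)

data Sym = T String | NT String deriving (Eq, Show)
data Rule = Rule { lhs :: String, rhs :: [[Sym]] } deriving (Eq, Show)
type Grammar = [Rule]

-- Inline a nonterminal that has exactly one single-alternative production,
-- removing its rule and splicing its right-hand side into every use site.
inline :: String -> Grammar -> Grammar
inline name g =
  case find ((== name) . lhs) g of
    Just (Rule _ [alt]) ->
      let expand = concatMap (\s -> if s == NT name then alt else [s])
      in [ Rule l (map expand alts) | Rule l alts <- g, l /= name ]
    _ -> g
```

Because the transformation happens before table generation, the result still goes through Happy's usual conflict checks.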

CC @sgraf812 @harpocrates

int-index commented 4 years ago

I have a prototype of an EDSL to generate Happy grammars:

data ExToken =
  ExTokTrue |
  ExTokFalse |
  ExTokLPar |
  ExTokRPar

data ExRepr = ExPar ExRepr | ExBLit Bool

grammar :: Grammar ExToken (State Int) _
grammar = MkGrammar
  $ tm @"true"  [p|ExTokTrue|]
  $ tm @"false" [p|ExTokFalse|]
  $ tm @"(" [p|ExTokLPar|]
  $ tm @")" [p|ExTokRPar|]
  $ nt @"lit" @ExRepr
    do rule  @'["true"]            [|| \_ -> ExBLit True ||]
       rule  @'["false"]           [|| \_ -> ExBLit False ||]
  $ nt @"expr" @ExRepr
    do ruleM @'["(", "expr", ")"]  [|| \_ e _ -> ExPar e <$ modify (+1) ||]
       rule  @'["lit"]             [|| id ||]
  $ EndOfGrammar

With visible forall in terms, it would become less noisy:

data ExToken =
  ExTokTrue |
  ExTokFalse |
  ExTokLPar |
  ExTokRPar

data ExRepr = ExPar ExRepr | ExBLit Bool

grammar :: Grammar ExToken (State Int) _
grammar = MkGrammar
  $ tm "true"  [p|ExTokTrue|]
  $ tm "false" [p|ExTokFalse|]
  $ tm "(" [p|ExTokLPar|]
  $ tm ")" [p|ExTokRPar|]
  $ nt "lit" ExRepr
    do rule  ["true"]            [|| \_ -> ExBLit True ||]
       rule  ["false"]           [|| \_ -> ExBLit False ||]
  $ nt "expr" ExRepr
    do ruleM ["(", "expr", ")"]  [|| \_ e _ -> ExPar e <$ modify (+1) ||]
       rule  ["lit"]             [|| id ||]
  $ EndOfGrammar

The next thing is to make happy capable of consuming such a grammar directly, for which the backend needs to change, as it currently operates entirely on strings. Instead, it could use a combinator library capable of producing both String and Exp.
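One possible shape for such a library, sketched with an invented class (only the template-haskell constructors `VarE`, `AppE`, and `mkName` are real): a single combinator vocabulary with two interpretations, so the same backend code can emit either a String (today's behaviour) or a TH `Exp` for splicing.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import qualified Language.Haskell.TH as TH

-- Invented class: the backend programs against this vocabulary and stays
-- agnostic about which representation it is producing.
class CodeOut r where
  var :: String -> r
  app :: r -> r -> r

-- String interpretation: code generation by concatenation, as today.
instance CodeOut String where
  var = id
  app f x = "(" ++ f ++ " " ++ x ++ ")"

-- TH interpretation: build an Exp suitable for a splice.
instance CodeOut TH.Exp where
  var = TH.VarE . TH.mkName
  app = TH.AppE
```

A real version would need many more combinators (literals, cases, signatures), but the two-instance structure is the point.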

For that, I also have a branch... but I'd need to dust it off. If anyone wants to collaborate on this, let me know.

Ericson2314 commented 4 years ago

Realistically, I'll probably be too busy to do much for a bit (still rushing to get multi-package done for 8.10 but maybe I should let that go; you already saw me get preempted from the pattern binds thing :)). But I would definitely like to collaborate on this long term.

Instead, it could use a combinator library that is capable of producing both String and Exp.

Could we retroactively add an TTG-style extra variant to Exp and friends which is Void by default?

data Exp' a = .... | XExp a
type Exp = Exp' Void

I think that could just squeeze by without a change to GHC itself.

Then one can use the TH pretty printer with your own custom nodes (which just need to be pretty printed). I believe this would be the ideal way to support such a thing.
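As a sketch of that idea, with simplified stand-ins for TH's real types (which have many more constructors): the extension slot defaults to `Void`, so existing code is unaffected, while a custom payload only needs to say how it pretty-prints.

```haskell
import Data.Void (Void, absurd)

-- Simplified stand-in for TH's Exp with a TTG-style extension constructor.
data Exp' a = VarE' String | AppE' (Exp' a) (Exp' a) | XExp a
type Exp = Exp' Void

-- Pretty-print, given a printer for the extension payload.
pprint' :: (a -> String) -> Exp' a -> String
pprint' _ (VarE' s)   = s
pprint' f (AppE' g x) = pprint' f g ++ " " ++ pprint' f x
pprint' f (XExp a)    = f a

-- For the default Exp, the extension case is provably unreachable.
pprintExp :: Exp -> String
pprintExp = pprint' absurd
```

Custom nodes then flow through the existing printer by supplying their own `a -> String`.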

For that, I also have a branch... but I'd need to dust it off. If anyone wants to collaborate on this, let me know.

Do open a WIP PR branch, even if neither of us has time to complete the dusting!

sgraf812 commented 4 years ago

I had quite a bit of catching up to do, so sorry for the late response.

In general, I'm actually quite happy with the EDSL provided by happy. Sure, the syntax carries historic cruft from 4 decades, but I'm not sure how you generally want to improve on what it does. It's essentially an external -1 stage.

The first change I think would be to allow separating the grammar from the user written code. Happy could turn a .y files without any Haskell in between the braces into its representation of the grammar. Then, in another module, one writes all the production eliminators, named according to some convention.

What kind of "convention" do you have in mind? Just using rule numbers (e.g. reduce42) isn't going to cut it, because that would be fragile under changes/refactorings in the source grammar. You'd need some kind of explicit reference between the grammar file and the production eliminators. And you can do that right now: just immediately call functions in each rule action which are defined in RuleActions or what have you. Parser re-use is possible by backpacking/substituting in a different module, IncrementalRuleActions.
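For illustration, that suggestion might look like this (module layout and all names invented): each .y action stays a bare call, so the grammar file never changes when the semantics do, and the called functions can be swapped out wholesale.

```haskell
-- The .y rules would read, unchanged across both interpretations:
--   expr : '(' expr ')'   { mkPar $2 }
--   lit  : 'true'         { mkLit True }

-- One "RuleActions" interpretation builds an AST:
data Ast = Par Ast | Lit Bool deriving (Eq, Show)

mkPar :: Ast -> Ast
mkPar = Par

mkLit :: Bool -> Ast
mkLit = Lit

-- What an alternative "IncrementalRuleActions" module might export instead:
-- actions computing, say, parenthesis nesting depth rather than a tree.
mkParDepth :: Int -> Int
mkParDepth = (+ 1)

mkLitDepth :: Bool -> Int
mkLitDepth _ = 0
```

With Backpack, the two interpretations would live behind one module signature rather than using distinct names.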

The real benefit is now we have a chance to programmatically modify the grammar before parser codegen.

I don't follow. How does separating out the rule actions affect the grammar? And how is this even desirable, given that slight refactorings to the grammar may have non-local effects on the state machine? To me, that's exactly the beauty of keeping the control component opaque: we don't have to worry about how inlining, new syntax, etc. change the state machine. Any attempt to "modify the grammar before parser codegen" risks changing the parse, possibly even the accepted language.

That's exactly why I'm not too fond of parser combinators, by the way: although intriguingly direct and simple, the programmer directly meddles with the control component, and it's hard to tell whether the accepted language is still the same after a non-trivial change. I think the reason for my gut feeling is that I'd rather try to comprehend formal grammars (as they are defined in a language report) than code using parser combinators (which necessarily intertwines hacks and semantic actions with the relevant bits), and it's nigh impossible to make the connection from parser combinators back to the formal grammar. With parser generators, you get all of these guarantees for free!

I have a prototype of an EDSL to generate Happy grammars:

That's nice and all, and will probably attract users who are unfamiliar with happy syntax and unwilling to learn the basics. And I'd say it's definitely an improvement over happy syntax and worth a new frontend, if at some point it allows for all the customisation that happy does (operator precedence, custom monads, usage of language extensions inside splices, to name a few). But to me it's rather the same as using happy directly, as far as GHC is concerned: grammars seldom fit on one page, and it will always be hard to find where an ambiguity stems from. And once your eye is trained to parse happy syntax, I'm compelled to think that reading nt "expr" ExRepr (rule ["(", "expr", ")" ...) is much the same as expr :: { ExRepr } : '(' expr ')' { ... }. The Haskell embedding delivers no added value other than maybe better type-checking and error messages due to the use of phantom types.

The next thing is to make happy capable of consuming such a grammar directly, for which the backend needs to change, as it operates entirely on strings.

I don't understand this point and why it entails changing happy's backend, but I'm also largely unfamiliar with its codebase...

int-index commented 4 years ago

The Haskell embedding delivers no added value other than maybe better type-checking and error messages due to usage of phantom types.

Type-checking the source file instead of the generated file is actually the primary reason that I think an EDSL is superior to a generator.

Ericson2314 commented 3 years ago

@sgraf812 You can think of what I'd like to do with the separated grammars as "grammar combinators" rather than "parser combinators". There's still a complete phase separation between the code that manipulates the grammar and the parser itself, so the evaluated grammar gets all the same ambiguity/conflict checks (which are non-local just like state machine changes).

Ericson2314 commented 3 years ago

@int-index still interested in seeing your branch. I just toyed with https://github.com/simonmar/happy/compare/master...Ericson2314:template-haskell, making me very sad over just how few places Template Haskell allows splices.

int-index commented 3 years ago

@Ericson2314 I just checked a couple of places where my local copy of happy with that branch could be, and couldn't find it, so it must be on the laptop that spontaneously died a few weeks ago. I may try to recover the data from its SSD at some point, but I wouldn't count on it.

int-index commented 3 years ago

very sad over just how few places Template Haskell allows splices

Indeed, and in my branch I was using custom combinators to produce the code. It didn't look all that good. Maybe we could try extending TH quotes first, so that Haskell has better macro facilities.

Ericson2314 commented 3 years ago

@int-index Yeah I think basically writing the code we want to write, and then enumerating the missing features, would be an excellent way to make clear to others some of the biggest problems TH actually faces.

I just worry though that the proper order of yak shaves is to do something about GHC's CI first, as with the current level of productivity GHC developer time is too scarce for a "side quest" of this magnitude :/.