haskell / happy

The Happy parser generator for Haskell

Split out happy-codegen-common #221

Closed Ericson2314 closed 2 years ago

Ericson2314 commented 2 years ago

There was some extra information in Grammar that had nothing to do with the grammar; it was there only because Grammar was also playing the role of capturing all information from the abstract syntax.

I don't think that's good. As we try to make real libraries out of this stuff, we should be stricter and stricter about separating concerns. Grammar should really be just that, the grammar, and the code in happy-tabular should not be privy to information that is just for the backend.

Splitting out a code generation CommonOptions type is a first step to rectifying this. I hope we can do a few more refactors like this to really make each data type shine on its own.


I don't want to sound too harsh though, since we purposely made the split low impact with more cleanups -- such as this -- left for later. It is of course easier to see what's good and what's bad once the code is split up!

Also, this change calls into question my previous BookendedAbsSyn. We're now acknowledging that the abstract syntax mixes "middle" and back end concerns, and those are not properly separated until the next step. Given that, there isn't much use in making BookendedAbsSyn when we could just stick the header and footer in CommonOptions.
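
A minimal sketch of the split being described, to make it concrete. Every type, field, and function name below is a hypothetical illustration, not the actual definitions in happy; the point is only that the grammar type carries nothing but the grammar, while header, footer, and similar items sit in a separate options record that only code generation reads.

```haskell
-- Hypothetical sketch; none of these names are taken from the happy sources.
import Data.List (nub)
import Data.Maybe (fromMaybe)

-- | Only what the "middle" (table construction) needs.
data Grammar = Grammar
  { startSymbol :: String
  , productions :: [(String, [String])]  -- (left-hand side, right-hand side)
  } deriving (Show, Eq)

-- | Information that only code generation cares about.
data CommonOptions = CommonOptions
  { header    :: Maybe String  -- code copied before the generated parser
  , footer    :: Maybe String  -- code copied after the generated parser
  , tokenType :: String        -- Haskell type of the tokens
  } deriving (Show, Eq)

-- | The middle of the pipeline takes only the grammar.
nonterminals :: Grammar -> [String]
nonterminals g = nub (map fst (productions g))

-- | A backend takes the options and wraps whatever code it generated.
wrapOutput :: CommonOptions -> String -> String
wrapOutput opts body =
  fromMaybe "" (header opts) ++ body ++ fromMaybe "" (footer opts)
```

With this shape, a function like `nonterminals` cannot accidentally read the header or footer, because its argument type does not contain them.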

int-index commented 2 years ago

I’m fine with splitting out a data type, but I think that making it a separate package is overkill. Nothing stops us from making a separate package for every data type and every function, and then we could be really explicit about what depends on what, but there’s a cost to this: uploading more packages to Hackage, maintaining correct version bounds, cluttering the dependency graph for end-users, etc.

I actually think the separation between happy-grammar and happy-tabular is also gratuitous. We don’t have use cases where one would be used without the other.

There’s a trade-off between fine-grained and coarse-grained dependencies, and I’d like our decisions to be informed by (at least imaginary) use cases:

happy-grammar can be easily merged into happy-tabular. Yes, that would mean that happy-frontend depends on more definitions than the code in it requires, but: would anyone ever use happy-grammar without happy-tabular? I don’t think so.

int-index commented 2 years ago

To elaborate a bit more on this, packages are a means of code distribution, and should be motivated by distribution needs.

As long as we ship happy as a single binary, it’s fine to have everything in a single package. I want to ship a TH-based version, so I’d like to factor out anything related to .y-files and CLI into their own packages, so that TH users wouldn’t depend on those components.

The backends go into their own packages because we could easily ship happy-lalr, happy-glr, and happy-rad as separate binaries, and I think most users would just pick and use one of those.

Now, what this patch is doing is related to separation of concerns rather than distribution. And I’m more than happy to separate concerns: dedicated functions, data types, and modules, are all very good (that is also the stated motivation: “make each data type shine on its own”). But there’s no need for separate .cabal packages.

Ericson2314 commented 2 years ago

@int-index I agree we don't want to go overkill on the packages, and indeed it's already unwieldy. But I do think there is some utility in ripping things into too many pieces on purpose, just so you have a clean slate and more flexibility to decide how to put them back together.

I agree with your breakdown of concerns and survey of potential distributions. I agree too that happy-grammar and happy-tabular should likely be recombined. But I also think that while this type certainly doesn't deserve its own package, it doesn't belong in any of the currently existing ones either.

As far as concerns go, this stuff is really backend concerns that need an interface, so we shoehorn them into the frontend. The core middle parts of the compiler (which I think are destined to become the most general-purpose librar(y|ies)) don't care about them at all.

As far as distributions go, I think use cases along these lines are useful and plausible:

  1. Work with plain grammars, not ornamented with fused elimination rules in the YACC tradition.
  2. Something that just reports diagnostics about your grammar, like --info, and doesn't actually compile it into anything. Hell, we could have a full-on language server!
  3. Interpret a grammar. Dispense with the phase restrictions and just interpret a (possibly ornamented) grammar right into a Haskell function. Slow, but might have advantages. Also can do some ridiculously context-dependent shenanigans.

All of these would avoid the frontend, avoid the CLI, avoid the backends, and avoid this Directives type. In other words they would just use happy-grammar and happy-tabular.
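
As a rough illustration of use case 2 above, here is a sketch of a diagnostic that needs nothing but a plain grammar: it lists nonterminals that cannot be reached from the start symbol. The `Grammar` type and all function names are assumptions made up for this example; this is not what --info actually computes, only an instance of a check that needs no frontend, CLI, backend, or directives.

```haskell
-- Hypothetical sketch; the Grammar type here is illustrative, not happy's.
import Data.List (nub)

type Symbol = String

data Grammar = Grammar
  { startSymbol :: Symbol
  , productions :: [(Symbol, [Symbol])]  -- (left-hand side, right-hand side)
  }

-- | Every symbol that appears on the left of some production.
nonterminals :: Grammar -> [Symbol]
nonterminals = nub . map fst . productions

-- | Symbols reachable from the start symbol, found with a worklist search.
reachable :: Grammar -> [Symbol]
reachable g = go [startSymbol g] []
  where
    go [] seen = reverse seen
    go (s : rest) seen
      | s `elem` seen = go rest seen
      | otherwise =
          -- Queue every symbol on the right-hand sides of s's productions.
          let next = concat [rhs | (lhs, rhs) <- productions g, lhs == s]
          in go (rest ++ next) (s : seen)

-- | The diagnostic: nonterminals that no derivation from the start can use.
unreachableNonterminals :: Grammar -> [Symbol]
unreachableNonterminals g =
  [n | n <- nonterminals g, n `notElem` reachable g]
```

For a grammar with productions `E : T`, `T : n`, and `U : n` and start symbol `E`, the check reports `U`.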

I think a likely happy ending for happy-directives is for it to evolve into some sort of happy-backend-common. (And I am also thinking s/backend/codegen because codegen is intrinsic to the interface whereas backend is merely how it relates to other parts of existing use-cases.) This would contain Directives (which should be something like CommonCodegenOptions), and also your new syntax-building combinators. This makes it a meatier library more worthy of existing!

How does that sound?

int-index commented 2 years ago

> How does that sound?

Yes, sounds reasonable overall. But if you put the directives in happy-codegen-common, then happy-frontend will depend on it (since it needs to produce the grammar+directives that it parses from .y files), and that would be quite strange.
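
To make the dependency concrete, here is a toy sketch. The types, the function name, and the line-based input format are all invented for illustration and are far simpler than real .y files; the only point is that the frontend's result type has to mention the directives type, so the frontend package must import whichever package defines it.

```haskell
-- Hypothetical sketch; toy types and a toy input format, not happy's.

-- Would live in the shared codegen package.
newtype Directives = Directives [(String, String)]  -- (name, argument)
  deriving (Show, Eq)

-- Would live in the grammar package.
newtype Grammar = Grammar [(String, [String])]      -- (left-hand side, right-hand side)
  deriving (Show, Eq)

-- Would live in the frontend. The result type names both types above,
-- which is the dependency under discussion.
parseToy :: String -> (Grammar, Directives)
parseToy src = (Grammar rules, Directives dirs)
  where
    ls    = filter (not . null) (map words (lines src))
    -- Lines such as "%tokentype Token" become directives.
    dirs  = [(name, unwords args) | (('%' : name) : args) <- ls]
    -- Lines such as "E : E + T" become productions.
    rules = [(lhs, rhs) | (lhs : ":" : rhs) <- ls]
```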

Ericson2314 commented 2 years ago

I think that's not so bad, because the current frontend is specific to the use-cases that do codegen.