haskell / happy

The Happy parser generator for Haskell
Other
276 stars 85 forks source link

Modularize happy #191

Closed knothed closed 2 years ago

knothed commented 3 years ago

Following https://github.com/simonmar/happy/issues/167#issuecomment-780591344 and #187, this PR aims to split up happy into several components. Two main goals are accomplished:

Ideally, happy programmers should be able to combine happy packages almost arbitrarily: Each frontend should, in theory, be able to work with each middleend, which in turn should work with each backend.

To achieve this goal, we (@sgraf812 and me) have decided to split happy into six packages – five libraries and one executable – as follows:

Why six packages?

We want users to be able to combine different frontend-, middleend- and backend-packages together. Therefore, happy-core is vital as it defines data types that every such package should be able to work with; in particular Grammar, ActionTable and GotoTable.

We'll look at happy-test shortly.

How does a package look like?

The three packages happy-frontend, happy-middleend and happy-backend are all built similarly: they have a public runFrontend/runMiddleend/runBackend function which performs the package's main task. Also, they have a common interface for providing and parsing command line arguments.

Let's take a look at happy–backend (packages/backend). It has the following exposed components:

  1. A main entry point function, runBackend, which performs the code-gen:

    runBackend :: BackendArgs -> Grammar -> ActionTable -> GotoTable -> IO ()
    
    data BackendArgs = BackendArgs {
     outFile :: String,
     templateDir :: Maybe String,
     magicName :: Maybe String,
     strict :: Bool,
     ghc :: Bool,
     coerce :: Bool,
     target :: Target,
     debug :: Bool,
     glr :: Bool,
     glrDecode :: Bool,
     glrFilter :: Bool
    }

    BackendArgs are user-provided arguments, while Grammar, ActionTable and GotoTable come from the frontend and middleend (as results of runFrontend and runMiddleend).

    happy could call runBackend with arbitrary BackendArgs. There is no requirement for a command line interface per se.

  2. But ultimately, we do want to perform happy via a CLI, meaning we want to create BackendArgs from parsing the command line arguments. Just as previously in happy, we define the CLI arguments using GetOpt, but on package-level:

    data BackendFlag =
       OptOutputFile String |
       OptTemplateDir String |
       OptMagicName String |
        ...
    
    backendOptions :: [OptDescr BackendFlag] = [
       Option "o" ["outfile"] (ReqArg OptOutputFile "FILE"),
       Option "t" ["template"] (ReqArg OptTemplate "DIR"),
       Option "m" ["magic-name"] (ReqArg OptMagicName "NAME"),
       ...
    ]
    
    parseBackendFlags :: [BackendFlag] -> String -> IO BackendArgs

So, each package defines its own CLI Flags and options and, in addition, a parseFlags function which converts these CLI flags into PackageArgs.

Then, option parsing takes place as follows:

  1. The main happy executable sticks the options from each package together and calls getOpt.
  2. The resulting Flags are then used to call parseFlags and execute each package's respective main entry point function: runFrontend, runMiddleend and runBackend.

In this way, option parsing is fully modularized such that new packages can define and use their own CLI options.

We just have to pay attention to one thing: with getOpt, it is not possible to have multiple options with the same option character or option string. This means, if there are two different packages which both define an -o option, these cannot be used together in the same happy executable. Package authors have to watch out for such option collisions.

Example: happy-rad

As an example for how this modularization can be used to create a new executable, let's take a look at happy-rad. happy-rad extends happy by a recursive ascent-descent (RAD) backend. It contains two packages:

As happy-frontend, happy-middleend and happy-backend are (will be) remote packages, we can just reuse their functionality. In addition, we get testing for free:

testing happy and happy-rad

How does the happy-test package help package authors in testing their packages?

We've moved all of the existing 26 test grammars (e.g. ParGF.y) into happy-test. happy-test exposes the following function:

test :: TestSetup -> IO a

data TestSetup = TestSetup {
  happyExec :: String, -- e.g. happy or happy-rad
  defaultTests :: [String],
  customTests :: [String],
  customDataDir :: String,
  allArguments :: [String],
  stopOnFailure :: Bool
}

In their test suite, the package author can call test and customize the testing procedure: they can specify which of the 26 grammars should or should not be tested and which argument combinations their executable should be tested with. In addition, they can specify further, custom, testing grammars.

For example, happy-rad could provide [--rad] for the set of arguments it should be tested with, while normal happy could provide [-a, -ag, -cg, -acg].

happy-test can also do something different: The functionality of make sdist-test, which was previously in the Makefile, has now moved into happy-test. This way, every extension author can test their sdists. What does make sdist-test do?

  1. call cabal sdist all to create an sdist of all local packages
  2. unzip the sdists and move them into an umbrella directory
  3. create a cabal.project referring to these local packages
  4. build and test the executable inside this umbrella directory.

More

For the end user of happy, not much changes: the CLI stays identical, the functionality remains the same. Just the compile time of happy increases a bit.

Template Files

As the template files are backend-specific, these are now data-files of happy-backend, not of happy itself. Just like the tests, which are data-files of happy-test instead of happy itself.

Bootstrapping

Bootstrapping is a process which is purely frontend-specific. Therefore, the bootstrap flag only exists in happy-frontend, but still exists. A next step could be to outsource the bootstrapping parser into its own package, as described in https://github.com/simonmar/happy/issues/187#issuecomment-825564616.

CI

Travis and appveyor used make sdist; make sdist-test-only. We have replaced this with our new make sdist-test. make sdist-test currently doesn't work on MinGW.

Ericson2314 commented 3 years ago

@knothed Still trying to find the time to properly sit down and review this, but I wanted to reaffirm I'm very happy to see this PR!

knothed commented 3 years ago

@Ericson2314 perfect, I‘m quite happily looking forward to hearing your opinion!

sgraf812 commented 3 years ago

I didn't review much code, because I trust you on not having modified the core happy logic. Plus, the tests seem to pass, so your refactoring seems to be successful.

I quite like the new package organisation, but ultimately it's up to the maintainers to decide.

Edit: Although, I wonder: Did CI really run the tests? I can see no output difference between https://travis-ci.org/github/simonmar/happy/jobs/774274988 (where they should run) and https://travis-ci.org/github/simonmar/happy/jobs/774274987 (where they shouldn't run).

Ericson2314 commented 3 years ago

For CI maybe try GitHub workflows. See Alex, which has made the switch, though I think is prefer a manually written one like I attempted in https://github.com/simonmar/alex/pull/190 as last I checked haskell-ci didn't support macOS with GitHub workflows yet.

Ericson2314 commented 3 years ago

Ah just merge in https://github.com/simonmar/happy/pull/196 and get it working with what you got. Hopefully we can merge that one soon and fix CI so this can be safe to merge.

knothed commented 3 years ago

Edit: Although, I wonder: Did CI really run the tests? I can see no output difference between https://travis-ci.org/github/simonmar/happy/jobs/774274988 (where they should run) and https://travis-ci.org/github/simonmar/happy/jobs/774274987 (where they shouldn't run).

Yes. In the second link, the tests have been run normally, while in the first link, make sdist-test was performed. Compare e.g. the number of lines from the output.

sgraf812 commented 3 years ago

Ah just merge in #196 and get it working with what you got

Hmm... Isn't GHA-based CI an entirely different issue? As I see it, Travis CI seems to work with this PR: https://travis-ci.org/github/simonmar/happy/builds/774274977. And it indeed runs the tests.

Ericson2314 commented 3 years ago

@knothed Oh when I clicked "all checks have passed" Travis was not there, so I assumed it was broken. Nevermind, then.

Ericson2314 commented 3 years ago

OK I would like this to get merged soon :). It is very good and has sat open too long.

A few more things

  1. Can we put happy itself in packages/ too? This will make filtering the sources per package from the repo easier, because every package gets its own subtree.
  2. I would be open to giving the swappable packages names based on the algorithm they currently use. e.g. maybe we would switch the default backend from this one to a different one at some point :).
  3. Can the attribute grammar stuff leverage this some how? I see you have isBootstrapped to leak out whether to do the attribute grammar tests.
Ericson2314 commented 3 years ago

If you check the "maintainers can push to PR" box I can help fix conflicts.

Ericson2314 commented 3 years ago

Also https://github.com/piknotech/happy/pull/1

knothed commented 3 years ago

OK I would like this to get merged soon :). It is very good and has sat open too long.

Nice, I'd also like that :)

  1. Can we put happy itself in packages/ too? This will make filtering the sources per package from the repo easier, because every package gets its own subtree.

Yes we could do that. This would however require all of happys data-files to be inside this subdirectory. These include the readme, changelog, makefile etc., which would be much better placed in the top directory instead of being inside packages/happy. We could probably solve this using links? But I don't know whether that is cross-system-proof.

  1. I would be open to giving the swappable packages names based on the algorithm they currently use. e.g. maybe we would switch the default backend from this one to a different one at some point :).

Sounds good - what would you propose?

  1. Can the attribute grammar stuff leverage this some how? I see you have isBootstrapped to leak out whether to do the attribute grammar tests.

Bootstrapping is a frontend-specific thing, and the frontend exposes isBootstrapped. The other packages don't care about bootstrapping.

Using happy with attribute grammars requires the bootstrapped happy however which is why the attribute grammar tests are listed separately inside Tests.hs.

Ericson2314 commented 3 years ago

We could probably solve this using links? But I don't know whether that is cross-system-proof.

I'm told windows support for symlinks is continuously improving. And I think git had some tricks even before that, so that's fine with me.

Ericson2314 commented 3 years ago
  1. I would be open to giving the swappable packages names based on the algorithm they currently use. e.g. maybe we would switch the default backend from this one to a different one at some point :).

Sounds good - what would you propose?

I guess first I would need to understand how/why your happy rad still uses regular happy-backend.

Ericson2314 commented 3 years ago

Well isBootstrapped should be called something like supportsAttributeGrammars for starters. The former is an implementation detail, the latter isn't.

knothed commented 3 years ago

I guess first I would need to understand how/why your happy rad still uses regular happy-backend.

My rad-backend is independent of happy-backend. The executable happy-rad however does include regular happy-backend so the user can decide which backend they wanna use.

knothed commented 3 years ago

Should we remove the TODO file? It seems outdated.

Ericson2314 commented 3 years ago

I guess first I would need to understand how/why your happy rad still uses regular happy-backend.

My rad-backend is independent of happy-backend. The executable happy-rad however does include regular happy-backend so the user can decide which backend they wanna use.

Oh! So it could be skipped to force them into the rad backend? That's great news.

Ericson2314 commented 3 years ago

Could you make a happy-backend-glr? And make it optional just like you made happy-rad support two backends?

Perhaps that is a better splitting than whatever I was thinking of with the attribute grammars :).

knothed commented 3 years ago

Yes, I‘ve also thought about splitting the GLR backend off. I‘ll do that.

It could however be that the GLR backend has a dependency to the normal backend, as it uses, I think, some of the normal backend logic. But that shouldn’t be a problem – the normal backend (obviously) doesn’t depend on the GLR backend.

knothed commented 3 years ago

Oh! So it could be skipped to force them into the rad backend? That's great news.

Exactly!

Ericson2314 commented 3 years ago

https://github.com/simonmar/happy/blob/1ad42c11b24ca6bc67720489a21fc7e0aa23dcfe/packages/backend/src/Happy/Backend/ProduceGLRCode.lhs actually looks like it just uses core!

Ericson2314 commented 3 years ago

It seems the attribute grammar does have a backend component, after all, or at least there is some ugly string concatenation in the frontend mangler that look like it.

But this is no fault of the current PR, so I'm happy to deal with it post merge :)

knothed commented 3 years ago

cabal check wants license and maintainer fields for each package. I've just copied most fields from the original happy package, if that's okay.

I've also added some imperfect descriptions to the packages, maybe you could take a quick look at them.

knothed commented 3 years ago

GitHub actions works now (https://github.com/piknotech/happy/actions). I've manually updated haskell-ci.yml and haskell-ci.patch as haskell-ci doesn't seem work on Mac with GitHub workflows, so someone should probably test the haskell-ci.yml file generation again.

Also, could you approve the GitHub actions to run here in this repo? As a first-time contributor I need a maintainer to approve running workflows.

knothed commented 3 years ago

cabal check wants license and maintainer fields for each package. I've just copied most fields from the original happy package, if that's okay.

Maybe this would be an opportunity to name the new maintainers, as currently only Simon Marlow is present in the cabal files.

knothed commented 3 years ago

@Ericson2314 The GLR-backend is now split off into a separate package. It is however (and has previously not been) covered by any tests.

Are you happy with the cabal files?

knothed commented 3 years ago

happy-rad now even features three backends at once: normal, glr and rad. It calls the correct one depending on whether the --glr or --rad flags are set.

This multi-package option parsing can be seen here. Could be helpful for future package developers.

int-index commented 3 years ago

I love the idea, but this is impossible to review. Can you split it into several parts?

You could first factor out happy-core into its own package as a separate PR, then happy-middleend, then happy-backend, etc, each in its own incremental PR.

Also, I disagree a little with the way you have split this into packages, in particular regarding happy-core and happy-middleend. Command-line options are a part of the command-line interface and definitely do not belong to happy-core, whereas routines to transform Grammar into (ActionTable, GotoTable, [Lr1State], [Int]) can be easily put into the same package as Grammar, ActionTable, and GotoTable.

So I think that:

Ericson2314 commented 3 years ago

@int-index I do not like the principle of putting CLI stuff everywhere either, but it does seem awfully convenient for whipping up exes like @knothed's happy-rad

int-index commented 3 years ago

it does seem awfully convenient for whipping up exes like @knothed's happy-rad

True, hence the suggestion to put CLI utilities into their own package happy-cli

Ericson2314 commented 3 years ago

it does seem awfully convenient for whipping up exes like @knothed's happy-rad

True, hence the suggestion to put CLI utilities into their own package happy-cli

So it's OK if happy-frontend and happy-backend depend on those?

int-index commented 3 years ago

So it's OK if happy-frontend and happy-backend depend on those?

No. What is there in packages/core/src/Happy/Core/OptionParsing.hs (in the current patch) that the frontend or the backend would use?

Ericson2314 commented 3 years ago

Oh, I see packages/*/src/Happy/*/CLI.hs doesn't actually use that stuff.

edit Are you OK with that stuff being there since it is still morally CLI stuff even if it doesn't use that module?

Ericson2314 commented 3 years ago

Anyways, I like the happy-cli idea, it doesn't resolve the tensions but it is an improvement.

I would like to keep middleend separate, however. It is a bad name, but I do think it is useful to separating a yet-to-be-named middle stage of compilation from the stuff everything needs.

Ultimately, I don't think happy is written with modularity in mind, so it's better to do some split, erring on the side of more granularity, and then refactor things to be sane after. (I do not want to do a release right after this PR is merged.)

int-index commented 3 years ago

Are you OK with that stuff being there since it is still morally CLI stuff even if it doesn't use that module?

Are you referring to packages/*/src/Happy/*/CLI.hs? No, I don’t think it belongs there either. But it’s more tolerable, because there’s no immediate use case that would suggest doing otherwise.

I plan to use happy-core as part of my TTH-based EDSL, and it definitely has nothing to do with CLI. On the other hand, something like happy-backend would only be used without CLI if we planned to support a Happy GUI or something, which is less likely. (But if it happens – we will need to think about moving CLI stuff out).

I do think it is useful to separating a yet-to-be-named middle stage of compilation from the stuff everything needs.

Okay, but let’s think about concrete use cases. What library or app would ever require Grammar and ActionTable / GotoTable but not the function to transform from one to another?

Ericson2314 commented 3 years ago

We're on same page for the CLI stuff now.

For the fate of middleend, I thought one could convert a Grammar to the tables and then dispense with the grammar, but the imports seem to indicate otherwise. That pulls the foundation out from underneath that the middle stuff is a pipeline stage at all.

(To recap, the structure I generally like would be ASTs/IRs 0..n+1, and transformations 0..n, depending online on the input and output ASTs/IRs.)

sgraf812 commented 3 years ago

FWIW, initially we thought about a split of just frontend > middleend > backend and middleend was all about turning Grammar descriptions into ActionTables. But then we discovered that all of them need CLI functionality and share types. It would've been awkward to put them all in frontend, so we wrote core to put the shared stuff there.

Now it might seem that middleend doesn't do much and we could move it into core, but then frontend and middleend implicitly depend on how we translate Grammars to ActionTables... Note there might be other ways to write a buildTables function. Maybe that is OK?

CLI stuff ended up in core because we didn't want to introduce yet another package, but happy-cli is ultimately the better solution.

Ericson2314 commented 3 years ago

OK it seems all the backends really do use the original Grammar meaningfully.

If that is something that is reasonable to fix (by instead passing in the residual information that is not duplicate with that in tables) I would have happy-grammar and happy-tabular together with happy-cli replace happy-core.

The final breakdown would be:

Tables -> happy-tabular GenUtils -> happy-grammar, but should be liquidated Grammar -> happy-grammar NameSet -> inline to happy-middleend, redefine in happy-rad OptionParsing -> happy-cli

Backends would depend on happy-grammar and happy-tabular until that situation is rectified.

If, on the other hand, it is not reasonable to keep Grammar away from the backends, then I would further combine happy-middleend into happy-tabular. (The backends would morally depend on how the tables were projected, but the fronted wouldn't depend on the tables.) That is closer to what @int-index said, except the happy-grammar is still separate for the frontend to depend on without happy-tabular.

How does that sound to everyone? Take your time responding, I need to go to bed :).

knothed commented 3 years ago

I'm fine with the division @int-index and @Ericson2314 proposed - splitting happy-core seems sensible.

OK it seems all the backends really do use the original Grammar meaningfully.

They do, and I don't think it's sensible to try changing that by introducing another IR type that the backends should then use instead of Grammar. There is lots of information inside Grammar that is used by code-gen that is not represented in the tables, and all of it is required for a correct code-gen. The only fields in Grammar which are not backend-relevant are expect and priorities.

This would mean that happy-middleend would be merged into happy-tabular, which is okay. Users who want to write additional optimisation passes based on the generated tables could still do so, they don't need happy-middleend for that.

All of happy-middleends CLI-options would just be merged into the frontend, and the frontend would perform the IO-tasks that the middleend did so far: info file generation and debug dumping.


What would you suggest to improve the situation with the CLI? I do think that it is sensible for each package to expose its own CLI-options. When the CLI-options are not per-package, but central (i.e. defined by the executable itself), we lose the possibility to easily combine packages without redefining the CLI handling every single time.

A possibility would be to split happy-package into itself plus happy-package-cli, but that immediately leads to package flooding. I believe it's best as it currently is.

Note that a package can always be used without using its CLI options, so a GUI could e.g. just call the backend by creating BackendArgs, without CLI interplay.


Also, I'll be on vacation for the next 4 weeks from now on, so all of you feel free to perform some patches to this PR yourself, given enough spare time and/or dedication. I'll still answer on comments, I just can't write any code. 😄

Ericson2314 commented 3 years ago

All of happy-middleends CLI-options would just be merged into the frontend, and the frontend would perform the IO-tasks that the middleend did so far: info file generation and debug dumping.

I wouldn't do that, because then happy-frontend gains the happy-tabular dep. I would just leave those in happy-tabular for now, and we can move them out later.

What would you suggest to improve the situation with the CLI?

Fine as is. The individual packages don't use that CLI module that was in core, so we side-step it for now. When we do the TH edsl we can consider doing the -cli package explosion as @int-index implied (as I interpreted, at least).

cdsmith commented 3 years ago

Not a productive comment, but I just wanted to say that I'm really happy to see this, as I was looking for a happy parser that I could use to partially automate a labor-intensive conversion from GHC's Haskell grammar. This looks like just what I need. I'm quite eager to see it merged.

Ericson2314 commented 3 years ago

@cdsmith On the contrary it's great motivation to here this stuff doesn't just feel like a good idea, but has concrete immanent use-cases something laudable like code world (I presume :))!

Ericson2314 commented 3 years ago

Welcome back, @knothed :). @int-index what do you think of my last proposal? I was actually expecting us to decide on something first before @knothed implemented it — hopefully it's more too many more hoops to jump through!

int-index commented 3 years ago

Yeah, it’s fine by me.

But as I said, this current PR changes too much at once, I find it difficult to review. Could we do it gradually? For example, start by separating out happy-grammar into its own package.

knothed commented 3 years ago

Perfect, I’ll try splitting up the MR as sensibly as possible. :)

Would you like each MR to actually build and work for itself? That would be a bit challenging considering the changes to the parsing architecture.

int-index commented 3 years ago

Would you like each MR to actually build and work for itself?

Yes.

That would be a bit challenging considering the changes to the parsing architecture.

All the more reason to do things separately. Consider starting with changes to the parsing architecture before moving things around -- that would make it easier to review the new architecture as a proper diff

Ericson2314 commented 3 years ago

I suppose the first PR can establish the subdirectory regime (the moves will not clutter the diff), and then the subsequent ones are more simple.

knothed commented 3 years ago

Hmm, wouldn’t moving the source files into different directories break the single-package happy, which expects them in src/?

Ericson2314 commented 3 years ago

I just mean the first PR can adopt the directory structure of this PR, but with just the two packages happy and happy-grammar.