Modularize happy #191 (haskell/happy)

knothed commented 3 years ago

Following https://github.com/simonmar/happy/issues/167#issuecomment-780591344 and #187, this PR splits happy into several components. It pursues two main goals:

Introducing a clear separation between the different parts of happy, such as the frontend, middleend and backend.
Making it easy for anyone to extend happy, or to create and distribute new (possibly experimental) happy features and components, such as a Template Haskell frontend or a recursive ascent-descent backend.

Ideally, happy users should be able to combine happy packages almost arbitrarily: each frontend should, in theory, work with each middleend, which in turn should work with each backend.

To achieve this goal, we (@sgraf812 and I) have decided to split happy into six packages (five libraries and one executable) as follows:

happy-frontend is responsible for reading .y files and parsing them into the Grammar datatype.
happy-middleend transforms the Grammar into an ActionTable and a GotoTable.
happy-backend takes these tables and generates template-based Haskell code.
happy-core defines the interfaces (IRs) between the frontend, middleend and backend packages. It also contains core utilities such as option parsing.
happy-test hosts the test files and contains the logic to execute the test suite.
happy itself just combines the above packages and contains little additional logic.

Why six packages?

We want users to be able to combine different frontend, middleend and backend packages. happy-core is therefore vital: it defines the data types that every such package must be able to work with, in particular Grammar, ActionTable and GotoTable.
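To illustrate why shared core types make the packages interchangeable, here is a minimal, pure sketch. Grammar, ActionTable and GotoTable are simplified stand-ins for the real happy-core types, and pipeline, linesFrontend, countingMiddleend and summaryBackend are hypothetical names; the real entry points run in IO and take package-specific arguments.

```haskell
-- Hypothetical stand-ins for the IR types defined in happy-core.
-- The real Grammar, ActionTable and GotoTable are much richer.
newtype Grammar = Grammar [String]                -- e.g. production names
newtype ActionTable = ActionTable [(Int, String)]
newtype GotoTable = GotoTable [(Int, Int)]

-- Any package that produces/consumes these types can fill a slot.
type Frontend  = String -> Either String Grammar  -- source text -> Grammar
type Middleend = Grammar -> (ActionTable, GotoTable)
type Backend   = Grammar -> ActionTable -> GotoTable -> String  -- generated code

-- The executable only wires the three stages together.
pipeline :: Frontend -> Middleend -> Backend -> String -> Either String String
pipeline frontend middleend backend src = do
  grammar <- frontend src
  let (actions, gotos) = middleend grammar
  pure (backend grammar actions gotos)

-- Toy components, purely for illustration.
linesFrontend :: Frontend
linesFrontend src
  | null (lines src) = Left "empty grammar"
  | otherwise        = Right (Grammar (lines src))

countingMiddleend :: Middleend
countingMiddleend (Grammar prods) =
  ( ActionTable (zip [0 ..] prods)
  , GotoTable [(i, i + 1) | i <- [0 .. length prods - 1]] )

summaryBackend :: Backend
summaryBackend (Grammar prods) (ActionTable acts) (GotoTable gotos) =
  "-- " ++ show (length prods) ++ " productions, "
        ++ show (length acts) ++ " actions, "
        ++ show (length gotos) ++ " gotos"
```

Because pipeline only mentions the core types, replacing summaryBackend with any other function of type Backend requires no change to the frontend or middleend.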

We'll look at happy-test shortly.

What does a package look like?

The three packages happy-frontend, happy-middleend and happy-backend are all built similarly: each has a public runFrontend/runMiddleend/runBackend function that performs the package's main task, and they share a common interface for declaring and parsing command line arguments.
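One way to picture this common shape is as a record bundling a package's options, its flag parser and its entry point. This is a hypothetical sketch (Package, runStandalone and the toy backend are not part of the PR); it is pure for easy testing, whereas the real run functions are in IO.

```haskell
import System.Console.GetOpt

-- Hypothetical shape shared by frontend, middleend and backend packages.
data Package flag args input output = Package
  { pkgOptions    :: [OptDescr flag]                  -- the package's CLI options
  , pkgParseFlags :: [flag] -> Either String args     -- its parseFlags function
  , pkgRun        :: args -> input -> output          -- its run* entry point
  }

-- Drive a single package from a command line on its own.
runStandalone :: Package flag args input output -> [String] -> input -> Either String output
runStandalone pkg argv input =
  case getOpt Permute (pkgOptions pkg) argv of
    (flags, [], []) -> do
      args <- pkgParseFlags pkg flags
      pure (pkgRun pkg args input)
    (_, extra, []) -> Left ("unexpected arguments: " ++ unwords extra)
    (_, _, errs)   -> Left (concat errs)

-- A toy backend with a single -o/--outfile option (illustrative only).
newtype ToyFlag = ToyOutFile String

newtype ToyArgs = ToyArgs { toyOutFile :: String }

toyBackend :: Package ToyFlag ToyArgs String String
toyBackend = Package
  { pkgOptions    = [Option "o" ["outfile"] (ReqArg ToyOutFile "FILE") "write the output to FILE"]
    -- The last -o wins; "out.hs" is a made-up default.
  , pkgParseFlags = \flags -> Right (ToyArgs (last ("out.hs" : [f | ToyOutFile f <- flags])))
  , pkgRun        = \args grammarName -> "writing " ++ grammarName ++ " to " ++ toyOutFile args
  }
```

For example, runStandalone toyBackend ["-o", "P.hs"] "G" yields Right "writing G to P.hs".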

Let's take a look at happy-backend (packages/backend). It has the following exposed components:

A main entry point function, runBackend, which performs the code generation:
```
runBackend :: BackendArgs -> Grammar -> ActionTable -> GotoTable -> IO ()

data BackendArgs = BackendArgs {
 outFile :: String,
 templateDir :: Maybe String,
 magicName :: Maybe String,
 strict :: Bool,
 ghc :: Bool,
 coerce :: Bool,
 target :: Target,
 debug :: Bool,
 glr :: Bool,
 glrDecode :: Bool,
 glrFilter :: Bool
}
```
BackendArgs are user-provided arguments, while Grammar, ActionTable and GotoTable come from the frontend and middleend (as results of runFrontend and runMiddleend).

happy could call runBackend with arbitrary BackendArgs. There is no requirement for a command line interface per se.
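As a sketch of calling the backend as a library, the record above can be built directly in code. The Target constructors and the default values below are assumptions for illustration, and defaultBackendArgs and ghcArrayArgs are hypothetical names.

```haskell
-- Assumed Target constructors; happy's own Target type may differ.
data Target = TargetHaskell | TargetArrayBased
  deriving (Eq, Show)

-- Same fields as the BackendArgs record shown above.
data BackendArgs = BackendArgs
  { outFile     :: String
  , templateDir :: Maybe String
  , magicName   :: Maybe String
  , strict      :: Bool
  , ghc         :: Bool
  , coerce      :: Bool
  , target      :: Target
  , debug       :: Bool
  , glr         :: Bool
  , glrDecode   :: Bool
  , glrFilter   :: Bool
  } deriving (Eq, Show)

-- Hypothetical defaults: plain Haskell output, no extensions, no GLR.
defaultBackendArgs :: String -> BackendArgs
defaultBackendArgs out = BackendArgs
  { outFile = out, templateDir = Nothing, magicName = Nothing
  , strict = False, ghc = False, coerce = False
  , target = TargetHaskell, debug = False
  , glr = False, glrDecode = False, glrFilter = False
  }

-- Built in code, with no command line involved; roughly what
-- "-o Parser.hs -acg" would request.
ghcArrayArgs :: BackendArgs
ghcArrayArgs = (defaultBackendArgs "Parser.hs")
  { target = TargetArrayBased, ghc = True, coerce = True }
```

A caller would then pass, e.g., ghcArrayArgs to runBackend together with the Grammar, ActionTable and GotoTable it obtained from the other packages.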

Ultimately, though, we do want to run happy via a CLI, which means creating BackendArgs by parsing the command line arguments. As before in happy, we define the CLI arguments using GetOpt, but at the package level:

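```haskell
import System.Console.GetOpt

-- The CLI flags understood by happy-backend (excerpt).
data BackendFlag
  = OptOutputFile String
  | OptTemplateDir String
  | OptMagicName String
  -- ...

-- Each Option takes the short option characters, the long option names,
-- an argument descriptor, and a help text.
backendOptions :: [OptDescr BackendFlag]
backendOptions =
  [ Option "o" ["outfile"]    (ReqArg OptOutputFile "FILE")  "write the output to FILE"
  , Option "t" ["template"]   (ReqArg OptTemplateDir "DIR")  "look in DIR for template files"
  , Option "m" ["magic-name"] (ReqArg OptMagicName "NAME")   "use NAME as the symbol prefix instead of \"happy\""
  -- ...
  ]

-- Converts the collected flags into BackendArgs.
parseBackendFlags :: [BackendFlag] -> String -> IO BackendArgs
```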

So each package defines its own CLI flags and options, plus a parseFlags function that converts these flags into the package's PackageArgs (for happy-backend, BackendArgs).
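A parseFlags function can be as simple as a left fold that lets each flag override one field of a default record. The sketch below uses a reduced BackendArgs with only three fields and assumes that the String argument is the grammar file name, from which a default output file name is derived; neither assumption is confirmed by the PR, and defaultOutFile is a hypothetical helper.

```haskell
import Data.List (isSuffixOf)

-- Reduced, hypothetical subset of the real types.
data BackendFlag
  = OptOutputFile String
  | OptTemplateDir String
  | OptMagicName String
  deriving (Eq, Show)

data BackendArgs = BackendArgs
  { outFile     :: String
  , templateDir :: Maybe String
  , magicName   :: Maybe String
  } deriving (Eq, Show)

-- Start from defaults and let each flag override one field;
-- later flags win over earlier ones.
parseBackendFlags :: [BackendFlag] -> String -> IO BackendArgs
parseBackendFlags flags grammarFile = pure (foldl apply defaults flags)
  where
    defaults = BackendArgs
      { outFile     = defaultOutFile grammarFile
      , templateDir = Nothing
      , magicName   = Nothing
      }
    apply args (OptOutputFile f)  = args { outFile = f }
    apply args (OptTemplateDir d) = args { templateDir = Just d }
    apply args (OptMagicName n)   = args { magicName = Just n }

-- "Parser.y" -> "Parser.hs"; other names just get ".hs" appended.
defaultOutFile :: String -> String
defaultOutFile file
  | ".y" `isSuffixOf` file = take (length file - 2) file ++ ".hs"
  | otherwise              = file ++ ".hs"
```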

Then, option parsing takes place as follows:

The main happy executable concatenates the options of all packages and calls getOpt once.
The resulting flags are then passed to each package's parseFlags function, and the packages' main entry points (runFrontend, runMiddleend and runBackend) are executed.
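The two steps above can be sketched with base's System.Console.GetOpt: because OptDescr is a Functor, the executable can tag each package's flags (here with Either), call getOpt once on the combined list, and split the result again. The flag types, the assignment of -i/--info to the middleend, and parseArgs are hypothetical.

```haskell
import Data.Either (partitionEithers)
import System.Console.GetOpt

-- Hypothetical flag types of two packages.
newtype MiddleendFlag = OptInfoFile (Maybe String) deriving (Eq, Show)
data BackendFlag = OptOutputFile String | OptGhc deriving (Eq, Show)

middleendOptions :: [OptDescr MiddleendFlag]
middleendOptions =
  [ Option "i" ["info"] (OptArg OptInfoFile "FILE") "put detailed grammar info in FILE" ]

backendOptions :: [OptDescr BackendFlag]
backendOptions =
  [ Option "o" ["outfile"] (ReqArg OptOutputFile "FILE") "write the output to FILE"
  , Option "g" ["ghc"]     (NoArg OptGhc)                "use GHC extensions"
  ]

-- Tag each package's flags so they can share a single getOpt call.
allOptions :: [OptDescr (Either MiddleendFlag BackendFlag)]
allOptions = map (fmap Left) middleendOptions ++ map (fmap Right) backendOptions

-- Split the parsed flags back per package; each list would then go to
-- that package's parseFlags function. Non-options are the input files.
parseArgs :: [String] -> Either String ([MiddleendFlag], [BackendFlag], [String])
parseArgs argv = case getOpt Permute allOptions argv of
  (flags, files, []) ->
    let (mFlags, bFlags) = partitionEithers flags
    in Right (mFlags, bFlags, files)
  (_, _, errs) -> Left (concat errs)
```

With more packages, the same idea works with a sum type that has one constructor per package instead of Either.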

In this way, option parsing is fully modularized such that new packages can define and use their own CLI options.

We just have to pay attention to one thing: getOpt cannot tell apart several options that share the same option character or long option string (it reports such an option as ambiguous). This means that if two different packages both define an -o option, they cannot be used together in the same happy executable. Package authors have to watch out for such option collisions.
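Collisions of this kind can be caught mechanically, for example in a test suite. This hypothetical helper collects the short and long option names of several packages and reports those declared more than once; option lists with different flag types first need to be mapped to a common type, e.g. with map (fmap (const ())).

```haskell
import Data.List (group, sort)
import System.Console.GetOpt

-- All short option characters declared by a list of options.
shortNames :: [OptDescr a] -> [Char]
shortNames opts = concat [cs | Option cs _ _ _ <- opts]

-- All long option names declared by a list of options.
longNames :: [OptDescr a] -> [String]
longNames opts = concat [ls | Option _ ls _ _ <- opts]

-- Every element that occurs more than once (reported once each).
duplicates :: Ord k => [k] -> [k]
duplicates ks = [k | (k : _ : _) <- group (sort ks)]

-- Colliding short characters and long names across all packages.
collisions :: [[OptDescr ()]] -> ([Char], [String])
collisions pkgs =
  ( duplicates (concatMap shortNames pkgs)
  , duplicates (concatMap longNames pkgs) )
```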

Example: happy-rad

As an example of how this modularization can be used to create a new executable, let's take a look at happy-rad. happy-rad extends happy with a recursive ascent-descent (RAD) backend. It contains two packages:

rad-backend: a backend which produces RAD code. This package is completely independent of happy-backend, yet it works with the same data types (Grammar, ActionTable, GotoTable) as happy-backend.
happy-rad: it depends on, among others, happy-frontend, happy-middleend and happy-backend (remote packages) and rad-backend (a local package). First, the frontend and middleend are executed normally. Then, depending on whether the --rad option is set, either rad-backend or happy-backend is invoked.
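The dispatch in happy-rad could look roughly like the following sketch, in which both backends consume the same (dummy) tables and only the backend is chosen from the command line. The Tables type, the backend bodies, radOptions and selectBackend are placeholders, not code from happy-rad.

```haskell
import System.Console.GetOpt

-- Dummy stand-in for the shared core types (Grammar, ActionTable, GotoTable).
type Tables = [(Int, String)]
type Backend = Tables -> String

happyBackend, radBackend :: Backend
happyBackend t = "table-driven parser over " ++ show (length t) ++ " table entries"
radBackend t = "recursive ascent-descent parser over " ++ show (length t) ++ " table entries"

data RadFlag = OptRad deriving Eq

-- Long-only option, following the --rad flag described above.
radOptions :: [OptDescr RadFlag]
radOptions = [Option "" ["rad"] (NoArg OptRad) "generate a recursive ascent-descent parser"]

-- Frontend and middleend would run unchanged; only the backend is picked here.
-- Unrecognized options are ignored in this sketch.
selectBackend :: [String] -> Backend
selectBackend argv =
  let (flags, _, _) = getOpt Permute radOptions argv
  in if OptRad `elem` flags then radBackend else happyBackend
```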

Since happy-frontend, happy-middleend and happy-backend are (or will be) remote packages, we can simply reuse their functionality. In addition, we get testing for free:

Testing happy and happy-rad

How does the happy-test package help package authors in testing their packages?

We've moved all of the existing 26 test grammars (e.g. ParGF.y) into happy-test. happy-test exposes the following function:

```haskell
test :: TestSetup -> IO a

data TestSetup = TestSetup {
  happyExec :: String, -- e.g. happy or happy-rad
  defaultTests :: [String],
  customTests :: [String],
  customDataDir :: String,
  allArguments :: [String],
  stopOnFailure :: Bool
}
```

In their test suite, package authors can call test and customize the testing procedure: they can specify which of the 26 grammars should or should not be tested and which argument combinations their executable should be tested with. They can also add custom test grammars of their own.

For example, happy-rad could provide [--rad] as the set of arguments to test with, while plain happy could provide [-a, -ag, -cg, -acg].
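Conceptually, the grammar lists and allArguments span a test matrix: every grammar is run with every argument set. The sketch below builds that matrix as command lines. It uses a reduced, hypothetical version of TestSetup; the exact command line happy-test constructs is an assumption, and compiling and running the generated parsers is omitted.

```haskell
-- Reduced, hypothetical version of the TestSetup record above.
data TestSetup = TestSetup
  { happyExec    :: String    -- e.g. happy or happy-rad
  , defaultTests :: [String]  -- grammar files to run
  , allArguments :: [String]  -- one entry per argument combination
  }

-- One invocation per (grammar, argument set) pair.
invocations :: TestSetup -> [[String]]
invocations setup =
  [ happyExec setup : words args ++ [grammar]
  | grammar <- defaultTests setup
  , args    <- allArguments setup
  ]
```

For happy-rad, allArguments would be ["--rad"]; an empty string would stand for a run without extra arguments, since words "" is [].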

happy-test also takes over another job: the functionality of make sdist-test, which previously lived in the Makefile, has moved into happy-test. This way, every extension author can test their sdists. What does make sdist-test do?

call cabal sdist all to create an sdist of all local packages
unzip the sdists and move them into an umbrella directory
create a cabal.project referring to these local packages
build and test the executable inside this umbrella directory.
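Steps 2 and 3 above boil down to simple string manipulation, sketched below. cabal sdist produces pkg-version.tar.gz archives that unpack into a pkg-version directory; the helper names unpackedDir and cabalProject and the package versions in the example are hypothetical.

```haskell
import Data.List (isSuffixOf)

-- "happy-backend-1.0.tar.gz" unpacks into the directory "happy-backend-1.0".
unpackedDir :: FilePath -> FilePath
unpackedDir tarball
  | suffix `isSuffixOf` tarball = take (length tarball - length suffix) tarball
  | otherwise                   = tarball
  where suffix = ".tar.gz"

-- A cabal.project for the umbrella directory that lists the unpacked
-- sdists, so cabal builds exactly what would be released.
cabalProject :: [FilePath] -> String
cabalProject dirs = unlines ("packages:" : ["  " ++ dir ++ "/" | dir <- dirs])
```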

More

For end users of happy, little changes: the CLI and the functionality stay the same. Only the time needed to compile happy itself increases a bit.

Template Files

As the template files are backend-specific, they are now data-files of happy-backend rather than of happy itself, just as the tests are now data-files of happy-test.

Bootstrapping

Bootstrapping is a purely frontend-specific process. Therefore, the bootstrap flag still exists, but only in happy-frontend. A next step could be to move the bootstrapping parser into its own package, as described in https://github.com/simonmar/happy/issues/187#issuecomment-825564616.

CI

Travis and AppVeyor used make sdist; make sdist-test-only. We have replaced this with our new make sdist-test, which currently doesn't work on MinGW.

knothed commented 3 years ago

I decided on the following path:

Split off happy-grammar and happy-tabular
Split off happy-frontend, happy-backend and happy-backend-glr
Create happy and happy-cli
Create happy-test.

Changing all the CLI code at once, at the end, is the most convenient approach, both for implementing it and for reviewing it.

@int-index The first MR is #200, which can be reviewed now.

Maybe it makes more sense to merge all the MRs into an intermediate branch and merge that branch into master only at the end. Could someone create such a branch (e.g. modularisation)?

knothed commented 3 years ago

I've removed GenUtils and distributed its members to sensible places:

mapDollarDollar goes to Grammar, as it is a grammar-level feature that every backend working with Grammar uses.
dieHappy and friends go to Happy.CLI.Dying. As a result, the Happy.*.CLI modules import Happy.CLI.Dying, so frontend and backend now also depend on happy-cli. If this is unwanted, frontend and backend could move their CLI modules into separate frontend-cli and backend-cli packages.
The remaining members move directly into the packages where they are used.
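For context, in a happy %token declaration such as int { TokenInt $$ }, the $$ marks the token's value, and mapDollarDollar deals with this placeholder. The following is a simplified sketch of such a helper, assuming it returns a substitution function when the code contains $$ and Nothing otherwise; unlike a robust version, it does not skip $$ inside string or character literals.

```haskell
-- Simplified sketch: find the first "$$" in a token's code and return a
-- function that substitutes a given expression for it.
mapDollarDollar :: String -> Maybe (String -> String)
mapDollarDollar = go ""
  where
    -- acc holds the already-scanned prefix in reverse order.
    go _   []             = Nothing
    go acc ('$':'$':rest) = Just (\repl -> reverse acc ++ repl ++ rest)
    go acc (c:rest)       = go (c : acc) rest
```

For example, applying the result of mapDollarDollar "TokenInt $$" to "x" gives "TokenInt x", while mapDollarDollar "TokenEOF" is Nothing.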
