haskell / happy

The Happy parser generator for Haskell

Modularize happy #191

Closed knothed closed 2 years ago

knothed commented 3 years ago

Following https://github.com/simonmar/happy/issues/167#issuecomment-780591344 and #187, this PR aims to split up happy into several components. Two main goals are accomplished:

Ideally, happy programmers should be able to combine happy packages almost arbitrarily: Each frontend should, in theory, be able to work with each middleend, which in turn should work with each backend.

To achieve this goal, we (@sgraf812 and I) have decided to split happy into six packages (five libraries and one executable) as follows:

Why six packages?

We want users to be able to combine different frontend, middleend and backend packages. happy-core is therefore vital: it defines the data types that every such package should be able to work with, in particular Grammar, ActionTable and GotoTable.

We'll look at happy-test shortly.

What does a package look like?

The three packages happy-frontend, happy-middleend and happy-backend are all built similarly: they have a public runFrontend/runMiddleend/runBackend function which performs the package's main task. Also, they have a common interface for providing and parsing command line arguments.

Let's take a look at happy-backend (packages/backend). It has the following exposed components:

  1. A main entry point function, runBackend, which performs the code-gen:

    runBackend :: BackendArgs -> Grammar -> ActionTable -> GotoTable -> IO ()
    
    data BackendArgs = BackendArgs {
     outFile :: String,
     templateDir :: Maybe String,
     magicName :: Maybe String,
     strict :: Bool,
     ghc :: Bool,
     coerce :: Bool,
     target :: Target,
     debug :: Bool,
     glr :: Bool,
     glrDecode :: Bool,
     glrFilter :: Bool
    }

    BackendArgs are user-provided arguments, while Grammar, ActionTable and GotoTable come from the frontend and middleend (as results of runFrontend and runMiddleend).

    happy could call runBackend with arbitrary BackendArgs. There is no requirement for a command line interface per se.

  2. But ultimately, we do want to run happy via a CLI, meaning we want to create BackendArgs by parsing the command line arguments. Just as previously in happy, we define the CLI arguments using GetOpt, but at the package level:

    data BackendFlag =
       OptOutputFile String |
       OptTemplateDir String |
       OptMagicName String |
        ...
    
    -- Help strings (the fourth argument of Option) are omitted here for brevity.
    backendOptions :: [OptDescr BackendFlag]
    backendOptions = [
       Option "o" ["outfile"] (ReqArg OptOutputFile "FILE"),
       Option "t" ["template"] (ReqArg OptTemplateDir "DIR"),
       Option "m" ["magic-name"] (ReqArg OptMagicName "NAME"),
       ...
    ]
    
    parseBackendFlags :: [BackendFlag] -> String -> IO BackendArgs

So, each package defines its own CLI Flags and options and, in addition, a parseFlags function which converts these CLI flags into PackageArgs.
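To make the flag-to-args conversion concrete, here is a minimal sketch of the pure core of such a parseFlags function. The record is a cut-down copy of BackendArgs, the name toBackendArgs is hypothetical, and it is an assumption that the String argument is the grammar's base name and that the output file defaults to that name plus ".hs".

```haskell
import Data.Maybe (fromMaybe, listToMaybe)

-- Cut-down, hypothetical versions of the flag and argument types.
data BackendFlag
  = OptOutputFile String
  | OptTemplateDir String
  | OptMagicName String
  deriving (Eq, Show)

data BackendArgs = BackendArgs
  { outFile     :: String
  , templateDir :: Maybe String
  , magicName   :: Maybe String
  } deriving (Eq, Show)

-- Pure core of a parseFlags function (hypothetical name).
-- If a flag is given several times, the last occurrence wins.
-- Assumption: the String is the grammar's base name, used for the default output file.
toBackendArgs :: [BackendFlag] -> String -> BackendArgs
toBackendArgs flags baseName = BackendArgs
  { outFile     = fromMaybe (baseName ++ ".hs") (lastOf [f | OptOutputFile f <- flags])
  , templateDir = lastOf [d | OptTemplateDir d <- flags]
  , magicName   = lastOf [n | OptMagicName n <- flags]
  }
  where
    lastOf = listToMaybe . reverse
```

A real parseBackendFlags would presumably wrap something like this in IO so that it can report invalid flag combinations.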

Then, option parsing takes place as follows:

  1. The main happy executable sticks the options from each package together and calls getOpt.
  2. The resulting Flags are then used to call parseFlags and execute each package's respective main entry point function: runFrontend, runMiddleend and runBackend.

In this way, option parsing is fully modularized such that new packages can define and use their own CLI options.
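The two steps above can be sketched as follows. This is an illustration under assumptions, not the code from this PR: the flag types are cut down, and OptInfoFile, AnyFlag, allOptions and splitFlags are hypothetical names. It relies on the Functor instance of OptDescr from System.Console.GetOpt to lift each package's options into one common flag type.

```haskell
import System.Console.GetOpt

-- Hypothetical, cut-down flag types standing in for each package's own flags.
data FrontendFlag = OptInfoFile String
  deriving (Eq, Show)

data BackendFlag = OptOutputFile String | OptMagicName String
  deriving (Eq, Show)

frontendOptions :: [OptDescr FrontendFlag]
frontendOptions =
  [ Option "i" ["info"] (ReqArg OptInfoFile "FILE") "write grammar info to FILE" ]

backendOptions :: [OptDescr BackendFlag]
backendOptions =
  [ Option "o" ["outfile"] (ReqArg OptOutputFile "FILE") "write output to FILE"
  , Option "m" ["magic-name"] (ReqArg OptMagicName "NAME") "use NAME as symbol prefix"
  ]

-- Wrapper type so that flags from both packages fit into a single getOpt call.
data AnyFlag = FrontendFlag FrontendFlag | BackendFlag BackendFlag
  deriving (Eq, Show)

-- Step 1: stick the option lists of all packages together.
allOptions :: [OptDescr AnyFlag]
allOptions =
  map (fmap FrontendFlag) frontendOptions ++ map (fmap BackendFlag) backendOptions

-- Step 2: parse once, then hand each package only its own flags.
-- Returns the getOpt error messages on failure.
splitFlags :: [String] -> Either [String] ([FrontendFlag], [BackendFlag], [String])
splitFlags argv =
  case getOpt Permute allOptions argv of
    (flags, rest, []) ->
      Right ([f | FrontendFlag f <- flags], [b | BackendFlag b <- flags], rest)
    (_, _, errs) -> Left errs
```

The executable would then pass each flag list to the corresponding parseFlags function and call the package entry points with the resulting arguments.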

We just have to pay attention to one thing: with getOpt, it is not possible to have multiple options with the same option character or option string. This means that if two different packages both define an -o option, they cannot be used together in the same happy executable. Package authors have to watch out for such option collisions.
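An executable author could check for such collisions before calling getOpt. The following is a minimal sketch; optionNames and collisions are hypothetical helpers, and the two option lists are made up to show a clash on -o.

```haskell
import Data.List (group, sort)
import System.Console.GetOpt

-- Every short (-o) and long (--outfile) name that an option list claims.
optionNames :: [OptDescr a] -> [String]
optionNames opts =
  concat [ map (\c -> ['-', c]) shorts ++ map ("--" ++) longs
         | Option shorts longs _ _ <- opts ]

-- Names that are claimed more than once across the given name lists.
collisions :: [[String]] -> [String]
collisions nameLists = [ n | (n : _ : _) <- group (sort (concat nameLists)) ]

-- Hypothetical option lists from two packages that both claim -o.
backendOpts :: [OptDescr String]
backendOpts = [ Option "o" ["outfile"] (ReqArg id "FILE") "write output to FILE" ]

radOpts :: [OptDescr Bool]
radOpts = [ Option "o" ["optimize"] (NoArg True) "hypothetical flag" ]
```

Collecting names as strings first sidesteps the fact that each package's options have a different flag type; here collisions [optionNames backendOpts, optionNames radOpts] reports the shared -o.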

Example: happy-rad

As an example of how this modularization can be used to create a new executable, let's take a look at happy-rad. happy-rad extends happy with a recursive ascent-descent (RAD) backend. It contains two packages:

As happy-frontend, happy-middleend and happy-backend are (will be) remote packages, we can just reuse their functionality. In addition, we get testing for free:

Testing happy and happy-rad

How does the happy-test package help package authors in testing their packages?

We've moved all of the existing 26 test grammars (e.g. ParGF.y) into happy-test. happy-test exposes the following function:

test :: TestSetup -> IO a

data TestSetup = TestSetup {
  happyExec :: String, -- e.g. happy or happy-rad
  defaultTests :: [String],
  customTests :: [String],
  customDataDir :: String,
  allArguments :: [String],
  stopOnFailure :: Bool
}

In their test suite, the package author can call test and customize the testing procedure: they can specify which of the 26 grammars should or should not be tested and which argument combinations their executable should be tested with. In addition, they can specify further, custom, testing grammars.

For example, happy-rad could provide [--rad] for the set of arguments it should be tested with, while normal happy could provide [-a, -ag, -cg, -acg].
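As an illustration, a setup for happy-rad might look like the sketch below. The record is a local copy of TestSetup so that the example stands alone; the custom grammar RadOnly.y, the directory name and the helper plannedRuns are hypothetical, and it is an assumption that every selected grammar is run once per argument combination.

```haskell
-- Local copy of the record from the description, so the sketch stands alone.
data TestSetup = TestSetup
  { happyExec     :: String
  , defaultTests  :: [String]
  , customTests   :: [String]
  , customDataDir :: String
  , allArguments  :: [String]
  , stopOnFailure :: Bool
  } deriving (Eq, Show)

-- Hypothetical setup for happy-rad; the custom grammar and directory are made up.
radSetup :: TestSetup
radSetup = TestSetup
  { happyExec     = "happy-rad"
  , defaultTests  = ["ParGF.y"]
  , customTests   = ["RadOnly.y"]
  , customDataDir = "tests"
  , allArguments  = ["--rad"]
  , stopOnFailure = True
  }

-- Assumption: each selected grammar is run once per argument combination.
plannedRuns :: TestSetup -> [(String, String)]
plannedRuns s =
  [ (grammar, args) | grammar <- defaultTests s ++ customTests s, args <- allArguments s ]
```

In a real test suite, the package author would pass such a value to the test function exported by happy-test.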

happy-test also serves a second purpose: the functionality of make sdist-test, which was previously in the Makefile, has now moved into happy-test. This way, every extension author can test their sdists. What does make sdist-test do?

  1. call cabal sdist all to create an sdist of all local packages
  2. unzip the sdists and move them into an umbrella directory
  3. create a cabal.project referring to these local packages
  4. build and test the executable inside this umbrella directory.
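Step 3 amounts to writing a small cabal.project file that lists the unpacked package directories. A minimal sketch of generating its contents, assuming a hypothetical helper cabalProject and hypothetical directory names:

```haskell
-- Render the contents of a cabal.project that refers to the given
-- unpacked sdist directories (relative to the umbrella directory).
cabalProject :: [FilePath] -> String
cabalProject dirs = unlines ("packages:" : map ("  " ++) dirs)
```

For example, cabalProject ["happy-core/", "happy-backend/"] yields a packages: field with one indented entry per directory, which cabal then builds as local packages.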

More

For the end user of happy, not much changes: the CLI stays identical and the functionality remains the same. Only the compile time of happy increases a bit.

Template Files

As the template files are backend-specific, they are now data-files of happy-backend, not of happy itself. Likewise, the tests are now data-files of happy-test instead of happy itself.

Bootstrapping

Bootstrapping is a purely frontend-specific process. Therefore, the bootstrap flag still exists, but only in happy-frontend. A next step could be to move the bootstrapping parser into its own package, as described in https://github.com/simonmar/happy/issues/187#issuecomment-825564616.

CI

Travis and appveyor used make sdist; make sdist-test-only. We have replaced this with our new make sdist-test. make sdist-test currently doesn't work on MinGW.

knothed commented 3 years ago

I decided on the following path:

  1. Split off happy-grammar and happy-tabular
  2. Split off happy-frontend, happy-backend and happy-backend-glr
  3. Create happy and happy-cli
  4. Create happy-test.

Changing all the CLI stuff at once and at the end is the most convenient, both for doing so and for reviewing.

@int-index The first MR is #200 which can be reviewed now.

Maybe it makes more sense to merge all the MRs into an intermediate branch and then merge that branch into master only at the end. Maybe someone could create a branch (e.g. modularisation).

knothed commented 3 years ago

I've removed GenUtils and distributed its members to sensible places: