haskell / happy

The Happy parser generator for Haskell
Other
276 stars 85 forks source link

Extract frontend work into frontend-cli #219

Closed knothed closed 2 years ago

knothed commented 2 years ago

This PR extracts both the CLI flags which are used by the frontend and the actual work which is performed by the frontend from Main to frontend-cli.

The following things have changed:

The CLI options from the different packages (currently happy and frontend-cli) are stitched together in Main, and then sorted so that they start with the o, i and p options:

> data HappyFlag = MainFlag CLIFlag | FrontendFlag Happy.Frontend.CLI.Flag deriving Eq
>
> argInfo :: [OptDescr HappyFlag]
> argInfo = beginOptionsWith "oip" $
>     map (fmap MainFlag) mainArgs ++
>     map (fmap FrontendFlag) Happy.Frontend.CLI.options
Ericson2314 commented 2 years ago

I would rather not the change happy-grammar like that. I don't like threading through unmodified arguments out in the returned value, as I think it is leads to tangling down the road.

It is possible we could modify AbsSyn so that that there is a core data type for the middle two fields, and then a separate data type for that and also the header and footer "bookends". Then happy grammar can just take the inner thing as a parameter, demonstrating that it doesn't need to know about the header and footer.

Correct me if I am wrong, @int-index, but the TH version of happy also wouldn't even bother with such a header and footer, right? If so, then the alternative change to AbSyn I propose above would also be advantageous to it.

int-index commented 2 years ago

I’m actually fine with including hd and tl in the Grammar, it’s simpler than adding a parameterized type. We’ll have our share of parameterization when we add support for TH, as we’ll need to use Exp, Pat, Type, and [Dec] in various places that are currently just String.

I’d rename those fields to header and footer though.

int-index commented 2 years ago

What surprises me about this patch is that it introduces happy-frontent-cli instead of just happy-cli. The goal is to be able to parameterize happy by various backends, so I’d expect the following:

  1. Create an abstraction for a backend:

    data Backend = ...

    I’m not entirely sure what would need to go in there, but the first step is to design a common interface for functions such as produceParser and produceGLRParser, so that produceParser could be a field in the Backend structure.

  2. Parameterize main in Main.lhs by a list of available backends, so that we have

    main = mainWithBackends [lalrBackend, glrBackend]
    
    mainWithBackends :: [Backend] -> IO ()

The current patch seems to do something very different from this idea.

Ericson2314 commented 2 years ago

@int-index

it’s simpler than adding a parameterized type.

err I don't want to add a parameterized type, but do something like:

data AbsSyn
   = AbsSyn
        [Directive String]   -- directives
        [Rule]               -- productions

data BookendedAbsSyn
   = BookendedAbsSyn
        (Maybe String)       -- header
        AbsSyn
        (Maybe String)       -- footer

The parser returns a BookenedAbsSyn, but the "manger" only takes an AbsSyn. This demonstrates that header and footer are not needed by middle parts, and just slapped back on by the backend.

Ericson2314 commented 2 years ago

What surprises me

I think @knothed's idea was that the frontends are still a bit free form, but we can parse the flags for different components separately. I think I am OK with that which is a bit less ambitious than the higher-order thing you imagined, where every happy variant would need to work basically exactly the same.

I might just make a separate PR with my BookendedAbsSyn thing, as I think it is good aside from CI stuff.

int-index commented 2 years ago

err I don't want to add a parameterized type, but do something like:

OK, I thought you meant

data Bookended a
   = Bookended
        (Maybe String)       -- header
        a
        (Maybe String)       -- footer
    deriving (Functor, Foldable, Traversable)

And then traverse (mangler file) :: Bookended AbsSyn -> m (Bookended Grammar).

Because as far as I can tell, the idea is to bundle the header and the footer with the grammar.

Ericson2314 commented 2 years ago

https://github.com/simonmar/happy/pull/220 is the PR.

Oh I see what you mean. Yeah don't like bundling the header and footer with the grammar, since they naturally come apart after parsing.

What this might mean is that we ought to split the frontend package, to separate the parser + BookendedAbsSyn from mangler + AbsSyn? That is overkill for the CLI, but I believe justified if the mangler is something the TH lib would use.

knothed commented 2 years ago

The current patch seems to do something very different from this idea.

Yes, I wanted to separate the cli out of every component that has some CLI-interface, leading to happy-frontend-cli, happy-tabular-cli, happy-backend-cli and happy-backend-glr-cli; the stitching of CLI arguments is then done inside Main, so users can stitch packages together arbitrarily, not only restricted to swapping the backend. I don't know if that is sensible though.

int-index commented 2 years ago

Yes, I wanted to separate the cli out of every component that has some CLI-interface

It would be hard to get this right because we don’t actually have use cases to validate such a design. Splitting into packages is motivated by two directions of work that I know of:

So I wouldn’t split into too many packages unless we have a use case to guide us.

knothed commented 2 years ago

Parameterize main in Main.lhs by a list of available backends, so that we have

   main = mainWithBackends [lalrBackend, glrBackend]

   mainWithBackends :: [Backend] -> IO ()

What approach would you take here?

Different backends will probably want to use different CLI arguments. For example, the GLR backend currently uses the following flags: -o, -t, -k, -f, -g, -d, while the LALR backend uses these ones: -o, - t, -m, -s, -g, -c, -a, -d. There are flags which are used by both backends and others which are not.

Therefore, what I've thought of was that the Backend data type would include a list of OptDescrss (or an abstraction of it) which are then added to the options which are used by getOpt inside main. This way, each backend can declare and later access precisely the flags that it wants.

This just gets problematic if multiple backends declare the same flag (like -o) but with different option descriptions. Then happy wouldn't know what to show when the user asks for --help.

There would also be one or more further top-level flags which determine the backend to actually use.

This approach seems a bit clumsy, but I'm not sure how to improve on it without losing the flexibility of new backends declaring arbitrary CLI flags.

int-index commented 2 years ago

what I've thought of was that the Backend data type would include a list of OptDescrs (or an abstraction of it)

Right, I think it should be an abstraction over it, not an OptDescr itself, because the backend shouldn’t know anything about the CLI interface. I’d like to keep everything CLI-related in the happy-cli package.

This approach seems a bit clumsy, but I'm not sure how to improve on it without losing the flexibility of new backends declaring arbitrary CLI flags.

It’s a nice goal to support arbitrary options in the backends, but a hard nut to crack. I’d start with a hard-coded list of available options (a union of options supported by all backends), and each individual backend would simply advertise what subset it supports. Here’s a rough sketch:

  1. Create a package happy-backend-interface which defines the Backend data type, for which happy-backend-* packages provide implementations.
  2. In happy-backend-interface, create another data type data CodeGenFlag = CgfGhcExts | CgfArrayTarget | ... that is a union of all options of all backends. Yes, that means that adding a new backend with novel options might require extending this ADT, which is far from perfect, but also not a big deal since new backends are a rare addition. You can put the type in the happy-tabular package
  3. The happy-backend-* packages export a function configureBackend with the type Set CodeGenFlag -> Either (Set CodeGenFlag) Backend. It returns Right in case the backend supports all the provided options, and Left with the unsupported options if it doesn’t support them.
  4. In happy-cli, transform CodeGenFlag to OptDescr CLIFlag.

EDIT: following my reasoning in https://github.com/simonmar/happy/pull/221#issuecomment-1012672381, instead of happy-backend-interface we could just introduce these types somewhere in happy-tabular

int-index commented 2 years ago

I’d like to hear @Ericson2314’s take on this, too, because I’m not sure what the best design would be.

Ericson2314 commented 2 years ago

I am not sure how I feel about hard-coded list of options to sub-set.

I just did this Rust PR for work (https://github.com/LedgerHQ/cargo-ledger/pull/5/files) it shows a nice thing where the CLI parser is largely derived from the CLI AST, and the CLI AST is at least somewhat composable.

I think making separate CLI args packages for the packages happy-rads uses it not unreasable -- we do have the use-case to validate.

It might be best to clean up some the data structures more / massage things after our minimal-diff split before doing in any of this stuff. I am sorry about delaying the CLI stuff even longer but I think it is just good to make everything else really nice to so have a higher baseline quality in happy by which to judge more controversial CLI changes.

Ericson2314 commented 2 years ago

To make that last bit, I just opened https://github.com/simonmar/happy/pull/221. We should really thoroughly critique each data structure and make the libraries worthy of being called libraries. After a few round of that I think I think we'll be in better shape to think about the end applications.