haskell / happy

The Happy parser generator for Haskell
Other
286 stars 84 forks source link

Optional bootstrapping #175

Closed Ericson2314 closed 3 years ago

Ericson2314 commented 3 years ago

This is an alternative to #170 that allow building happy with or without bootstrapping.

We discussed this with @simonmar elsewhere and concluded that a better solution (for now) would be to check in Happy-generated files.

The rational here is hidden to me, but I'm hoping that this might be of interest, in that we are both keeping the theoretical niceness of bootstrapping available, while also offering an escape hatch as a first step towards radically simplifying the build system / testing infra of everything happy related. I think the latter is extremely important, because anything at all custom very much hampers the ability of tools like hadrian and haskell.nix that want to plan and execute a complete bootstrap with as little manual intervention as possible.

This is WIP because I need to update CI.

Closes #169

CC @angerman @hsyl20

int-index commented 3 years ago

The rational here is hidden to me

The rationale is basically eating your own dog food. #170 worked well, but it meant that happy wouldn't be making use of its own features.

Now this pull request (#175) means we have two implementations at once. I'm afraid that the cure might be worse than the disease, as it will be difficult to keep the two parsers in sync. And it increases the amount of code (thus the amount of potential bugs).

Here are the two options that I think are better:

  1. Reopen #170. Then we sacrifice dogfooding, but we get a nice bootstrapping story and inspectability.

  2. Run happy manually once on its own .y-grammar, then add an automatic check on CI that the checked-in generated file is up to date. This way we get dogfooding, but we also get a huge unreadable happy-generated piece of code checked-in. The only problem I see here is that happy output is unreadable, so it's not principally different from checking-in a blob.

So if we favor inspectability, then #170 is the way to go, because I think my bespoke parser combinators turned out alright. On the other hand, if we favor dogfooding, then we should git add the happy-generated parser.

It's a judgement call. Personally, I favor inspectability more (hence #170). @simonmar told me:

I'm not sure I'm convinced that we should eliminate bootstrapping, but I don't feel all that strongly. Use your best judgment.

And I didn't execute on that because I'm not that sure that my judgement is correct here. Maybe your vote will determine the outcome here, @Ericson2314 :-)

harpocrates commented 3 years ago

@int-index does #170 produce the same parsing code that happy would've produced by itself? If so, there could be a third option which I think gets the best of both worlds:

  1. reopen #170, get the nice bootstrapping and inspectability
  2. in CI, after bootstrapping happy once, run it on its own .y-grammar to confirm that the result of bootstrapping is no different than what would've been obtained from running happy itself
Ericson2314 commented 3 years ago

@int-index I think I am still in favor of this approach.

Anything that involves mandatory bootstrapping, without a keeping around build using all happy/alex versions back to the beginning I consider a "coinductive bootstrap". We try to find a fixed point, and cross our fingers it's the right one. By contrast, with #175 allows easy "inductive bootstrap", where you can always boot from the parser combinator version, and therefore compare any fixed point to a "grounded" reference.

For reference, I'm a big fan of @taktoa's GHC desiderata in https://gist.github.com/taktoa/a59400fd3e1c400835b60c416ad33952. I view happy and alex also sort of a perfect microcosm of the GHC universe, where we have a chance at checking off these idealistic boxes at far lower cost.

The only problem I see here is that happy output is unreadable, so it's not principally different from checking-in a blob.

I agree completely.

Now this pull request (#175) means we have two implementations at once. I'm afraid that the cure might be worse than the disease, as it will be difficult to keep the two parsers in sync.

But think of it as tests! We have in order of clarity: happy output < parser combinators < happy input. So the most auditable spec is the happy input, but the most auditable build artifact is the parser combinator version.

And it increases the amount of code (thus the amount of potential bugs).

Ah, but we should consider parallel vs serial composition, like with electrical resistance. Bigger implementation is more bugs, but this is 2 implementations. I say this is parallel composition, and thus, like compositing resisters in parallel, means fewer bugs.


High-minded reasoning aside, let me offer some concessions:

  1. All that said, even if it is fewer bugs, it is more. This is something I really want, so I volunteer to bare the burden of any additional pain associated with this. If we go with this, and I go AWAL, feel free to delete one.

  2. The two paths don't need to be feature-set identical. We just need enough functionality to do the inductive bootstrap, so the parser combinator one can have fewer features if need be. Not sure that helps now, but it might help later.

int-index commented 3 years ago

OK, I'm convinced.

Ericson2314 commented 3 years ago

:) Thanks for hearing me out. I suppose I'll wait longer to push the button so @simonmar can weigh in.

int-index commented 3 years ago

The two paths don't need to be feature-set identical. We just need enough functionality to do the inductive bootstrap, so the parser combinator one can have fewer features if need be. Not sure that helps now, but it might help later.

Should we drop the attribute grammar support from the parser combinators implementation, then?

Ericson2314 commented 3 years ago

@int-index I like it!

One small caviet, it might be hard to get Hadrian and Make to do the bootstrap without introducing a stage -1 (ugh), so if GHC uses the attribute grammar stuff I might be tempted to wait to remove it, but if it doesn't then yes let's by all means get rid of it.

int-index commented 3 years ago

GHC doesn't use it.

simonmar commented 3 years ago

Just out of interest, why not use ReadP instead of hand-rolling parser combinators?

int-index commented 3 years ago

Just out of interest, why not use ReadP instead of hand-rolling parser combinators?

Originally I expected these parser combinators to replace bootstrapping entirely, so I wanted them to be more efficient. Performance considerations influenced their design significantly.

But with the new design where they are used to facilitate bootstrapping instead of replacing it, maybe performance isn't important at all. So ReadP could be used, too.

Ericson2314 commented 3 years ago

So what say you both that we just merge this so we can also merge #179 and then worry about the ReadP simplification, having a nice from-source bootstrap to iterate on?

int-index commented 3 years ago

Yeah, doing one thing at a time sounds reasonable to me. Would you mind squashing the commits before the merge?