haskell / happy

The Happy parser generator for Haskell
Other
276 stars 84 forks source link

Support inlining of non-terminals, as Menhir does #148

Open sgraf812 opened 4 years ago

sgraf812 commented 4 years ago

It has bothered me for a long time that we have so many conflicts in GHC's parser, and https://gitlab.haskell.org/ghc/ghc/issues/17232 is there to track it. Precedence declarations will probably go a long way to ameliorate this.

But as section 5.3 in http://gallium.inria.fr/~fpottier/menhir/manual.pdf points out, there still are conflicts which aren't easily resolved without compromising on software design by inlining all productions of a non-terminal into a call site.

I imagine something like Menhir's %inline keyword would help here. But I'd rather not have it at the declaration site of a non-terminal. Instead, we should have a mechanism that allows us to annotate use-site of the non-terminal, similar to the difference between {-# INLINE #-} and the GHC.Magic.inline. Also it would help to resolve conflicts on a case-by-case basis rather than just blindly inline the non-terminal everywhere, which might come with exponential blow-up.

sgraf812 commented 4 years ago

For a live example where this is useful, consider https://gitlab.haskell.org/ghc/ghc/blob/68ddb43c44065d0d3a8a6893f7f8e87f15ee9c1e/compiler/parser/Parser.y#L1278.

We can't use opt_family and opt_instance here, because it would give rise to the following parser situation:

-- type family declarations, with optional 'family' keyword
at_decl_cls -> type . opt_family type opt_at_kind_inj_sig
-- default type instances, with optional 'instance' keyword
at_decl_cls -> type . opt_instance ty_fam_inst_eqn

Reduce an empty opt_instance and you commit to the second production, reduce an empty opt_family and you reduce to the first one. By attaching some kind of inline information for the parser, the conflict would be resolved without having to do the inlining by hand, as it is currently done.

harpocrates commented 4 years ago

I want this a lot too for a project of mine whose grammar is quite large, and so I went ahead and took a stab at implementing it. Here are the changes.

With the caveat that I've only tried toy examples and done nothing to make the code efficient, it seems to be working as expected. Before I spend more time on this:

cartazio commented 4 years ago

bump, whats the current stances?

harpocrates commented 4 years ago

Wait for feedback on design and merge-ability I think.

simonmar commented 4 years ago

Happy to have this feature, it does sound useful.

Perhaps %inline(nt) would be a better syntax?

We probably shouldn't duplicate the code, how hard is it to abstract it? Do we run into typing issues?

I haven't reviewed the code, but happy to give it a read once it's settled down. I have no strong attachment to the internals of Happy, it needs a complete rewrite really.

googleson78 commented 1 year ago

This would be useful in the implementation of modifiers, see https://gitlab.haskell.org/ghc/ghc/-/issues/22624#note_471850

sgraf812 commented 18 hours ago

I think with the introduction of %shift (or was it always there?) and parameterised productions, GHC's parser finally has become conflict-free, so this issue has been far less pressing.

Edit: The following is actually incorrect. In particular, 0-ary parameterised productions are like the declaration-site %inline pragma. The OP is about use-site inlining in order to avoid code blowup, so I'm leaving this issue open.

int-index commented 15 hours ago

In particular, 0-ary parameterised productions are like the declaration-site %inline pragma.

Are you sure about this? My mental model for parameterised productions was that they were expanded to top-level copies of the template, e.g.

e(t) : t t { ... }

r : e(x) { ... }
  | e(y) { ... }

is equivalent to

e_x : x x { ... }
e_y : y y { ... }

r : e_x { ... }
  | e_y { ... }

I don't see where inlining enters the picture.

sgraf812 commented 12 hours ago

Are you sure about this?

Indeed not, I was wrong. Marking a (use or decl) of a non-terminal as %inline duplicates its productions into its use sites. That is not the same as a parameterised production at all, although it means the duplicate productions will have a more precise lookahead set, which might absolve of some reduce/reduce conflicts.