Open sgraf812 opened 4 years ago
For a live example where this is useful, consider https://gitlab.haskell.org/ghc/ghc/blob/68ddb43c44065d0d3a8a6893f7f8e87f15ee9c1e/compiler/parser/Parser.y#L1278.
We can't use opt_family
and opt_instance
here, because it would give rise to the following parser situation:
-- type family declarations, with optional 'family' keyword
at_decl_cls -> type . opt_family type opt_at_kind_inj_sig
-- default type instances, with optional 'instance' keyword
at_decl_cls -> type . opt_instance ty_fam_inst_eqn
Reduce an empty opt_instance
and you commit to the second production, reduce an empty opt_family
and you reduce to the first one. By attaching some kind of inline information for the parser, the conflict would be resolved without having to do the inlining by hand, as it is currently done.
I want this a lot too for a project of mine whose grammar is quite large, and so I went ahead and took a stab at implementing it. Here are the changes.
%inline
is use-site only (see my change to Calc.ly
for a complete example):
some_rule
: foo bar %inline baz qux { ... } -- `baz` gets inlined here
The current idea is to implement %inline
as a phase to be desugared (similar to parametrized productions).
With the caveat that I've only tried toy examples and done nothing to make the code efficient, it seems to be working as expected. Before I spend more time on this:
@simonmar is this a feature that you think would be worthy of inclusion in Happy (provided a good implementation)?
When inlining, should we duplicate the RHS code of the inlined production at the inline-sites? Should we abstract it out into some other top-level function? I implemented the former for now.
More generally, does the approach I took seem reasonable?
Can we bikeshed the syntax? I'm perfectly happy with only callsite inlining, but we could have a top-level %inline rule
to indicate rule
should always be inlined.
bump, whats the current stances?
Wait for feedback on design and merge-ability I think.
Happy to have this feature, it does sound useful.
Perhaps %inline(nt)
would be a better syntax?
We probably shouldn't duplicate the code, how hard is it to abstract it? Do we run into typing issues?
I haven't reviewed the code, but happy to give it a read once it's settled down. I have no strong attachment to the internals of Happy, it needs a complete rewrite really.
This would be useful in the implementation of modifiers, see https://gitlab.haskell.org/ghc/ghc/-/issues/22624#note_471850
I think with the introduction of %shift
(or was it always there?) and parameterised productions, GHC's parser finally has become conflict-free, so this issue has been far less pressing.
Edit: The following is actually incorrect.
In particular, 0-ary parameterised productions are like the declaration-site %inline
pragma. The OP is about use-site inlining in order to avoid code blowup, so I'm leaving this issue open.
In particular, 0-ary parameterised productions are like the declaration-site %inline pragma.
Are you sure about this? My mental model for parameterised productions was that they were expanded to top-level copies of the template, e.g.
e(t) : t t { ... }
r : e(x) { ... }
| e(y) { ... }
is equivalent to
e_x : x x { ... }
e_y : y y { ... }
r : e_x { ... }
| e_y { ... }
I don't see where inlining enters the picture.
Are you sure about this?
Indeed not, I was wrong. Marking a (use or decl) of a non-terminal as %inline
duplicates its productions into its use sites. That is not the same as a parameterised production at all, although it means the duplicate productions will have a more precise lookahead set, which might absolve of some reduce/reduce conflicts.
It has bothered me for a long time that we have so many conflicts in GHC's parser, and https://gitlab.haskell.org/ghc/ghc/issues/17232 is there to track it. Precedence declarations will probably go a long way to ameliorate this.
But as section 5.3 in http://gallium.inria.fr/~fpottier/menhir/manual.pdf points out, there still are conflicts which aren't easily resolved without compromising on software design by inlining all productions of a non-terminal into a call site.
I imagine something like Menhir's
%inline
keyword would help here. But I'd rather not have it at the declaration site of a non-terminal. Instead, we should have a mechanism that allows us to annotate use-site of the non-terminal, similar to the difference between{-# INLINE #-}
and theGHC.Magic.inline
. Also it would help to resolve conflicts on a case-by-case basis rather than just blindly inline the non-terminal everywhere, which might come with exponential blow-up.