softdevteam / grmtools

Rust grammar tool libraries and binaries
Other
508 stars 31 forks source link

Allow leading "|" in first alternative of a production #129

Closed pablosichert closed 1 year ago

pablosichert commented 4 years ago

It would be nice if it was allowed to place a leading | in the first alternative of a production.

E.g.:

Production -> [...]:
 |  Foo { [...] }
 |  Bar { [...] }
 |  Baz { [...] }
;

As far as I'm aware this wouldn't change any semantics, as also the ε-alternative is required to have a body {}.

ltratt commented 4 years ago

For Yacc and Bison compatibility, we can't do this, because grammars don't have to have actions (e.g. for https://docs.rs/lrpar/0.4.3/lrpar/struct.RTParserBuilder.html#method.parse_generictree). If we were to do this, these two grammars would be impossible to parse unambiguously:

A: B | C;

vs.

A: | B | C;

Does A have an epsilon rule or not?

benblank commented 1 year ago

Apologies for necroing this issue, but it describes what I want exactly, so I wasn't sure creating a new one would be appropriate. 🙂

I'm fairly new to grmtools (and, in fact, Yacc in general) and find the "inconsistent" leading whitespace on this type of rule to be somewhat irritating, particularly when rearranging productions. Having to format the first production specially (no vertical bar) is much the same kind of annoyance as having to format the last element of an array/vector/list specially in languages which do not allow trailing commas. And for grmtools, it even feels like a bit of a backwards step when compared to the roughly-equivalent Rust syntax (match branches, which do allow a vertical bar in front of the first pattern).

I had a few thoughts on how this might be accomplished without creating ambiguity:

  1. Introduce an %explicit-empty (or similar) declaration Using this declaration would then require empty branches to use an %empty declaration, allowing A: | B | C; to be easily distinguished from A: %empty | B | C;. Anyone wanting to use the "ignored initial vertical bar" syntax described in this issue would have to opt in to it and explicitly accept responsibility for keeping the grammar unambiguous.
  2. Allow (and merge) "duplicate" rules I have found a few instances (1, 2) of the vertical bar being described as syntactic sugar for repeating the rule's LHS multiple times. In other words, the following would be considered equivalent:

    A : B
    | C;
    A : B;
    A : C;

    lrpar doesn't currently allow the second form (YaccParserError { kind: DuplicateRule, line: 2, col: 1 }), but introducing it would provide a way to make sure your productions both line up nicely and are easily reorderable, though at the expense of extra verbosity.

  3. Only ignore an initial vertical bar if there is no production on the line which introduces the rule I like this alternative the least, both because it "feels" fragile to me and adds significant whitespace. It would, however, prevent ambiguity:

    // Empty production
    A: | B | C;
    // No empty production
    A:
    | B
    | C;

Any of these could also only be supported when using YaccKind::Grmtools to additionally reduce the possibility of introducing conflicts with other Yacc grammars.

Do any of these alternatives sound workable?

ltratt commented 1 year ago

I didn't know about %empty -- I think the Grmtools YaccKind should grow support for this.

I'm a bit unsure about duplicate rules. Is that syntax supported by Yacc? If so, we should consider adding support for it (possibly including, by default, a warning that users can explicitly turn off?).

An issue I can see with %explicit-empty is that grammars will then no longer be copy-and-pastable between Yacc/Bison and grmtools. Worse, the grammars will look copy-and-pastable, but they'll behave differently. I wish Yacc had chosen a different syntax back in the day, but I think we'll cause as many problems by trying to tweak it as by living with it. Perhaps the only way out of this rat hole would be to have a noticeably different variant grammar syntax so that people couldn't confuse things. I don't think that's a route I'd want to go down any time soon, though!

benblank commented 1 year ago

Hmm. I found the duplicate rules syntax documented for both Sun's Yacc and Bison, but didn't think to test whether either actually accepts that syntax or whether the docs were just using it as an analogy. I can certainly check Bison easily enough, and I'll see whether I have a way to check Sun's Yacc as well.

Are YaccKind::Grmtools grammars copy-and-pasteable between Yacc/Bison and grmtools now? I was under the impression that including return types in the "rule name" already broke compatibility. Though I suppose I can check that myself, too. 🙂

Thanks for reopening the issue! I'm glad to have some conversation around this, whatever the final decision is.

ltratt commented 1 year ago

Action code is never copyable-and-pastable between languages, but people do often want to copy-and-paste the "bare" grammar without action code. [grmtools even has a "generic parse tree" mode, including in the nimbleparse tool, to make it easy to test grammars that don't have action code.]

I'd be interested to know what both Bison and BSD Yacc do with duplicate rules!

benblank commented 1 year ago

Yeah, both Bison 3.8.2 (even with -Wall and/or --yacc) and Berkeley Yacc 1.9 20200330 were able to consume a grammar with duplicate rules without comment or complaint.

The grammar I used:

%token A
%token B
%token C
%start Expr

%%

Expr: A;
Expr: B;
Expr: C;
ltratt commented 1 year ago

Interesting! We should probably support this then (CCing @ratmice in case he has thoughts).

ratmice commented 1 year ago

I know a long time ago i've seen the case where duplicate entries are used (but can't for the life of me remember where), I personally have somewhat mixed feelings about it, as someone who tends to leave commented out copies of rules in various states of disarray as I work on grammars, so I wouldn't really use that style but it would mean the loss of DuplicateRule which has caught various errors.

I don't think it is something we can easily check at the GrammarAST level afterwords (The current GrammarAST for Expr: A; Expr: B; seems like it should be the same as the AST for the Expr: A | B case, so if we want to keep the DuplicateRule it would seem like it requires some flag to be checked either by directive, or less likely the Builder I think the existing builder options do their checking late without passing flags in during construction other than the GrammarKind -- where they would be needed. Not really a huge fan of excessive knob turning, so I don't know if trying to keep both available is actually worthwhile or if we should just drop DuplicateRule all together?

I note in https://www.gnu.org/software/bison/manual/html_node/Operation-Modes.html the documentation for --update states: remove duplicates I wonder if that transforms the Expr: A; Expr: B; syntax into the Expr: A | B syntax, so it might be bison considers it a legacy behavior.

ltratt commented 1 year ago

I must admit that DuplicateRule has caught errors in my stuff too...

benblank commented 1 year ago

I note in gnu.org/software/bison/manual/html_node/Operation-Modes.html the documentation for --update states: remove duplicates I wonder if that transforms the Expr: A; Expr: B; syntax into the Expr: A | B syntax, so it might be bison considers it a legacy behavior.

I'm not sure what the "duplicates" that flag targets are, but they don't appear to be the kind we're talking about here; running bison --update test.y on then grammar I posted above neither produces any warnings (or output of any kind) nor changes the file:

$ ls
total 4
-rw-r--r-- 1 five35 five35 71 Jan 17 11:12 test.y
$ cat test.y
%token A
%token B
%token C
%start Expr

%%

Expr: A;
Expr: B;
Expr: C;
$ bison --update test.y
$ ls
total 4
-rw-r--r-- 1 five35 five35 71 Jan 17 11:12 test.y
$ cat test.y
%token A
%token B
%token C
%start Expr

%%

Expr: A;
Expr: B;
Expr: C;
ltratt commented 1 year ago

%empty is now supported in master after https://github.com/softdevteam/grmtools/pull/389.

I'm not sure I'm going to get around to adding support for the "repeat a rule" thing myself, but I'd happily take a PR to add support. It does mean we have to forego the "repeated rule" warning but this is technically backwards compatible, so I can live with it myself.

ltratt commented 1 year ago

And now @ratmice has fixed rule repetition in https://github.com/ratmice/grmtools/commit/5785e299ca9af66a0210a938bc1e964008c07110 -- thanks!

I think this long-open issue is now resolved in as good a way as we can do given our desire to be mostly compatible with Yacc!

benblank commented 1 year ago

Yep. I do still think leading pipes would be a nice-to-have, but compatibility is more important. And having added support for both %empty and repeated rule declarations seems like a great place to leave the issue.

Thanks for all your hard work, both of you!