kjosib / booze-tools

Booze Tools will become the complete programming-language development workbench, all written in Python 3.9 (for now).
MIT License
14 stars 1 forks source link

Does booze-tools eliminate this LR(1) redundancy? #45

Closed modulovalue closed 1 year ago

modulovalue commented 1 year ago

Hey @kjosib!

(modulovalue from Reddit here)

I wanted to ask you if booze-tools is able to remove the following redundancy:

(cf. https://github.com/osa1/parsegen/issues/11, https://github.com/mdaines/grammophone/issues/26#issuecomment-1597730431)

Consider the following grammar:

S -> a X a | b X b | a Y b | b Y a.
X -> c "X'".
Y -> c "Y'".
"X'" -> c.
"Y'" -> c.

This grammar is LR(1), but not LALR(1).

Here's the LR(1) automaton from grammophone:

Bildschirm­foto 2023-07-31 um 00 17 07

I think, state 13 & 18 and state 17 & 12 could be collapsed without introducing any inadequate states.

I wanted to ask you whether your LR(1) construction is able to avoid introducing these duplicate states without having to merge them or are these "redundancies" also present in your approach?

Edit: here's the LR(0) automaton, just to be complete:

Bildschirm­foto 2023-07-31 um 00 22 52
kjosib commented 1 year ago

Oh snap! Minimal-mode is evidently bugged right now. Thank you! (Now let me add a unit test...)

Canonical-LR mode comes up with the same 20 states as everyone else, as you can see below.

For some reason in this case minimal-mode just returns the LALR table and complains about R/R conflicts. I'll have to dig into why. It'll be fun! (I may have inadvertently broken it while adding some other feature, but existing tests passed.) It's been a long long time since I've messed with this, so it may take some time to page it in from swap.

To your point about merging states after generation: I agree that in theory you could merge them, but in practice I'd probably rely on a table-compressor. If you enter a state via a nonterminal transition because of lookahead then you can safely treat empty cells on that row as "don't care" and merge with any random non-conflicting row, but if you have LR(0)-adequate states that reduce eagerly then you need the GOTO-state to be responsible for error-detection in the subsequent terminal. It might be possible work out precisely which states are eligible, but it's not an approach I've taken just yet.

Productions: S

S -> a X a | b X b | a Y b | b Y a
X -> c XP
Y -> c YP
XP -> c
YP -> c

Precedence

%method CLR 

Gives us

D:\Playground\lr1>py -m boozetools --pretty foo.md
Action and Goto: (20 states)
───┬──────┬───────┬────┬────┬────┬──┬───┬───┬────┬───┬───
   │      │ <END> │  a │  b │  c │  │ S │ X │ XP │ Y │ YP
───┼──────┼───────┼────┼────┼────┼──┼───┼───┼────┼───┼───
 0 │ None │     0 │  2 │  3 │  0 │  │ 1 │ 0 │  0 │ 0 │  0
 1 │    S │     1 │  0 │  0 │  0 │  │ 0 │ 0 │  0 │ 0 │  0
 2 │    a │     0 │  0 │  0 │  6 │  │ 0 │ 4 │  0 │ 5 │  0
 3 │    b │     0 │  0 │  0 │  9 │  │ 0 │ 7 │  0 │ 8 │  0
 4 │    X │     0 │ 10 │  0 │  0 │  │ 0 │ 0 │  0 │ 0 │  0
───┼──────┼───────┼────┼────┼────┼──┼───┼───┼────┼───┼───
 5 │    Y │     0 │  0 │ 11 │  0 │  │ 0 │ 0 │  0 │ 0 │  0
 6 │    c │     0 │  0 │  0 │ 14 │  │ 0 │ 0 │ 12 │ 0 │ 13
 7 │    X │     0 │  0 │ 15 │  0 │  │ 0 │ 0 │  0 │ 0 │  0
 8 │    Y │     0 │ 16 │  0 │  0 │  │ 0 │ 0 │  0 │ 0 │  0
 9 │    c │     0 │  0 │  0 │ 19 │  │ 0 │ 0 │ 17 │ 0 │ 18
───┼──────┼───────┼────┼────┼────┼──┼───┼───┼────┼───┼───
10 │    a │    -1 │  0 │  0 │  0 │  │ 0 │ 0 │  0 │ 0 │  0
11 │    b │    -3 │  0 │  0 │  0 │  │ 0 │ 0 │  0 │ 0 │  0
12 │   XP │     0 │ -5 │  0 │  0 │  │ 0 │ 0 │  0 │ 0 │  0
13 │   YP │     0 │  0 │ -6 │  0 │  │ 0 │ 0 │  0 │ 0 │  0
14 │    c │     0 │ -7 │ -8 │  0 │  │ 0 │ 0 │  0 │ 0 │  0
───┼──────┼───────┼────┼────┼────┼──┼───┼───┼────┼───┼───
15 │    b │    -2 │  0 │  0 │  0 │  │ 0 │ 0 │  0 │ 0 │  0
16 │    a │    -4 │  0 │  0 │  0 │  │ 0 │ 0 │  0 │ 0 │  0
17 │   XP │     0 │  0 │ -5 │  0 │  │ 0 │ 0 │  0 │ 0 │  0
18 │   YP │     0 │ -6 │  0 │  0 │  │ 0 │ 0 │  0 │ 0 │  0
19 │    c │     0 │ -8 │ -7 │  0 │  │ 0 │ 0 │  0 │ 0 │  0
───┴──────┴───────┴────┴────┴────┴──┴───┴───┴────┴───┴───
modulovalue commented 1 year ago

Thank you for looking into this.

I will come back to LR, and will give your approach a closer look, once I'm hopefully eventually done with LALR(k).

kjosib commented 1 year ago

Current code now gives the 18-state table below. I'm going to say the answer is "yes" for redundancies sufficiently similar to your example, but there's still room to generalize on the kinds of redundancies it will eliminate. I have some ideas on that which are currently explained near the bottom of https://boozetools.readthedocs.io/en/latest/minimal.html

As such, I'll consider the correctness bug fixed and deal with the further minimization later. (In fact it would be ideal if LALR grammars produced LALR-sized tables; they currently often don't, but the resulting error-detection properties might differ.)

D:\GitHub\booze-tools\example\non-lalr>py -m boozetools --pretty issue-45.md --force --method=LR1
Action and Goto: 18 states
───┬──────┬───────┬────┬────┬────┬──┬───┬───┬────┬───┬───
   │      │ <END> │  a │  b │  c │  │ S │ X │ XP │ Y │ YP
───┼──────┼───────┼────┼────┼────┼──┼───┼───┼────┼───┼───
 0 │ None │       │  2 │  3 │    │  │ 1 │   │    │   │
 1 │    S │     1 │    │    │    │  │   │   │    │   │
 2 │    a │       │    │    │  6 │  │   │ 4 │    │ 5 │
 3 │    b │       │    │    │  9 │  │   │ 7 │    │ 8 │
 4 │    X │       │ 10 │    │    │  │   │   │    │   │
───┼──────┼───────┼────┼────┼────┼──┼───┼───┼────┼───┼───
 5 │    Y │       │    │ 11 │    │  │   │   │    │   │
 6 │    c │       │    │    │ 14 │  │   │   │ 12 │   │ 13
 7 │    X │       │    │ 15 │    │  │   │   │    │   │
 8 │    Y │       │ 16 │    │    │  │   │   │    │   │
 9 │    c │       │    │    │ 17 │  │   │   │ 12 │   │ 13
───┼──────┼───────┼────┼────┼────┼──┼───┼───┼────┼───┼───
10 │    a │    -1 │ -1 │ -1 │ -1 │  │   │   │    │   │
11 │    b │    -3 │ -3 │ -3 │ -3 │  │   │   │    │   │
12 │   XP │    -5 │ -5 │ -5 │ -5 │  │   │   │    │   │
13 │   YP │    -6 │ -6 │ -6 │ -6 │  │   │   │    │   │
14 │    c │       │ -7 │ -8 │    │  │   │   │    │   │
───┼──────┼───────┼────┼────┼────┼──┼───┼───┼────┼───┼───
15 │    b │    -2 │ -2 │ -2 │ -2 │  │   │   │    │   │
16 │    a │    -4 │ -4 │ -4 │ -4 │  │   │   │    │   │
17 │    c │       │ -8 │ -7 │    │  │   │   │    │   │
───┴──────┴───────┴────┴────┴────┴──┴───┴───┴────┴───┴───