jeffreykegler / Marpa--R2

Parse any language you can describe in BNF
https://jeffreykegler.github.io/Marpa-web-site/
Other
157 stars 23 forks source link

Describe reconstruction of parses in the presence of the Leo optimization #288

Closed dabrahams closed 1 year ago

dabrahams commented 1 year ago

@jeffreykegler mentioned that his description of how to analyze Leo chains might be too daunting. Here's what I'm planning to do:

Definition: A derivation path is a sequence, in Earley set order, of predecessor Earley items of a completion, not including the completion, even if that completion was optimized out via Leo.

Premise: In Earley set k we have a completion X = [A→𝜶B., i], where the origin of B is known to be j, but can find no completion for a rule B→𝜷 with origin j in the same Earley set.

This completion must have been optimized away by Leo. To find all completions missing from set k that would otherwise contribute to X:

  1. In set k, find a completion [C→𝜹., m] and a matching Leo item in set m of the form [A→𝜶B., C, i]
  2. The unique item Y in set m of the form [D→𝜸.C, h] is the source of the Leo item found.
  3. it is also the last element of all derivation paths participating in X that cover the range [h, k)
  4. The corresponding completion missing from set k is [D→𝜸C., h]
  5. The derivation paths for any parent of Y end with the unique item in set h of the form [E→𝝃.D, g]
  6. go to step 2, substituting [E→𝝃.D, g] for [D→𝜸.C, h] and h for m, until you reach the tail of a derivation path for X
dabrahams commented 1 year ago

Actually I don't think you need the whole concept of derivation paths for this; if it won't disorient you too much I might just take it out.

jeffreykegler commented 1 year ago

Please leave the material on derivation paths in for the time being.

Sent from ProtonMail mobile

-------- Original Message -------- On Dec 3, 2022, 5:15 PM, Dave Abrahams wrote:

Actually I don't think you need the whole concept of derivation paths for this; if it won't disorient you too much I might just take it out.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

jeffreykegler commented 1 year ago

I'll need a while to read and think this over.

dabrahams commented 1 year ago

OK. I have an example I've worked through if you think that would be helpful.

jeffreykegler commented 1 year ago

Can't hurt. I'd be happy to see it. It may be easier to explain in terms of an example rather than math notation.

dabrahams commented 1 year ago

Grammar:

A ::= 'w' 'x' B | 'w'
B ::= C
C ::= 'y' 'z' A

where A is the start symbol. Input is "wxyzwxyzw"

The chart looks like this. The numbers and square brackets indicate origins and the earleme of the dot. Curly brackets surround a set of pre-dot origins, which I use to reconstruct derivation paths. Asterixes mark Leo items. The final column is a code I invented that describes how each item arose during recognition; you can ignore that or if you want I can explain it.

---------- 0 ----------
0: [0 A ::= [0 •'w' 'x' B                    P
1: [0 A ::= [0 •'w'                          P

---------- 1 ---------- 'w'
2: [0 A         ::= [0  'w' 1]  •'x' B       <0['w']
3: [0 A 1]  ::= [0  'w' 1]  •

---------- 2 ---------- 'x'
4: [0 A 2]  ::= [0 'w' 'x' B 2] •B*  L5
5: [0 A         ::= [0 'w' {1} 'x' 2]   •B   <2['x']
6: [0 A 2]  ::= [0 'w' 'x' B 2] •C*  L7p4
7: [2 B         ::= [2 •C                    P5
8: [2 C         ::= [2 •'y' 'z' A            P7

---------- 3 ---------- 'y'
9: [2 C ::= [2  'y' 3]  •'z' A               <8['y']

---------- 4 ---------- 'z'
10: [0 A 4] ::= [0 'w' 'x' B 4] •A*  L11p6
11: [2 C    ::= [2 'y' {3} 'z' 4]   •A   <9['z']
12: [4 A    ::= [4 •'w' 'x' B            P11
13: [4 A    ::= [4 •'w'                  P11

---------- 5 ---------- 'w'
14: [4 A    ::= [4  'w' 5]  •'x' B       <12['w']
15: [0 A 5] ::= [0 'w' 'x' {2} B 5] •    <10[16]
16: [4 A 5] ::= [4  'w' 5]  •            <13['w']

---------- 6 ---------- 'x'
17: [0 A 6] ::= [0 'w' 'x' B 6] •B*  L18p10
18: [4 A    ::= [4 'w' {5} 'x' 6]   •B   <14['x']
19: [0 A 6] ::= [0 'w' 'x' B 6] •C*  L20p17
20: [6 B    ::= [6 •C                    P18
21: [6 C    ::= [6 •'y' 'z' A            P20

---------- 7 ---------- 'y'
22: [6 C    ::= [6  'y' 7]  •'z' A       <21['y']

---------- 8 ---------- 'z'
23: [0 A 8] ::= [0 'w' 'x' B 8] •A*  L24p19
24: [6 C    ::= [6 'y' {7} 'z' 8]   •A   <22['z']
25: [8 A    ::= [8 •'w' 'x' B            P24
26: [8 A    ::= [8 •'w'                  P24

---------- 9 ---------- 'w'
27: [8 A    ::= [8  'w' 9]  •'x' B       <25['w']
28: [0 A 9] ::= [0 'w' 'x' {2} B 9] •    <23[29]
29: [8 A 9] ::= [8  'w' 9]  •            <26['w']

I also represented the chart another way, showing only the items relevant to a complete recognition. The lines with rules in them should be pretty self-explanatory. The numbers below each of those lines identify the items in the chart above that are relevant to recognizing the LHS of each rule line, i.e. they are the rule's derivation path (and possibly completion).

Set:  0     1     2     3     4     5     6     7     8     9
A ::=   'w'   'x'    B....................................
      0     2     5                                        28
B ::=                C....................................
                  7
C ::=               'y'   'z'    A........................
                  8     9    11
A ::=                           'w'   'x'    B............
                             15     14   18
B ::=                                        C............
                                         20
C ::=                                       'y'   'z'   A
                                         21     22   24
A ::=                                                  'w'
                                                     26    29

As your book describes, only the top and bottom of the parse have completions (items 28 and 29).

When I'm analyzing the top line of the parse, I'll fail to find a completion for the B. I don't really need the completion items, but I need to know the rules involved and the sequence of numbers below each one.

Because there is no completion of a B with origin 2 in Earley set 9, I know item 28, which completes the first line, is the product of a Leo item. So let's analyze that:

You can see how this process continues upwards until it arrives back at the top line, discovering all the derivation paths.

jeffreykegler commented 1 year ago

I was able to figure out the code for the causation of the items, but if you could document it for anyone visiting or revisiting the thread, I think that would be helpful. Thanks!

jeffreykegler commented 1 year ago

You assume in the example there is no completion for B with origin 2 iff there is a Leo item with transition symbol B in Earley set. In the presence of ambiguity, I don't think this is, in general, true.

Imagine your grammar had an additional rule:

    C ::= 'y' 'z' 'w' 'x' 'y' 'z' 'w'

Then I think there would be a completion for B in Earley set 9 with origin 2, as well as the Leo item. Both would be part of parses of the input string.

I am gradually getting back into the logic, and I hope I got this right.

dabrahams commented 1 year ago

Here's the chart I get with that additional rule in place (no derivation codes, this time; I don't generate them programmatically). I don't see the completion you're referring to in there. I believe that's because Leo item 6 (it's the same in both charts) preempts the completion that would normally have been generated by combining item 7 with the completion of that C (39 in the new chart).

---------- 0 ----------
0: [0 A ::= [0 •'w' 'x' B
1: [0 A ::= [0 •'w'
---------- 1 ----------
2: [0 A ::= [0  'w' 1]  •'x' B
3: [0 A 1]  ::= [0  'w' 1]  •
---------- 2 ----------
4: [0 A]    ::= [0 'w' 'x' {2} B•B*
5: [0 A ::= [0 'w' {1} 'x' 2]   •B
6: [0 A]    ::= [0 'w' 'x' {2} B•C*
7: [2 B ::= [2 •C
8: [2 C ::= [2 •'y' 'z' A
9: [2 C ::= [2 •'y' 'z' 'w' 'x' 'y' 'z' 'w'
---------- 3 ----------
10: [2 C    ::= [2  'y' 3]  •'z' A
11: [2 C    ::= [2  'y' 3]  •'z' 'w' 'x' 'y' 'z' 'w'
---------- 4 ----------
12: [0 A]   ::= [0 'w' 'x' {2} B•A*
13: [2 C    ::= [2 'y' {3} 'z' 4]   •A
14: [2 C    ::= [2 'y' {3} 'z' 4]   •'w' 'x' 'y' 'z' 'w'
15: [4 A    ::= [4 •'w' 'x' B
16: [4 A    ::= [4 •'w'
---------- 5 ----------
17: [2 C    ::= [2 'y' 'z' {4} 'w' 5]   •'x' 'y' 'z' 'w'
18: [4 A    ::= [4  'w' 5]  •'x' B
19: [0 A 5] ::= [0 'w' 'x' {4} B 5] •
20: [4 A 5] ::= [4  'w' 5]  •
---------- 6 ----------
21: [0 A]   ::= [0 'w' 'x' {2} B•B*
22: [4 A    ::= [4 'w' {5} 'x' 6]   •B
23: [0 A]   ::= [0 'w' 'x' {2} B•C*
24: [6 B    ::= [6 •C
25: [2 C    ::= [2 'y' 'z' 'w' {5} 'x' 6]   •'y' 'z' 'w'
26: [6 C    ::= [6 •'y' 'z' A
27: [6 C    ::= [6 •'y' 'z' 'w' 'x' 'y' 'z' 'w'
---------- 7 ----------
28: [2 C    ::= [2 'y' 'z' 'w' 'x' {6} 'y' 7]   •'z' 'w'
29: [6 C    ::= [6  'y' 7]  •'z' A
30: [6 C    ::= [6  'y' 7]  •'z' 'w' 'x' 'y' 'z' 'w'
---------- 8 ----------
31: [0 A]   ::= [0 'w' 'x' {2} B•A*
32: [6 C    ::= [6 'y' {7} 'z' 8]   •A
33: [2 C    ::= [2 'y' 'z' 'w' 'x' 'y' {7} 'z' 8]   •'w'
34: [6 C    ::= [6 'y' {7} 'z' 8]   •'w' 'x' 'y' 'z' 'w'
35: [8 A    ::= [8 •'w' 'x' B
36: [8 A    ::= [8 •'w'
---------- 9 ----------
37: [6 C    ::= [6 'y' 'z' {8} 'w' 9]   •'x' 'y' 'z' 'w'
38: [8 A    ::= [8  'w' 9]  •'x' B
39: [2 C 9] ::= [2 'y' 'z' 'w' 'x' 'y' 'z' {8} 'w' 9]   •
40: [0 A 9] ::= [0 'w' 'x' {2 8} B 9]   •
42: [8 A 9] ::= [8  'w' 9]  •
---------- 10 ----------

[EDIT 1] For completeness I added the other diagram for this case: [EDIT 2] I used to be throwing out useful predot origin information for items generated by Leo, which would make deriving the missing completions harder, so I've added that information. Now as you can see item 40 has 2 predot origins, the first corresponding to combining Leo6 with completion 39, and the 2nd by combining Leo31 with completion 42.

Set:  0     1     2     3     4     5     6     7     8     9
A ::=   'w'   'x'    B....................................
      0     2     5                                        40
B ::=                C....................................
                  7
C ::=               'y'   'z'    A........................
                  8     10    13
C ::=               'y'   'z'   'w'   'x'   'y'   'z'   'w'
                  9     11    14    17   25    28    33    39
A ::=                           'w'   'x'    B............
                              15    18   22
B ::=                                        C............
                                         24
C ::=                                       'y'   'z'   A
                                         26    29    32
A ::=                                                  'w'
                                                     36    42
jeffreykegler commented 1 year ago

I think your implementation might miss this parse:

   A -> 'w' 'x' B
       -> 'w' 'x' C
       -> 'w' 'x' 'y' 'z' 'w' 'x' 'y' 'z' 'w'

That is, it's not only pre-empting the Leo completions, it's pre-empting others and losing parses.

dabrahams commented 1 year ago

I don't think it's losing that parse. The same procedure that finds the one parse in the original example finds both of the parses in your updated example. We explore both of the completion/Leo matchups in Earley set 9. In the new chart, we have completion 40 matching both the Earley41 / Leo31 pair (the old parse) and the Earley39/Leo6 pair. The source of Leo6 is item 7, which is the tail of the (one-element) derivation path for the missing B completion.

jeffreykegler commented 1 year ago

I'll run these 2 grammars under Marpa::R2 and compare. It may be the 2 algorithms just work differently.

dabrahams commented 1 year ago

Look forward to seeing the results. Did my explanation just above make sense to you?

jeffreykegler commented 1 year ago

It may, once I compare the Earley sets from the 2 implementations.

Sent from ProtonMail mobile

-------- Original Message -------- On Dec 4, 2022, 5:40 PM, Dave Abrahams wrote:

Look forward to seeing the results. Did my explanation just above make sense to you?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were assigned.Message ID: @.***>

jeffreykegler commented 1 year ago
=== Progress @0 ===
P0 @0-0 L0c0 A -> . 'w' 'x' B
P1 @0-0 L0c0 A -> . 'w'
P5 @0-0 L0c0 :start -> . A
=== Progress @1 ===
R0:1 @0-1 L1c1 A -> 'w' . 'x' B
F1 @0-1 L1c1 A -> 'w' .
F5 @0-1 L1c1 :start -> A .
=== Progress @2 ===
R0:2 @0-2 L1c1-3 A -> 'w' 'x' . B
P2 @2-2 L1c3 B -> . C
P3 @2-2 L1c3 C -> . 'y' 'z' A
P4 @2-2 L1c3 C -> . 'y' 'z' 'w' 'x' 'y' 'z' 'w'
Leo item for transition on "B"
Leo item for transition on "C"
=== Progress @3 ===
R3:1 @2-3 L1c3-5 C -> 'y' . 'z' A
R4:1 @2-3 L1c3-5 C -> 'y' . 'z' 'w' 'x' 'y' 'z' 'w'
=== Progress @4 ===
P0 @4-4 L1c7 A -> . 'w' 'x' B
P1 @4-4 L1c7 A -> . 'w'
R3:2 @2-4 L1c3-7 C -> 'y' 'z' . A
R4:2 @2-4 L1c3-7 C -> 'y' 'z' . 'w' 'x' 'y' 'z' 'w'
Leo item for transition on "A"
=== Progress @5 ===
R0:1 @4-5 L1c7-9 A -> 'w' . 'x' B
F0 @0-5 L1c1-9 A -> 'w' 'x' B .
F1 @4-5 L1c7-9 A -> 'w' .
F2 @2-5 L1c3-9 B -> C .
F3 @2-5 L1c3-9 C -> 'y' 'z' A .
R4:3 @2-5 L1c3-9 C -> 'y' 'z' 'w' . 'x' 'y' 'z' 'w'
F5 @0-5 L1c1-9 :start -> A .
=== Progress @6 ===
R0:2 @4-6 L1c7-11 A -> 'w' 'x' . B
P2 @6-6 L1c11 B -> . C
P3 @6-6 L1c11 C -> . 'y' 'z' A
P4 @6-6 L1c11 C -> . 'y' 'z' 'w' 'x' 'y' 'z' 'w'
R4:4 @2-6 L1c3-11 C -> 'y' 'z' 'w' 'x' . 'y' 'z' 'w'
Leo item for transition on "B"
Leo item for transition on "C"
=== Progress @7 ===
R3:1 @6-7 L1c11-13 C -> 'y' . 'z' A
R4:1 @6-7 L1c11-13 C -> 'y' . 'z' 'w' 'x' 'y' 'z' 'w'
R4:5 @2-7 L1c3-13 C -> 'y' 'z' 'w' 'x' 'y' . 'z' 'w'
=== Progress @8 ===
P0 @8-8 L1c15 A -> . 'w' 'x' B
P1 @8-8 L1c15 A -> . 'w'
R3:2 @6-8 L1c11-15 C -> 'y' 'z' . A
R4:2 @6-8 L1c11-15 C -> 'y' 'z' . 'w' 'x' 'y' 'z' 'w'
R4:6 @2-8 L1c3-15 C -> 'y' 'z' 'w' 'x' 'y' 'z' . 'w'
Leo item for transition on "A"
=== Progress @9 ===
R0:1 @8-9 L1c15-17 A -> 'w' . 'x' B
F0 x2 @0,4-9 L1c1-17 A -> 'w' 'x' B .
F1 @8-9 L1c15-17 A -> 'w' .
F2 x2 @2,6-9 L1c3-17 B -> C .
F3 x2 @2,6-9 L1c3-17 C -> 'y' 'z' A .
R4:3 @6-9 L1c11-17 C -> 'y' 'z' 'w' . 'x' 'y' 'z' 'w'
F4 @2-9 L1c3-17 C -> 'y' 'z' 'w' 'x' 'y' 'z' 'w' .
F5 @0-9 L1c1-17 :start -> A .
jeffreykegler commented 1 year ago

Above is the output for the extended grammar from Marpa::R2. The items with the augment symbol <:start> can be ignored. Looks the same except for these two lines in Earley set 9:

F2 x2 @2,6-9 L1c3-17 B -> C .
F3 x2 @2,6-9 L1c3-17 C -> 'y' 'z' A .

Note the x2 in the above --- these 2 lines represent 4 Earley items.

jeffreykegler commented 1 year ago

Btw, an augment symbol is something you will probably want. It prevents the start symbol from being recursive -- this causes problems, which is why use of an augment symbol is almost universal.

jeffreykegler commented 1 year ago

Going back to comment https://github.com/jeffreykegler/Marpa--R2/issues/288, I didn't follow it at all. In any case, the two items referenced match items in my R2 output. Your completion 40, 40: [0 A 9] ::= [0 'w' 'x' {2} B 9] •, is my F0 x2 @0,4-9 L1c1-17 A -> 'w' 'x' B . And your item 7, 7: [2 B ::= [2 •C is my P2 @2-2 L1c3 B -> . C.

dabrahams commented 1 year ago

Looks the same except for these two lines in Earley set 9

A couple of questions:

  1. Where did those four items come from?
    F2 x2 @2,6-9 L1c3-17 B -> C .
    F3 x2 @2,6-9 L1c3-17 C -> 'y' 'z' A .

    If I understand your notation, and the algorithm, correctly, the first of these could only have been generated by two things:

    1. A Leo item that generates a completion of B->C•,
    2. the discovery of a C covering 6-9, combined with the prediction B->•C in set 6 I can't see the detail of your Leo items as far as I know, but you said my chart substantially matches yours and I have no such Leo item. Furthermore, my Leo item 23 prevents the discovery of a C starting in 6 from generating a completion of B->C•. So I'm wondering what could have caused that completion to show up. A similar logic applies to the second line.
  2. If each of those lines represents two items, in what way are the two items different? To be distinct elements of a set, they have to be different somehow. Is this multiplicity perhaps how you represent that the item was discovered via more than one predecessor or other predot symbol derivation?

Going back to comment https://github.com/jeffreykegler/Marpa--R2/issues/288, I didn't follow it at all

Okay, I can try to write out a complete pseudocode for finding all the missing completions, or you can ask a question whose answer will help you understand, or we can proceed some other way that you suggest. Please let me know what you want to do.

Regarding the augment symbol, I figured I'd add one at such a time I found it to be necessary. So far, I haven't. If you have an example that demonstrates why it's needed, I'd love to see it.

jeffreykegler commented 1 year ago

The format of my progress lines is documented here. In the two lines in question the x2 @2,6-9 means the lines represents two Earley items, both ending at earleme 9. One starts at earleme 2 and the other at earleme 6. The two Earley items F2 x2 @2,6-9 L1c3-17 B -> C . have the two Earley items F3 x2 @2,6-9 L1c3-17 C -> 'y' 'z' A . as their tributaries. Of the two Earley items F3 x2 @2,6-9 L1c3-17 C -> 'y' 'z' A, the one starting at 2 has one of the Earley items in F0 x2 @0,4-9 L1c1-17 A -> 'w' 'x' B . as its tributary, and the one starting at 6 has the Earley item F1 @8-9 L1c15-17 A -> 'w' . as its tributary.

By the way, use has convinced me that both my terminology, and the traditional, for the causation of Leo and Earley items is sub-optimal, and I'm thinking of switching to fluvial terms. These are not overloaded, are fairly intuitive and are part of a rich vocabulary. When two rivers meet one is the mainstem and the other is a tributary. I am thinking of calling predecessors "mainstems", and what I had variously been calling components or causes, "tributaries".

jeffreykegler commented 1 year ago

Leo's paper, and his results, assume that the grammar is augmented. As for an actual example of why, if I run across one in literature I'll pass it on. Though if you don't add an augment symbol, you might be the first to generate an example showing why one is needed. :-)

jeffreykegler commented 1 year ago

Wrt comment https://github.com/jeffreykegler/Marpa--R2/issues/288, I'm inclined to ignore the issue. My implementation generates both the Earley items in question, along with all the same tributaries and mainstems as your implementation does, so it is not clear that there's an actual issue at stake in https://github.com/jeffreykegler/Marpa--R2/issues/288.

OTOH, pseudo-code of your approach to finding missing completions might be helpful and I'd be glad to read it.

jeffreykegler commented 1 year ago

Currently absent from Marpa::R2 is a high-level trace output for Leo items. There are diagnostics, but they use the internal symbols and rules. My original thought was that Leo items, unlike Earley items, were an internal issue, and an internals-ish trace sufficed. But experience, as exemplified by this discussion, shows that not to be the case.

Since I have an example to hand, this is an excellent opportunity to add a more application-developer trace facility for Leo items to Marpa::R2, and that is underway.

dabrahams commented 1 year ago

The two Earley items F2 x2 @2,6-9 L1c3-17 B -> C . have the two Earley items F3 x2 @2,6-9 L1c3-17 C -> 'y' 'z' A . as their tributaries. Of the two Earley items F3 x2 @2,6-9 L1c3-17 C -> 'y' 'z' A , the one starting at 2 has one of the Earley items in F0 x2 @0,4-9 L1c1-17 A -> 'w' 'x' B . as its tributary, and the one starting at 6 has the Earley item F1 @8-9 L1c15-17 A -> 'w' . as its tributary.

OK I understand this, but the question behind my question remains: It seems to me that the the existence of Leo items with transition symbols A and B that accompany all Items of the form C-> 'y' 'z' •A and A -> 'w' 'x'•B respectively must prevent the completions of the C and A you cite from ever being generated. Why is that not the case? (Or, are these completions only reconstructed later as part of the very process we're discussing now?)

dabrahams commented 1 year ago

By the way, use has convinced me that both my terminology, and the traditional, for the causation of Leo and Earley items is sub-optimal, and I'm thinking of switching to fluvial terms. These are not overloaded, are fairly intuitive and are part of a rich vocabulary. When two rivers meet one is the mainstem and the other is a tributary. I am thinking of calling predecessors "mainstems", and what I had variously been calling components or causes, "tributaries".

🤷 I can work with either one. It all depends what you're shooting for: I think these terms (especially "mainstem") are pretty esoteric and will not be immediately obvious to many people. Either way, in all of your writing that I've read I don't think I ever came across the word component nor a definition of the word predecessor, so IMO a prominent definition of terms is more important than which words you choose.

jeffreykegler commented 1 year ago

Yes, the presence of the Leo items does prevent the addition of completions during recognition. At evaluation time, however, those which are accessible from the accept (aka top or start) Earley item must be present, in order to have a complete tree to traverse. There's a lot of reasons you want a complete tree -- one important one is that the missing nodes of the tree might have semantics attached.

Libmarpa adds the missing-but-accessible completions back in when it creates the bocage. And Marpa::R2's progress listings contain them because otherwise tracing the parse becomes very difficult.

Sent with Proton Mail secure email.

------- Original Message ------- On Monday, December 5th, 2022 at 3:30 PM, Dave Abrahams @.***> wrote:

The two Earley items F2 x2 @.(https://github.com/2),6-9 L1c3-17 B -> C . have the two Earley items F3 x2 @.(https://github.com/2),6-9 L1c3-17 C -> 'y' 'z' A . as their tributaries. Of the two Earley items F3 x2 @.(https://github.com/2),6-9 L1c3-17 C -> 'y' 'z' A , the one starting at 2 has one of the Earley items in F0 x2 @.(https://github.com/0),4-9 L1c1-17 A -> 'w' 'x' B . as its tributary, and the one starting at 6 has the Earley item F1 @.***(https://github.com/8-9) L1c15-17 A -> 'w' . as its tributary.

OK I understand this, but the question behind my question remains: It seems to me that the the existence of Leo items with transition symbols A and B that accompany all Items of the form C-> 'y' 'z' •A and A -> 'w' 'x'•B respectively must prevent the completions of the C and A you cite from ever being generated. Why is that not the case?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were assigned.Message ID: @.***>

jeffreykegler commented 1 year ago

Predecessor and component (aka "cause") are defined for each operation in sections 5 and 6 of the arxiv.org paper.

dabrahams commented 1 year ago

At evaluation time, however, those which are accessible from the accept (aka top or start) Earley item must be present, in order to have a complete tree to traverse. There's a lot of reasons you want a complete tree -- one important one is that the missing nodes of the tree might have semantics attached.

I agree that you want a complete tree, but I disagree that I need to create those completions directly. I just need to store the location of the tail element of the derivation path for each completion, which amounts to much of the same information but can be generated on the fly as I get to the relevant part of a tree/forest. That way I don't have to do an additional pass over the whole tree to create them, and can just store the derivation tails relevant to what the user explores in the chart. It's not a big deal but I think it will be a bit more efficient in common cases.

jeffreykegler commented 1 year ago

I'm happy to see new approaches tried.

Sent from ProtonMail mobile

-------- Original Message -------- On Dec 5, 2022, 9:10 PM, Dave Abrahams wrote:

At evaluation time, however, those which are accessible from the accept (aka top or start) Earley item must be present, in order to have a complete tree to traverse. There's a lot of reasons you want a complete tree -- one important one is that the missing nodes of the tree might have semantics attached.

I agree that you want a complete tree, but I disagree that I need to create those completions directly. I just need to store the location of the tail element of the derivation path for each completion, which amounts to much of the same information but can be generated on the fly as I get to the relevant part of a tree/forest. That way I don't have to do an additional pass over the whole tree to create them, and can just store the derivation tails relevant to what the user explores in the chart. It's not a big deal but I think it will be a bit more efficient in common cases.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were assigned.Message ID: @.***>

jeffreykegler commented 1 year ago

@dabrahams I requested this issue to be opened in this repo, but it turns out it is not the best. I want to transfer it to the Marpa::R2 repo and fortunately Github has a COOOL new feature that does that sort of thing. If it's OK with you, I will transfer this issue.

Marpa::R2 is best, because this issue is motivating a lot of improvement in the Leo diagnostics in Marpa::R2.

dabrahams commented 1 year ago

be my guest!

jeffreykegler commented 1 year ago

The following file contains nothing new -- it is the same as in my comment above, except using the new Marpa::R2 diagnostic, so that there is more detail for the Leo items.

=== Progress @0 ===
P0 @0-0 L0c0 A -> . 'w' 'x' B
P1 @0-0 L0c0 A -> . 'w'
P5 @0-0 L0c0 :start -> . A
=== Progress @1 ===
R0:1 @0-1 L1c1 A -> 'w' . 'x' B
F1 @0-1 L1c1 A -> 'w' .
F5 @0-1 L1c1 :start -> A .
=== Progress @2 ===
R0:2 @0-2 L1c1-3 A -> 'w' 'x' . B
P2 @2-2 L1c3 B -> . C
P3 @2-2 L1c3 C -> . 'y' 'z' A
P4 @2-2 L1c3 C -> . 'y' 'z' 'w' 'x' 'y' 'z' 'w'
L2 [[A -> 'w' 'x' . B]; "B"; 0]
L2 [[A -> 'w' 'x' . B]; "C"; 2]
=== Progress @3 ===
R3:1 @2-3 L1c3-5 C -> 'y' . 'z' A
R4:1 @2-3 L1c3-5 C -> 'y' . 'z' 'w' 'x' 'y' 'z' 'w'
=== Progress @4 ===
P0 @4-4 L1c7 A -> . 'w' 'x' B
P1 @4-4 L1c7 A -> . 'w'
R3:2 @2-4 L1c3-7 C -> 'y' 'z' . A
R4:2 @2-4 L1c3-7 C -> 'y' 'z' . 'w' 'x' 'y' 'z' 'w'
L4 [[A -> 'w' 'x' . B]; "A"; 2]
=== Progress @5 ===
R0:1 @4-5 L1c7-9 A -> 'w' . 'x' B
F0 @0-5 L1c1-9 A -> 'w' 'x' B .
F1 @4-5 L1c7-9 A -> 'w' .
F2 @2-5 L1c3-9 B -> C .
F3 @2-5 L1c3-9 C -> 'y' 'z' A .
R4:3 @2-5 L1c3-9 C -> 'y' 'z' 'w' . 'x' 'y' 'z' 'w'
F5 @0-5 L1c1-9 :start -> A .
=== Progress @6 ===
R0:2 @4-6 L1c7-11 A -> 'w' 'x' . B
P2 @6-6 L1c11 B -> . C
P3 @6-6 L1c11 C -> . 'y' 'z' A
P4 @6-6 L1c11 C -> . 'y' 'z' 'w' 'x' 'y' 'z' 'w'
R4:4 @2-6 L1c3-11 C -> 'y' 'z' 'w' 'x' . 'y' 'z' 'w'
L6 [[A -> 'w' 'x' . B]; "B"; 4]
L6 [[A -> 'w' 'x' . B]; "C"; 6]
=== Progress @7 ===
R3:1 @6-7 L1c11-13 C -> 'y' . 'z' A
R4:1 @6-7 L1c11-13 C -> 'y' . 'z' 'w' 'x' 'y' 'z' 'w'
R4:5 @2-7 L1c3-13 C -> 'y' 'z' 'w' 'x' 'y' . 'z' 'w'
=== Progress @8 ===
P0 @8-8 L1c15 A -> . 'w' 'x' B
P1 @8-8 L1c15 A -> . 'w'
R3:2 @6-8 L1c11-15 C -> 'y' 'z' . A
R4:2 @6-8 L1c11-15 C -> 'y' 'z' . 'w' 'x' 'y' 'z' 'w'
R4:6 @2-8 L1c3-15 C -> 'y' 'z' 'w' 'x' 'y' 'z' . 'w'
L8 [[A -> 'w' 'x' . B]; "A"; 6]
=== Progress @9 ===
R0:1 @8-9 L1c15-17 A -> 'w' . 'x' B
F0 x2 @0,4-9 L1c1-17 A -> 'w' 'x' B .
F1 @8-9 L1c15-17 A -> 'w' .
F2 x2 @2,6-9 L1c3-17 B -> C .
F3 x2 @2,6-9 L1c3-17 C -> 'y' 'z' A .
R4:3 @6-9 L1c11-17 C -> 'y' 'z' 'w' . 'x' 'y' 'z' 'w'
F4 @2-9 L1c3-17 C -> 'y' 'z' 'w' 'x' 'y' 'z' 'w' .
F5 @0-9 L1c1-17 :start -> A .
jeffreykegler commented 1 year ago

@dabrahams, if I remember correctly, with the completion of my Marpa::R2 Leo diagnostics, we are done with this issue. Absent feedback to the contrary, I will close this issue in a couple of days.

dabrahams commented 1 year ago

Sure, this one should be closed, but don't you want another for the paper, with the original topic?

jeffreykegler commented 1 year ago

I made the changes in the paper repo already. The issue is https://github.com/jeffreykegler/Marpa-arxiv-paper/issues/11. Please report any problems with the changed version under that issue, or in a new issue in the arxiv paper repo.

For your convenience, I've uploaded a PDF of the revised paper: https://drive.google.com/file/d/1_2AmXs6PhEqDNe8gAeaolps6HKT1nrsq/view?usp=sharing . The revisions are mainly in sections 5 and 6. If you are interested in looking at it, please download it in the next couple of days, as I regularly clean up that Google Drive directory.

Per your comment, I will close this issue after a couple of days, unless something comes up.

jeffreykegler commented 1 year ago

Per https://github.com/jeffreykegler/Marpa--R2/issues/288#issuecomment-1345326259, I am closing this issue.

dabrahams commented 1 year ago

I believe I found a case for which my logic fails. Consider this similar grammar with the same input:

      A ::= 'w' 'x' B | 'w'
      B ::= 'y' 'z' 'w' 'x' 'y' 'z' 'w'
      B ::= 'y' 'z' A

for which I get the following chart:

---------- 0 ----------
0: [0 A ::= [0 •'w' 'x' B
1: [0 A ::= [0 •'w'
---------- 1 ----------
2: [0 A ::= [0  'w' 1]  •'x' B
3: [0 A 1]  ::= [0  'w' 1]  •
---------- 2 ----------
4: [0 A]    ::= [0 'w' 'x' {2} B•B*
5: [0 A ::= [0 'w' {1} 'x' 2]   •B
6: [2 B ::= [2 •'y' 'z' 'w' 'x' 'y' 'z' 'w'
7: [2 B ::= [2 •'y' 'z' A
---------- 3 ----------
8: [2 B ::= [2  'y' 3]  •'z' 'w' 'x' 'y' 'z' 'w'
9: [2 B ::= [2  'y' 3]  •'z' A
---------- 4 ----------
10: [0 A]   ::= [0 'w' 'x' {2} B•A*
11: [2 B    ::= [2 'y' {3} 'z' 4]   •A
12: [2 B    ::= [2 'y' {3} 'z' 4]   •'w' 'x' 'y' 'z' 'w'
13: [4 A    ::= [4 •'w' 'x' B
14: [4 A    ::= [4 •'w'
---------- 5 ----------
15: [2 B    ::= [2 'y' 'z' {4} 'w' 5]   •'x' 'y' 'z' 'w'
16: [4 A    ::= [4  'w' 5]  •'x' B
17: [0 A 5] ::= [0 'w' 'x' {4} B 5] •
18: [4 A 5] ::= [4  'w' 5]  •
---------- 6 ----------
19: [0 A]   ::= [0 'w' 'x' {2} B•B*
20: [4 A    ::= [4 'w' {5} 'x' 6]   •B
21: [2 B    ::= [2 'y' 'z' 'w' {5} 'x' 6]   •'y' 'z' 'w'
22: [6 B    ::= [6 •'y' 'z' 'w' 'x' 'y' 'z' 'w'
23: [6 B    ::= [6 •'y' 'z' A
---------- 7 ----------
24: [2 B    ::= [2 'y' 'z' 'w' 'x' {6} 'y' 7]   •'z' 'w'
25: [6 B    ::= [6  'y' 7]  •'z' 'w' 'x' 'y' 'z' 'w'
26: [6 B    ::= [6  'y' 7]  •'z' A
---------- 8 ----------
27: [0 A]   ::= [0 'w' 'x' {2} B•A*
28: [6 B    ::= [6 'y' {7} 'z' 8]   •A
29: [2 B    ::= [2 'y' 'z' 'w' 'x' 'y' {7} 'z' 8]   •'w'
30: [6 B    ::= [6 'y' {7} 'z' 8]   •'w' 'x' 'y' 'z' 'w'
31: [8 A    ::= [8 •'w' 'x' B
32: [8 A    ::= [8 •'w'
---------- 9 ----------
33: [6 B    ::= [6 'y' 'z' {8} 'w' 9]   •'x' 'y' 'z' 'w'
34: [8 A    ::= [8  'w' 9]  •'x' B
35: [2 B 9] ::= [2 'y' 'z' 'w' 'x' 'y' 'z' {8} 'w' 9]   •
36: [0 A 9] ::= [0 'w' 'x' {2 8} B 9]   •
38: [8 A 9] ::= [8  'w' 9]  •
---------- 10 ----------

Set:  0     1     2     3     4     5     6     7     8     9
A ::=   'w'   'x'    B....................................
      0     2     5                                        36
B ::=               'y'   'z'    A........................
                  7     9     11
B ::=               'y'   'z'   'w'   'x'   'y'   'z'   'w'
                  6     8     12    15   21    21    29    35
A ::=                           'w'   'x'    B............
                              13    16   20
B ::=                                       'y'   'z'   A
                                         23    26    28
A ::=                                                  'w'
                                                     32    38

Item 36 (the completion of A over 2-9) is derived two ways:

  1. By combining Earley 5 with Earley completion 35
  2. By combining Leo item 27 with completion 38

Since there is a perfectly legitimate completion of B covering sets 2-9 (item 35), nothing would cause my earlier-stated procedure to consider reconstructing the Leo-eliminated completion of the derivation path for B made of items 7, 9, and 11.

It seems that in general, any completion that's actually found in the chart could have arisen in either way. I think I can adjust to account for this fact and will post an update here when I've thought the problem through.