patrickhuber / Pliant

MIT License
26 stars 4 forks source link

Strange error in EBNF grammar #70

Closed ArsenShnurkov closed 7 years ago

ArsenShnurkov commented 7 years ago

i am trying to create a test grammar for pliant with EBNF syntax. Here is my grammar: https://github.com/ArsenShnurkov/Pliant-test-for-include/blob/43923b148b0dfccc3490d2e8e03b03dc65bd0884/main/Grammar/syntax2.ebnf

file = ws directives ws ;
ws = [ ows ] ; /* white space */
ows = "_" ; /* obligatory white space */
directives = directive { ows directive } ;
directive = "0" | "1" ;

Here is text to parse: https://github.com/ArsenShnurkov/Pliant-test-for-include/blob/43923b148b0dfccc3490d2e8e03b03dc65bd0884/main/Program.cs#L16

string input = "_0_1_0_0_1_1_";

Why it gives me

Recognized: False, Accepted: False
Error at position 4
(file, 0, 3)
patrickhuber commented 7 years ago

I tried to run your grammar definition through the unit test class I have for the EbnfParser class and it is trying to take ows = "_"; as a qualified identifier and it is failing. Trying to dig in to see why.

ArsenShnurkov commented 7 years ago

Please try this first: Grammar: file = "1" { "2" } "1" ; File: string input = "12221";

patrickhuber commented 7 years ago

Here is the parse forest for:

file = "1" { "2" } "1";
(Ebnf.Definition, 0, 8) -> (Ebnf.Block, 0, 8)
(Ebnf.Block, 0, 8) -> (Ebnf.Rule, 0, 8)
(Ebnf.Rule, 0, 8) ->  (Ebnf.QualifiedIdentifier, 0, 1)  ('=', 6, 1)  (Ebnf.Expression, 2, 7) (';', 23, 1)
(Ebnf.QualifiedIdentifier, 0, 1) -> ('file', 1, 4)
(Ebnf.Expression, 2, 7) -> (Ebnf.Term, 2, 7)
(Ebnf.Term, 2, 7) -> (Ebnf.Factor, 2, 3) (Ebnf.Term, 3, 7)
(Ebnf.Factor, 2, 3) -> (Ebnf.Literal, 2, 3)
(Ebnf.Literal, 2, 3) -> ('"1"', 5, 3)
(Ebnf.Term, 3, 7) -> (Ebnf.Factor, 3, 6) (Ebnf.Term, 6, 7)
(Ebnf.Factor, 3, 6) -> (Ebnf.Repetition, 3, 6)
(Ebnf.Repetition, 3, 6) ->  ('{', 12, 1)  (Ebnf.Expression, 4, 5) ('}', 18, 1)
(Ebnf.Expression, 4, 5) -> (Ebnf.Term, 4, 5)
(Ebnf.Term, 4, 5) -> (Ebnf.Factor, 4, 5)
(Ebnf.Factor, 4, 5) -> (Ebnf.Literal, 4, 5)
(Ebnf.Literal, 4, 5) -> ('"2"', 11, 3)
(Ebnf.Term, 6, 7) -> (Ebnf.Factor, 6, 7)
(Ebnf.Factor, 6, 7) -> (Ebnf.Literal, 6, 7)
(Ebnf.Literal, 6, 7) -> ('"1"', 17, 3)

Running that through the EBNF Parser produced an EBNF object structure without error.

I reduced your original grammar to the smallest subset that I could that still did not parse.

Here is the grammar:

ws = [ ows ] ; /* white space */
ows = "_" ; /* obligatory white space */

Here is the parse forest

 (Ebnf.Definition, 0, 10) -> (Ebnf.Block, 0, 6) (Ebnf.Definition, 6, 10)
 (Ebnf.Block, 0, 6) -> (Ebnf.Rule, 0, 6)
 (Ebnf.Rule, 0, 6) ->  (Ebnf.QualifiedIdentifier, 0, 1)  ('=', 4, 1)  (Ebnf.Expression, 2, 5) (';', 14, 1)
 (Ebnf.QualifiedIdentifier, 0, 1) -> ('ws', 1, 2)
 (Ebnf.Expression, 2, 5) -> (Ebnf.Term, 2, 5)
 (Ebnf.Term, 2, 5) -> (Ebnf.Factor, 2, 5)
 (Ebnf.Factor, 2, 5) -> (Ebnf.Optional, 2, 5)
 (Ebnf.Optional, 2, 5) ->  ('[', 6, 1)  (Ebnf.Expression, 3, 4) (']', 12, 1)
 (Ebnf.Expression, 3, 4) -> (Ebnf.Term, 3, 4)
 (Ebnf.Term, 3, 4) -> (Ebnf.Factor, 3, 4)
 (Ebnf.Factor, 3, 4) -> (Ebnf.QualifiedIdentifier, 3, 4)
 (Ebnf.QualifiedIdentifier, 3, 4) -> ('ows', 3, 3)
 (Ebnf.Definition, 6, 10) -> (Ebnf.Block, 6, 10)
 (Ebnf.Block, 6, 10) -> (Ebnf.Rule, 6, 10)
 (Ebnf.Rule, 6, 10) ->  (Ebnf.QualifiedIdentifier, 6, 7)  ('=', 39, 1)  (Ebnf.Expression, 8, 9) (';', 45, 1)
 (Ebnf.QualifiedIdentifier, 6, 7) -> ('"_"', 3, 3)
 (Ebnf.Expression, 8, 9) -> (Ebnf.Term, 8, 9)
 (Ebnf.Term, 8, 9) -> (Ebnf.Factor, 8, 9)
 (Ebnf.Factor, 8, 9) -> (Ebnf.Literal, 8, 9)
 (Ebnf.Literal, 8, 9) -> ('"_"', 3, 3)

So the following line looks incorrect:

 (Ebnf.QualifiedIdentifier, 6, 7) -> ('"_"', 3, 3)

Two things strike me as odd,

  1. the "_" is being picked up as a qualified identifier.
  2. the token for that identifier has a 3,3 for the origin and location. This usually means it is of zero length, and unless there is a nullable rule in the grammar should not happen.

I need to investigate more, but marking this as a bug.

ArsenShnurkov commented 7 years ago

Running that through the EBNF Parser produced an EBNF object structure without error.

ok. Why whis EBNF object doesn't match test input when converted into grammar and passed into parser?

patrickhuber commented 7 years ago

There may be a secondary bug in the EbnfGrammarGenerator, I'm only looking at the parsing of the EBNF grammars you supplied. Once I find out what is going on I'll move on to debugging the generated grammars.

ArsenShnurkov commented 7 years ago

(Ebnf.QualifiedIdentifier, 6, 7) -> ('"_"', 3, 3)

1) What numbers in brackets mean? 2) Is it possible to add pairs <Line,Position> into dump ? 3) what is the plan of fixing it?

"build the parse forest as you go, augmenting each Earley item with a pointer to node labelled with a triple (s, i, j)

but indexes in the dump don't correspond to positions in input grammar. i.e. I don't understood how SPPF is constructed.

patrickhuber commented 7 years ago

Origin and Location are positions in the parse chart, not the input text. If you count tokens instead of counting text position you can see the relationship.

It is possible to get actual character locations in the parse, but requires call back into the IParseRunner to get the current values. To save on space, the library uses lexer rules for reading tokens and moves the parse forward with the tokens and not character by character. Getting line number would require counting carriage return and line feeds (depending on OS format). The data exists, just not how they are internally stored in the tree structure.

As for fixing, I accept pull requests and debug the error when I can. This is a spare time thing for me so I can't guarantee turn around time.

ArsenShnurkov commented 7 years ago

If you count tokens instead of counting text position you can see the relationship.

I counted tokens. First position is 0 before the first token, last position is 10 after the last token. (';', 14, 1) (']', 12, 1) ('=', 39, 1) (';', 45, 1) are all > 10. What is the "Origin" ?

Under the word "plan" i mean the sequence of actions/operations, not the "time estimate".

patrickhuber commented 7 years ago

Sorry, my mistake. Non terminals in the parse forest are token positions in the parse chart. Tokens are character offset and capture length.

As to steps to fix, the bug is somewhere in the ParseEngine or ParseRunner classes. I'm trying to determine when the ('"_"' , 3, 3) node gets created, but it is not as clear as I expected. Seeing that this is a Token node, the 3, 3 makes less sense now. Once I can successfully parse your grammar, I'll focus on the grammar generator to figure out why it isn't parsing your text. Finally, renaming and refactoring to get spec compliant Ebnf, Abnf and Bnf.

patrickhuber commented 7 years ago

So I think the first part of the bug, where the incorrect token is being selected and the numbers for the token make no sense, is part of the Lexeme recycling logic. Here is the commit with some partial fixes 8bb24be724f32ffd63cdc7d8c8021e00f719cd16 .

Here is the test grammar

ws = [ ows ] ; /* white space */
ows = \"_\" ; /* obligatory white space */

Here is the updated parse forest with the corrected positions.

 (Ebnf.Definition, 0, 10) -> (Ebnf.Block, 0, 6) (Ebnf.Definition, 6, 10)
 (Ebnf.Block, 0, 6) -> (Ebnf.Rule, 0, 6)
 (Ebnf.Rule, 0, 6) ->  (Ebnf.QualifiedIdentifier, 0, 1)  ('=', 4, 1)  (Ebnf.Expression, 2, 5) (';', 14, 1)
 (Ebnf.QualifiedIdentifier, 0, 1) -> ('ws', 1, 2)
 (Ebnf.Expression, 2, 5) -> (Ebnf.Term, 2, 5)
 (Ebnf.Term, 2, 5) -> (Ebnf.Factor, 2, 5)
 (Ebnf.Factor, 2, 5) -> (Ebnf.Optional, 2, 5)
 (Ebnf.Optional, 2, 5) ->  ('[', 6, 1)  (Ebnf.Expression, 3, 4) (']', 12, 1)
 (Ebnf.Expression, 3, 4) -> (Ebnf.Term, 3, 4)
 (Ebnf.Term, 3, 4) -> (Ebnf.Factor, 3, 4)
 (Ebnf.Factor, 3, 4) -> (Ebnf.QualifiedIdentifier, 3, 4)
 (Ebnf.QualifiedIdentifier, 3, 4) -> ('ows', 8, 3)
 (Ebnf.Definition, 6, 10) -> (Ebnf.Block, 6, 10)
 (Ebnf.Block, 6, 10) -> (Ebnf.Rule, 6, 10)
 (Ebnf.Rule, 6, 10) ->  (Ebnf.QualifiedIdentifier, 6, 7)  ('=', 39, 1)  (Ebnf.Expression, 8, 9) (';', 45, 1)
 (Ebnf.QualifiedIdentifier, 6, 7) -> ('"_"', 35, 3)
 (Ebnf.Expression, 8, 9) -> (Ebnf.Term, 8, 9)
 (Ebnf.Term, 8, 9) -> (Ebnf.Factor, 8, 9)
 (Ebnf.Factor, 8, 9) -> (Ebnf.Literal, 8, 9)
 (Ebnf.Literal, 8, 9) -> ('"_"', 41, 3)

You'll notice the QualifiedIdentifier is still listed as '"_"'. This is actually supposed to be ows. It appears that the lexeme for ows is being freed back into the token factory for some reason. Because the '"_"' is the next DfaLexeme to be used, the freed lexeme is grabbed up and used in the parse. That is why it shows up twice in the parse forest and has such a weird position.

It also appears that the position doesn't start at 0 in the first token, so I'll need to make sure I grab it before it is incremented.

patrickhuber commented 7 years ago

Commit 8bb24be724f32ffd63cdc7d8c8021e00f719cd16 fixes the bug in the ParseRunner that was causing lexemes to be improperly managed during cleanup.

Next step, tackling the grammar generator.

patrickhuber commented 7 years ago

Looks like the Parse Engine has a bug in it. Here is the debug output of the parse of your input for the grammar you supplied with the setting turned on to optimize right recursion:

0                                                 file ->●ws directives ws      (0)                     Start
0                                                 ws ->●[ows]       (0)                                  Predict
0                                                 file -> ws●directives ws      (0)                     Predict
0                                                 [ows] ->●ows      (0)                                 Predict
0                                                 [ows] ->●     (0)                                    Predict
0                                                 ws -> [ows]●      (0)                                 Predict
0                                                 [ows] : Pliant.Charts.TransitionState             Transition
0                                                 directives ->●directive {ows directive}       (0)      Predict
0                                                 ows ->●_      (0)                                     Predict
0                                                 directive ->●0        (0)                               Predict
0                                                 directive ->●1        (0)                               Predict
1                                                 ows -> _●     (0)                                    Scan _
0                                                 ows : Pliant.Charts.TransitionState               Transition
1                                                 ws -> [ows]●      (0)                                 Complete
1                                                 file -> ws●directives ws      (0)                     Complete
1                                                 directives ->●directive {ows directive}       (1)      Predict
1                                                 directive ->●0        (1)                               Predict
1                                                 directive ->●1        (1)                               Predict
2                                                 directive -> 0●       (1)                              Scan 0
1                                                 directives : Pliant.Charts.TransitionState        Transition
1                                                 directive : Pliant.Charts.TransitionState         Transition
2                                                 file -> ws directives●ws      (0)                     Complete
2                                                 ws ->●[ows]       (2)                                  Predict
2                                                 file -> ws directives ws●     (0)                    Predict
2                                                 [ows] ->●ows      (2)                                 Predict
2                                                 [ows] ->●     (2)                                    Predict
2                                                 ws -> [ows]●      (2)                                 Predict
2                                                 ws : Pliant.Charts.TransitionState                Transition
2                                                 [ows] : Pliant.Charts.TransitionState             Transition
2                                                 ows ->●_      (2)                                     Predict
3                                                 ows -> _●     (2)                                    Scan _
2                                                 ows : Pliant.Charts.TransitionState               Transition
3                                                 file -> ws directives ws●     (0)                    Complete

This fails at earley set id 3, which is the fourth earley set based on zero index.

Here is the way to turn off right recursion optimization:

var grammar = /* your grammar here*/
var parseEngine = new ParseEngine(grammar, new ParseEngineOptions(optimizeRightRecursion: false));

Here is the debug output for parsing your input with optimizing right recursion turned off:

0                                                 file ->●ws directives ws      (0)                     Start
0                                                 ws ->●[ows]       (0)                                  Predict
0                                                 file -> ws●directives ws      (0)                     Predict
0                                                 [ows] ->●ows      (0)                                 Predict
0                                                 [ows] ->●     (0)                                    Predict
0                                                 ws -> [ows]●      (0)                                 Predict
0                                                 directives ->●directive {ows directive}       (0)      Predict
0                                                 ows ->●_      (0)                                     Predict
0                                                 directive ->●0        (0)                               Predict
0                                                 directive ->●1        (0)                               Predict
1                                                 ows -> _●     (0)                                    Scan _
1                                                 [ows] -> ows●     (0)                                Complete
1                                                 ws -> [ows]●      (0)                                 Complete
1                                                 file -> ws●directives ws      (0)                     Complete
1                                                 directives ->●directive {ows directive}       (1)      Predict
1                                                 directive ->●0        (1)                               Predict
1                                                 directive ->●1        (1)                               Predict
2                                                 directive -> 0●       (1)                              Scan 0
2                                                 directives -> directive●{ows directive}       (1)      Complete
2                                                 {ows directive} ->●ows directive {ows directive}      (2)Predict
2                                                 {ows directive} ->●       (2)                          Predict
2                                                 directives -> directive {ows directive}●      (1)     Predict
2                                                 file -> ws directives●ws      (0)                     Complete
2                                                 ows ->●_      (2)                                     Predict
2                                                 ws ->●[ows]       (2)                                  Predict
2                                                 file -> ws directives ws●     (0)                    Predict
2                                                 [ows] ->●ows      (2)                                 Predict
2                                                 [ows] ->●     (2)                                    Predict
2                                                 ws -> [ows]●      (2)                                 Predict
3                                                 ows -> _●     (2)                                    Scan _
3                                                 {ows directive} -> ows●directive {ows directive}      (2)Complete
3                                                 [ows] -> ows●     (2)                                Complete
3                                                 ws -> [ows]●      (2)                                 Complete
3                                                 file -> ws directives ws●     (0)                    Complete
3                                                 directive ->●0        (3)                               Predict
3                                                 directive ->●1        (3)                               Predict
4                                                 directive -> 1●       (3)                              Scan 1
4                                                 {ows directive} -> ows directive●{ows directive}      (2)Complete
4                                                 {ows directive} ->●ows directive {ows directive}      (4)Predict
4                                                 {ows directive} ->●       (4)                          Predict
4                                                 {ows directive} -> ows directive {ows directive}●     (2)Predict
4                                                 directives -> directive {ows directive}●      (1)     Complete
4                                                 file -> ws directives●ws      (0)                     Complete
4                                                 ows ->●_      (4)                                     Predict
4                                                 ws ->●[ows]       (4)                                  Predict
4                                                 file -> ws directives ws●     (0)                    Predict
4                                                 [ows] ->●ows      (4)                                 Predict
4                                                 [ows] ->●     (4)                                    Predict
4                                                 ws -> [ows]●      (4)                                 Predict
5                                                 ows -> _●     (4)                                    Scan _
5                                                 {ows directive} -> ows●directive {ows directive}      (4)Complete
5                                                 [ows] -> ows●     (4)                                Complete
5                                                 ws -> [ows]●      (4)                                 Complete
5                                                 file -> ws directives ws●     (0)                    Complete
5                                                 directive ->●0        (5)                               Predict
5                                                 directive ->●1        (5)                               Predict
6                                                 directive -> 0●       (5)                              Scan 0
6                                                 {ows directive} -> ows directive●{ows directive}      (4)Complete
6                                                 {ows directive} ->●ows directive {ows directive}      (6)Predict
6                                                 {ows directive} ->●       (6)                          Predict
6                                                 {ows directive} -> ows directive {ows directive}●     (4)Predict
6                                                 {ows directive} -> ows directive {ows directive}●     (2)Complete
6                                                 directives -> directive {ows directive}●      (1)     Complete
6                                                 file -> ws directives●ws      (0)                     Complete
6                                                 ows ->●_      (6)                                     Predict
6                                                 ws ->●[ows]       (6)                                  Predict
6                                                 file -> ws directives ws●     (0)                    Predict
6                                                 [ows] ->●ows      (6)                                 Predict
6                                                 [ows] ->●     (6)                                    Predict
6                                                 ws -> [ows]●      (6)                                 Predict
7                                                 ows -> _●     (6)                                    Scan _
7                                                 {ows directive} -> ows●directive {ows directive}      (6)Complete
7                                                 [ows] -> ows●     (6)                                Complete
7                                                 ws -> [ows]●      (6)                                 Complete
7                                                 file -> ws directives ws●     (0)                    Complete
7                                                 directive ->●0        (7)                               Predict
7                                                 directive ->●1        (7)                               Predict
8                                                 directive -> 0●       (7)                              Scan 0
8                                                 {ows directive} -> ows directive●{ows directive}      (6)Complete
8                                                 {ows directive} ->●ows directive {ows directive}      (8)Predict
8                                                 {ows directive} ->●       (8)                          Predict
8                                                 {ows directive} -> ows directive {ows directive}●     (6)Predict
8                                                 {ows directive} -> ows directive {ows directive}●     (4)Complete
8                                                 {ows directive} -> ows directive {ows directive}●     (2)Complete
8                                                 directives -> directive {ows directive}●      (1)     Complete
8                                                 file -> ws directives●ws      (0)                     Complete
8                                                 ows ->●_      (8)                                     Predict
8                                                 ws ->●[ows]       (8)                                  Predict
8                                                 file -> ws directives ws●     (0)                    Predict
8                                                 [ows] ->●ows      (8)                                 Predict
8                                                 [ows] ->●     (8)                                    Predict
8                                                 ws -> [ows]●      (8)                                 Predict
9                                                 ows -> _●     (8)                                    Scan _
9                                                 {ows directive} -> ows●directive {ows directive}      (8)Complete
9                                                 [ows] -> ows●     (8)                                Complete
9                                                 ws -> [ows]●      (8)                                 Complete
9                                                 file -> ws directives ws●     (0)                    Complete
9                                                 directive ->●0        (9)                               Predict
9                                                 directive ->●1        (9)                               Predict
10                                                directive -> 1●       (9)                              Scan 1
10                                                {ows directive} -> ows directive●{ows directive}      (8)Complete
10                                                {ows directive} ->●ows directive {ows directive}      (10)Predict
10                                                {ows directive} ->●       (10)                         Predict
10                                                {ows directive} -> ows directive {ows directive}●     (8)Predict
10                                                {ows directive} -> ows directive {ows directive}●     (6)Complete
10                                                {ows directive} -> ows directive {ows directive}●     (4)Complete
10                                                {ows directive} -> ows directive {ows directive}●     (2)Complete
10                                                directives -> directive {ows directive}●      (1)     Complete
10                                                file -> ws directives●ws      (0)                     Complete
10                                                ows ->●_      (10)                                    Predict
10                                                ws ->●[ows]       (10)                                 Predict
10                                                file -> ws directives ws●     (0)                    Predict
10                                                [ows] ->●ows      (10)                                Predict
10                                                [ows] ->●     (10)                                   Predict
10                                                ws -> [ows]●      (10)                                Predict
11                                                ows -> _●     (10)                                   Scan _
11                                                {ows directive} -> ows●directive {ows directive}      (10)Complete
11                                                [ows] -> ows●     (10)                               Complete
11                                                ws -> [ows]●      (10)                                Complete
11                                                file -> ws directives ws●     (0)                    Complete
11                                                directive ->●0        (11)                              Predict
11                                                directive ->●1        (11)                              Predict
12                                                directive -> 1●       (11)                             Scan 1
12                                                {ows directive} -> ows directive●{ows directive}      (10)Complete
12                                                {ows directive} ->●ows directive {ows directive}      (12)Predict
12                                                {ows directive} ->●       (12)                         Predict
12                                                {ows directive} -> ows directive {ows directive}●     (10)Predict
12                                                {ows directive} -> ows directive {ows directive}●     (8)Complete
12                                                {ows directive} -> ows directive {ows directive}●     (6)Complete
12                                                {ows directive} -> ows directive {ows directive}●     (4)Complete
12                                                {ows directive} -> ows directive {ows directive}●     (2)Complete
12                                                directives -> directive {ows directive}●      (1)     Complete
12                                                file -> ws directives●ws      (0)                     Complete
12                                                ows ->●_      (12)                                    Predict
12                                                ws ->●[ows]       (12)                                 Predict
12                                                file -> ws directives ws●     (0)                    Predict
12                                                [ows] ->●ows      (12)                                Predict
12                                                [ows] ->●     (12)                                   Predict
12                                                ws -> [ows]●      (12)                                Predict
13                                                ows -> _●     (12)                                   Scan _
13                                                {ows directive} -> ows●directive {ows directive}      (12)Complete
13                                                [ows] -> ows●     (12)                               Complete
13                                                ws -> [ows]●      (12)                                Complete
13                                                file -> ws directives ws●     (0)                    Complete
13                                                directive ->●0        (13)                              Predict
13                                                directive ->●1        (13)                              Predict

You can see it goes all the way to the end and succeeds.

The bug is most likely in this private method in the ParseEngine.cs

private bool IsNextStateQuasiComplete(IState state)

For now, you can use the optimization disabling work around and I'll work on getting a fix in place based on these test results.

patrickhuber commented 7 years ago

The latest deployment version 5.0.0 should fix this error. I've tested it with your sample grammar and it appears to work.