vnmakarov / yaep

Yet Another Earley Parser

Empty root node returned #3

Closed nscaife closed 8 years ago

nscaife commented 8 years ago

When I parse a string with YAEP, I get back a YAEP_NIL root node. I've enabled debug logging, and the parse itself appears to succeed, but the root of the tree is not returned. Here's the debug output:

Parsing start...
  Set core = 0
      1 $S : .error $eof, error, 0
      2 $S : .S $eof, DT NN, 0
    -----------
      3 S : .NP0SBJ ADVP0TMP VP PUNCTUATION, DT NN, 0
      4 S : .NP0SBJ VP PUNCTUATION, DT NN, 0
      5 NP0SBJ : .DT NNS, DT, 0
      6 NP0SBJ : .NN, NN, 0

Reading 0=NN(108), Current set=0
New set=1
  Set core = 1
      7 NP0SBJ : NN., RB VBP VBD, 1
      9 S : NP0SBJ .VP PUNCTUATION, VBP VBD, 1
    -----------
     10 VP : .VBP PP0CLR, VBP, 0
     11 VP : .VBD NP0PRD COMMA ADVP0CLR, VBD, 0

Reading 1=VBP(101), Current set=1
New set=2
  Set core = 2
     12 VP : VBP .PP0CLR, IN, 1
    -----------
     13 PP0CLR : .IN NP, IN, 0

Reading 2=IN(105), Current set=2
New set=3
  Set core = 3
     14 PP0CLR : IN .NP, CD NNS NNP JJ, 1
    -----------
     15 NP : .NP0QP PP0TMP, CD JJ, 0
     16 NP : .NP PP, CD NNS NNP JJ, 0
     17 NP : .NNP, NNP, 0
     18 NP : .NNS, NNS, 0
     19 NP : .NP0QP PP, CD JJ, 0
     20 NP0QP : .CD CD, CD, 0
     21 NP0QP : .JJ NNS, JJ, 0

Reading 3=NNS(102), Current set=3
New set=4
  Set core = 4
     22 NP : NNS., PUNCTUATION IN, 1
     23 PP0CLR : IN NP., PUNCTUATION, 2
     25 VP : VBP PP0CLR., PUNCTUATION, 3
     26 S : NP0SBJ VP .PUNCTUATION, PUNCTUATION, 4
    -----------

Reading 4=PUNCTUATION(104), Current set=4
New set=5
  Set core = 5
     27 S : NP0SBJ VP PUNCTUATION., $eof, 5
     28 $S : S .$eof, $eof, 5
    -----------

Reading 5=$eof(-1), Current set=5
New set=6
  Set core = 6
     29 $S : S $eof.,, 6
    -----------
Translation:
      0: EMPTY

Here's the grammar. I'm passing in the string leifh.

"\n"
"TERM DT = 97 COMMA = 98 RB = 99 CD = 100 VBP = 101 NNS = 102 NNP = 103 PUNCTUATION = 104 IN = 105 VBD = 106 JJ = 107 NN = 108;\n"
"S : NP0SBJ VP PUNCTUATION\n"
"  | NP0SBJ ADVP0TMP VP PUNCTUATION\n"
"  ;\n"
"NP : NP0QP PP\n"
"  | NNS\n"
"  | NNP\n"
"  | NP PP\n"
"  | NP0QP PP0TMP\n"
"  ;\n"
"ADVP0CLR : RB PP\n"
"  ;\n"
"VP : VBD NP0PRD COMMA ADVP0CLR\n"
"  | VBP PP0CLR\n"
"  ;\n"
"PP0CLR : IN NP\n"
"  ;\n"
"PP : IN NP\n"
"  ;\n"
"ADVP0TMP : RB\n"
"  ;\n"
"PP0TMP : IN NP\n"
"  ;\n"
"NP0SBJ : NN\n"
"  | DT NNS\n"
"  ;\n"
"NP0QP : JJ NNS\n"
"  | CD CD\n"
"  ;\n"
"NP0PRD : CD CD NNS\n"
"  ;\n"

Everything looks OK to me during the parse. Am I doing something wrong? Thanks!

nscaife commented 8 years ago

I figured out that I need to add the translations to get the nodes to output correctly.
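
For anyone hitting the same thing, here is a rough sketch of what adding translations could look like. It assumes YAEP's `# name (positions)` notation with right-hand-side positions counted from 0, as in the examples in YAEP's documentation. The grammar is trimmed to the rules the input leifh exercises, the lowercase node names are my own choice, and `count_translation_markers` is a hypothetical sanity-check helper, not a YAEP function.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed version of the grammar above with a translation on every rule.
   Node names (s, np_sbj, vp, pp_clr, np) are illustrative assumptions.
   A string like this would be handed to yaep_parse_grammar(). */
const char *const description_with_translations =
    "\n"
    "TERM DT = 97 COMMA = 98 RB = 99 CD = 100 VBP = 101 NNS = 102 "
    "NNP = 103 PUNCTUATION = 104 IN = 105 VBD = 106 JJ = 107 NN = 108;\n"
    "S : NP0SBJ VP PUNCTUATION # s (0 1 2)\n"
    "  ;\n"
    "NP0SBJ : NN # np_sbj (0)\n"
    "  ;\n"
    "VP : VBP PP0CLR # vp (0 1)\n"
    "  ;\n"
    "PP0CLR : IN NP # pp_clr (0 1)\n"
    "  ;\n"
    "NP : NNS # np (0)\n"
    "  ;\n";

/* Hypothetical helper: count '#' translation markers, a crude check that
   each of the rules carries a translation. */
size_t count_translation_markers(const char *description)
{
    size_t n = 0;
    assert(description != NULL);
    for (; *description != '\0'; description++)
        if (*description == '#')
            n++;
    return n;
}
```

The trimmed grammar has five rules and five markers. In the full grammar, every alternative on the path from S down to the terminals would presumably need its own translation, since a rule without one contributes nothing to the tree.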

vnmakarov commented 8 years ago

Sorry, I should have answered earlier. I thought it was a real bug that I had no time to fix right now.

YAEP translation is described by a so-called simple directed translation. It is enough to represent a parse tree but has some constraints: only one abstract node can represent a rule in the parse tree. For example, you can write

expr : '-' expr # neg (1)

but you cannot write

expr : '-' expr # minus (zero 1)

Supporting sub-trees in the translation has been on my very old TODO list for YAEP, but I don't think it will actually be implemented.
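
One way to live with this constraint, offered as an assumption rather than anything YAEP provides, is to emit the flat `neg` node from the grammar and rewrite it into the nested shape after parsing. The sketch below uses a hypothetical, simplified `struct node`, not YAEP's own `struct yaep_tree_node`, and `make_node` and `expand_neg` are illustrative names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified tree node for illustration only; YAEP's real
   tree node type is not used here. */
struct node {
    const char *name;      /* abstract node name, e.g. "neg" */
    struct node *kids[2];  /* up to two children; unused slots are NULL */
};

/* Allocate a node with the given name and children. */
struct node *make_node(const char *name, struct node *a, struct node *b)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->name = name;
    n->kids[0] = a;
    n->kids[1] = b;
    return n;
}

/* Rewrite every neg(x) into minus(zero, x), bottom-up and in place. */
struct node *expand_neg(struct node *n)
{
    int i;
    if (n == NULL)
        return NULL;
    for (i = 0; i < 2; i++)
        n->kids[i] = expand_neg(n->kids[i]);
    if (strcmp(n->name, "neg") == 0) {
        n->name = "minus";
        n->kids[1] = n->kids[0];                    /* operand moves right */
        n->kids[0] = make_node("zero", NULL, NULL); /* synthetic left child */
    }
    return n;
}
```

With a real YAEP tree the same idea would apply to abstract nodes and their child arrays, but the field names and allocation rules would differ from this simplified model.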