patrickhuber / Pliant

MIT License
26 stars 4 forks source link

Precedence is not maintained in the parse forest node order #61

Open patrickhuber opened 7 years ago

patrickhuber commented 7 years ago

A simple grammar like

E -> E + E E -> E * E

The * should have a higher precedence (appear farther to the left) in the parse forest than +

ArsenShnurkov commented 7 years ago

please read also https://cs.stackexchange.com/questions/27937/how-do-i-reconstruct-the-forest-of-syntax-trees-from-the-earley-vector/27952#27952

"the forest of syntax trees can be a very strange one, with kind of siamese twin subtrees that may share the first two edges of some node, but not the third edge. In other words, it may be a very awkward structure. This may explain why Earley calls it "a factored representation of all possible parse trees", without being more specific. Any attempt to surgically separate the siamese twins, without changing the grammar will result in increased complexity. The right way to do it is to binarize the grammar."

it says that conversion to 2NF, or CNF is required to avoid building unpredictable forest structure.

patrickhuber commented 6 years ago

You can also binarize the forest which is what the SPPF implementation based off of Elizabeth Scott's paper does. I think the important part here is going to be traversing the parse forest in the correct order. The "And Node" represents a point of ambiguity. When there is more than one And Node, a parse has ambiguity. I need to do more analysis on what this looks like as far as precedence is concerned. Above I noted 'farther left' but this may not actually be how it is structured in the SPPF.

I have seen other implementations use weights to show precedence, which may be worth consideration.