Open patrickhuber opened 8 years ago
please read also https://cs.stackexchange.com/questions/27937/how-do-i-reconstruct-the-forest-of-syntax-trees-from-the-earley-vector/27952#27952
"the forest of syntax trees can be a very strange one, with kind of siamese twin subtrees that may share the first two edges of some node, but not the third edge. In other words, it may be a very awkward structure. This may explain why Earley calls it "a factored representation of all possible parse trees", without being more specific. Any attempt to surgically separate the siamese twins, without changing the grammar will result in increased complexity. The right way to do it is to binarize the grammar."
it says that conversion to 2NF, or CNF is required to avoid building unpredictable forest structure.
You can also binarize the forest which is what the SPPF implementation based off of Elizabeth Scott's paper does. I think the important part here is going to be traversing the parse forest in the correct order. The "And Node" represents a point of ambiguity. When there is more than one And Node, a parse has ambiguity. I need to do more analysis on what this looks like as far as precedence is concerned. Above I noted 'farther left' but this may not actually be how it is structured in the SPPF.
I have seen other implementations use weights to show precedence, which may be worth consideration.
A simple grammar like
E -> E + E E -> E * E
The * should have a higher precedence (appear farther to the left) in the parse forest than +