lagodiuk / earley-parser-js

Tiny JavaScript implementation of context-free languages parser - Earley parser (including generation of the parsing-forest).
Apache License 2.0
118 stars 11 forks source link

EPSILON case for right-hand-side #4

Closed gitmh closed 8 years ago

gitmh commented 8 years ago

The parser can deal sometimes with empty rhs (EPSILON case) but sometimes it does not work properly. In the code and in all examples there is no EPSILON case mentioned so it is possible to use it?

For example with input "abba" (does work - results in 5 different trees): X -> a Y | b Y Y -> ​ɛ​ | X | X Y

without the second X (does not work at all): X -> a Y | b Y Y -> ​ɛ​ | X Y

You can also try it here: http://flaci.com/kfgedit just click on new grammar on the lower right corner.

lagodiuk commented 8 years ago

@gitmh, thank you for this finding.

Due to the specificity of original Earley parsing algorithm - handling of the epsilon cases is bit tricky (epsilon productions lead to the larger amount of intermediate states inside parsing chart).

I have written the draft version of modification of Earley Algorithm, which could handle the epsilon productions (works for both of grammars, which you've provided). At the moment implementation is developed in scope of a separate branch: https://github.com/lagodiuk/earley-parser-js/tree/epsilon_productions_handling (commit: e0a10b7). Also, excuse me for some spelling mistakes inside comments (I was a bit in rush).

Below is a link to online demo: https://jsfiddle.net/pfrvm89t/

However, I think that this modification has to be carefully tested.

gitmh commented 8 years ago

@lagodiuk thank you very much for this fast change, it works like a charm. I just added your changes to our project and the trees get generated properly. I'll report back, if we find any issues with other grammars now. My first tests with about 10 grammars worked all well.

We are also looking forward to your lazy traverse function with callback as already mentioned in your comments :-) We used an additional LALR(1) parser to return at least one valid tree if the traverse function exceeds the JS stack limit. It is only a problem with tree generation, the Earley parse works perfectly.

lagodiuk commented 8 years ago

@gitmh, thank you for your feedback!

Also, thank you for the information about your tests - now I feel more confident regarding the possibility to merge _epsilon_productionshandling branch into master.

In the nearest time - I will try to focus on implementation of the callback iteration mechanism over the parse forest.

As a quick workaround, for avoiding the exponential growth of consumed memory - it is possible to modify the method for generation of all possible combinations (between items of different arrays), by adding some threshold for the maximum amount of produced combinations (https://github.com/lagodiuk/earley-parser-js/blob/epsilon_productions_handling/earley-oop.js#L302-314). However, this workaround will lead to restriction of amount of generated parsing trees.

lagodiuk commented 8 years ago

Merged solution to master: https://github.com/lagodiuk/earley-parser-js/pull/5