GLR returning iterator of possible parses

SupraSummus commented 5 years ago

(First of all I don't even know if this is theoretically possible.)

Idea is to have GLRParser.parse(...) return iterator, and each value in that iterator is computed on demand. Parses are sorted for example by order of alternatives chosen during the parse.

Advantages:

this solution is superset of current one. User can always do list(parses) to get evaluated list of all parses.
Possibility to get relatively quick parse on very ambiguous grammar and then if a parse is not desired take next one.
Possibility to apply a crude disambiguation technique by placing production alternatives in right order, and then selecting first parse.

igordejanovic commented 5 years ago

I'm quite sure that it's not theoretically possible :)

GLR constructs trees in parallel, not sequential order like some top-down approaches might be doing. All possible sub-trees are constructed as the input is consumed. Furthermore, all trees are packed and common sub-tree sharing is used as it is natural to do so even with an ambiguous grammars memory consumption increases moderately (just to account for the difference). LR parsing (GLR is just an extension) uses CFG where choice operator is unordered by design so it doesn't matter in what order you place it. CFGs are declarative in nature. Calculated parsing tables will not depend on the order of productions, so the parser wouldn't see any difference.

Iterators/generators would be possible for top-down approaches with backtracking for example (like PEG based parsers which naturally use ordered choice thus are more imperative in nature). E.g. Arpeggio and textX are using this parsing approach so it should be possible to do. For example, you get a first tree and to get the next you continue parsing as if the last match is not successful. The parser do backtrack and continue with the next. As an extra bonus the trees would be returned in a deterministic order. PEGs are always unambiguous so are not meant to be used for the definition of ambiguous languages, but with this hack I think it could be done. The question is would it be usable?

SupraSummus commented 5 years ago

GLR must construct trees in parallel when the input is a non-seekable stream. Parglare isn't based on this assumption - it is not a stream parser and it has random access to the input. This allows to process live heads in any order. For example you can take one head and feed input to it until it forks or succeeds or dies. In case of fork take leftmost head and continue with it. "Sleeping heads" must be stored of course as save-points for "backtracking" (or whatever this is called in case of this GLR variant) along with context info - the position in input sequence.

As for parse ordering: the only requirement to make parses ordered is to order actions in each state. And this is, I believe, possible.

Use case for "ordered, lazy GLR" is parsing programming languages, which has their ambiguity resolved by specifying that first parse is preferred and parses are ordered by order of chosen alternatives. Dhall is an example.

igordejanovic commented 5 years ago

That wouldn't be GLR parser anymore. It would be a backtracking parser. The power of GLR comes from its "online" style of work where it takes a token and do as much work as possible. GLR heads don't only fork, they merge if found in the same state which makes GLR efficient. Worst case is O(n^3) while for nearly deterministic languages runs in O(n). Backtracking parsers could go exponential for certain grammars.

PEG based parsers would do what you need: backtracking and disambiguation by choice ordering. This very difference between PEG parsing and GLR is one of the reason why I started this project. By using alternative ordering for disambiguation you loose the declarative nature of CFG. You don't specify the language anymore, you specify the parser, i.e. tell parser how to parse. You can read more about motivation in my blog post.

Furthermore, GLR is based on DFA and shift-reduce, so the relation between production order and the shift/reduce state actions is something that needs investigation.

I still think that this is something that would be artificial for GLR. It is more natural to be done on recursive descent parsers. But, if you have time to research/hack in this direction I don't mind :) I would just like, if planed to be integrated in this fork, to be made backward compatible, and doen't degrade performance, so it wont disturb existing users.

SupraSummus commented 5 years ago

PEG seems like a way to go for my use case, indeed.

igordejanovic commented 3 years ago

In version 0.14.0 GLR parser returns the forest object which is iterable producing lazy trees. It doesn't work as requested here as it wouldn't be possible but from the user point of view it doesn't differ much. The parser will parse the complete input and then build trees lazily during iteration. See the docs.

igordejanovic / parglare

GLR returning iterator of possible parses #88