ptal / oak

A typed parser generator embedded in Rust code for Parsing Expression Grammars
Apache License 2.0
142 stars 14 forks source link

Well-formed grammar analysis #66

Closed ptal closed 8 years ago

ptal commented 9 years ago

It summarizes what we called the "non-consuming loop analysis" and the left-recursion analysis. Consult the article from Ford, section "Well-formed Grammars". It closes #22 #23 #24.

pczarn commented 9 years ago

My library analyzes well-formedness of context-free grammars: https://github.com/pczarn/cfg

ptal commented 9 years ago

Hey! Thanks for sharing, this is interesting. I went through the doc and code and search for extra-informations but I am not sure which analysis is applicable to PEG. I guess cycle elimination is interesting for allowing left-recursion? This article also presents a method for eliminating left-recursion in the context of PEG.

pczarn commented 9 years ago

I think rule usefulness applies to all grammar-based parsers. For example, unproductive and unreachable are useless:

start -> unproductive | productive_and_reachable
unproductive -> "abc" unproductive
productive_and_reachable -> "abc" productive_and_reachable | "def"
unreachable -> "xyz"

Oak will always give unexpected '<end-of-file>' when an unproductive symbol is expected.To test whether it removes unreachable rules, I'd have to check code generation.

Yes, cycles among unit derivations are a form of indirect left(?) recursion. For example:

A -> B
B -> A

or

A -> B
B -> C
C -> A

etc. During cycle rewrite, all occurences of B, C, ... in the grammar are substituted by A. It seems you already throw an error when an "inlining" cycle is detected.

ptal commented 8 years ago

Thank you for the details. Indeed, unproductive is not directly refuted in Oak, which is not really good. However, it generates an error error: Inlining cycle detected. for the rule unproductive = "abc" unproductive. It should at least give a hint to the grammar writer, so he can think twice about what this rule is really for. You can add the -> () annotation to break the cycle. I wonder what is the exact relation between types and unproductive rules. The well-formed grammar analysis described by Ford encompasses unproductive rules analysis (as well as other errors such as infinite loop ((!e)*). I need to work on that, or if you are interested I would be glad to have a contrib' ;-)

For unreachable, I removed this analysis some while ago because I took the approach to consider grammar as a library of rules so all rules can be called. Therefore, there is no "start" rule. I think it is important for better composability and I do not see any problem with it right now. Do you know if a starting rule allows analysis that could not be performed if we consider a collection of rules?

pczarn commented 8 years ago

A collection of "start" rules is just as good as a single start rule/nonterminal. I changed my library so that only the reachability analysis takes a collection of callable starting nonterminals. There used be a start nonterminal tied to every grammar.

ptal commented 8 years ago

I'm not sure to understand your argument. Why do you think every grammar should have a start nonterminal?

pczarn commented 8 years ago

I didn't mean that. A single start nonterminal is a special case of having multiple public symbols. For example, your public symbols can be {start} or {A, B, C}.

I added fn all_productive so that you can ignore reachability: http://pczarn.github.io/cfg/doc/cfg/usefulness/struct.Usefulness.html

ptal commented 8 years ago

There are no accessibility modifiers, every rules are public so o this analysis is not applicable. However, it is still interesting if I decide at some point to add accessibility modifiers. Anyway, I do not yet use your library for doing grammar analysis in Oak but I'll consider it :-)