Closed ptal closed 8 years ago
My library analyzes well-formedness of context-free grammars: https://github.com/pczarn/cfg
Hey! Thanks for sharing, this is interesting. I went through the doc and code and search for extra-informations but I am not sure which analysis is applicable to PEG. I guess cycle elimination is interesting for allowing left-recursion? This article also presents a method for eliminating left-recursion in the context of PEG.
I think rule usefulness applies to all grammar-based parsers. For example, unproductive
and unreachable
are useless:
start -> unproductive | productive_and_reachable
unproductive -> "abc" unproductive
productive_and_reachable -> "abc" productive_and_reachable | "def"
unreachable -> "xyz"
Oak will always give unexpected '<end-of-file>'
when an unproductive symbol is expected.To test whether it removes unreachable rules, I'd have to check code generation.
Yes, cycles among unit derivations are a form of indirect left(?) recursion. For example:
A -> B
B -> A
or
A -> B
B -> C
C -> A
etc. During cycle rewrite, all occurences of B
, C
, ... in the grammar are substituted by A
. It seems you already throw an error when an "inlining" cycle is detected.
Thank you for the details. Indeed, unproductive
is not directly refuted in Oak, which is not really good. However, it generates an error error: Inlining cycle detected.
for the rule unproductive = "abc" unproductive
. It should at least give a hint to the grammar writer, so he can think twice about what this rule is really for. You can add the -> ()
annotation to break the cycle. I wonder what is the exact relation between types and unproductive rules.
The well-formed grammar analysis described by Ford encompasses unproductive rules analysis (as well as other errors such as infinite loop ((!e)*). I need to work on that, or if you are interested I would be glad to have a contrib' ;-)
For unreachable, I removed this analysis some while ago because I took the approach to consider grammar as a library of rules so all rules can be called. Therefore, there is no "start" rule. I think it is important for better composability and I do not see any problem with it right now. Do you know if a starting rule allows analysis that could not be performed if we consider a collection of rules?
A collection of "start" rules is just as good as a single start rule/nonterminal. I changed my library so that only the reachability analysis takes a collection of callable starting nonterminals. There used be a start nonterminal tied to every grammar.
I'm not sure to understand your argument. Why do you think every grammar should have a start nonterminal?
I didn't mean that. A single start nonterminal is a special case of having multiple public symbols. For example, your public symbols can be {start} or {A, B, C}.
I added fn all_productive
so that you can ignore reachability: http://pczarn.github.io/cfg/doc/cfg/usefulness/struct.Usefulness.html
There are no accessibility modifiers, every rules are public so o this analysis is not applicable. However, it is still interesting if I decide at some point to add accessibility modifiers. Anyway, I do not yet use your library for doing grammar analysis in Oak but I'll consider it :-)
It summarizes what we called the "non-consuming loop analysis" and the left-recursion analysis. Consult the article from Ford, section "Well-formed Grammars". It closes #22 #23 #24.