kjosib / booze-tools

Booze Tools will become the complete programming-language development workbench, all written in Python 3.9 (for now).
MIT License
14 stars 1 forks source link

Factor cfg.validate() into individual criteria (and add well-foundedness) #14

Closed kjosib closed 5 years ago

kjosib commented 5 years ago

Clear exposition of theory demands separate small self-contained algorithms to find problems.

A terminal is well-founded. A rule producing only well-founded symbols is well-founded. A symbol which produces a well-founded rule is well-founded. Induction applies. Ill-founded symbols are a grammar bug. The present notion of a non-productive symbol is demonstrably broken.

If all symbols are well-founded, then reachability is a simple transitive-closure from the START symbol(s).

A symbol that produces itself (perhaps indirectly) introduces an infinite-ambiguity into the grammar. This is almost universally considered pathological. It should be noticed and dunned. Self-productions are trivial to catch. For mutual recursion, notice nontrivial strongly-connected components in the graph of unit-like productions.

Epsilon left-recursion is another possible disease of grammars: a point is created at which any number of "epsilon" productions may be tentatively recognized. Consider a graph from symbols to those they may produce before producing a terminal: the union of the null-able prefixes of all their rules. If there are any self-links or nontrivial SCCs, this pathology is diagnosed.