GLR Support - with benefits to modularity and exposition

kjosib / booze-tools

Booze Tools will become the complete programming-language development workbench, all written in Python 3.9 (for now).

MIT License

14 stars 1 forks source link

GLR Support - with benefits to modularity and exposition #6

Closed kjosib closed 4 years ago

kjosib commented 5 years ago

Right now the LALR construction routine does everything it needs to, as a method on the ContextFreeGrammar object. However, this is conceptually not quite right. The grammar itself should not be responsible for any particular table-construction method. Rather, it should answer queries about the CFG and leave table construction to some other module.

A GLR module can reasonably perform an LR(0+) construction, producing a non-deterministic handle-finding automaton which can then be refined into a deterministic table by (for instance) the LALR algorithm, or used as a good basis for an IE-LALR algorithm to attain full LR(1) recognition: all of this would be done symbolically and leave the construction of a matrix (Dragon-Book table) for later.

A suitable representation of a non-deterministic ACTION table allows the use of GLR for parsing arbitrary and even ambiguous grammars. It would also open the frontier on a planned extension for semantic predicates.

kjosib commented 5 years ago

Update: There is now code to generate both GLR(0) and GLALR structures, and to use either as a basis for determining whether a string is in the language described by a context-free grammar. Nothing is yet in place for recovering the semantic value associated with such a string, though.

kjosib commented 5 years ago

Update: The deterministic LALR is now a thin layer atop GLALR doing most of the work. So, the desired improvements to modularity are achieved. Full accomplishment of this goal, however, requires some additional work: A "mostly-deterministic" table structure must be designed, and the corresponding mostly-deterministic parse engine must be developed.

Nondeterministic ACTION table cells can be those for which precedence declarations fail to disambiguate. They can be encoded with numbers more negative than the size of the rule-base. Upon this condition, two approaches stand out: backtracking or lock-step parallel simulation. Either does the job, but the latter seems the more spirited.

kjosib commented 5 years ago

Improvements to the compiler's run-control architecture mean this is closer to realization. A proper graph-structured-stack module would be the next step in this process.

kjosib commented 5 years ago

Along with the commit that closed #17 came the beginning of a generalized-LR "non-deterministic" shift-reduce algorithm. It still needs work, integration, and tests. But it's progress.

kjosib commented 5 years ago

A "brute-force and ignorance" version of generalized parsing is in place and passing tests. It's a good base for further development, but does not attempt to combine states when they ought to be combined.

kjosib commented 4 years ago

A version of the "Graph-Structured Stack" approach is written in and commented up. It's only a trial-parser (it recognizes but does not interpret), but it passes tests, and profiling efforts show it's asymptotically faster than the corresponding brute-force trial-parser, despite being slightly slower for the proverbial "small N". Unfortunately at the moment I don't have a customer for that line of development, so it's probably going to languish until I do.