kjosib / booze-tools

Booze Tools will become the complete programming-language development workbench, all written in Python 3.9 (for now).
MIT License

Towards a "Token File" abstraction #27

Open · kjosib opened this issue 4 years ago

kjosib commented 4 years ago

In days of multi-pass yore, a lexical analysis phase would operate completely independently of the parse and produce a disk-file containing token information (including all diagnostic data for decent error reporting); this file would then feed into a parser to generate an AST file.

Today the intermediate disk-file is neither necessary nor advisable: RAM is plentiful. However, the concept points the way to a good style of writing scanner actions: they should enqueue the tokens they find, and the supporting infrastructure should handle everything about location tracking. This facilitates indentation-sensitive grammars, where a line with less leading whitespace than the one before may produce several zero-width outdent tokens at once. Instead of a true queue, though, the framework might reasonably retain all the token data in an array for later reference by sequence number. That and a dollar will buy you a cup of error reporting.
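A minimal sketch of that style in Python, assuming a hand-rolled scanner for a toy indentation-sensitive language (this is not the booze-tools scanner generator). `Token`, `TokenFile`, `emit`, and `scan` are hypothetical names: the scanning code reports only character offsets, `TokenFile` turns them into line/column positions in one place, and every token keeps its sequence number so later phases can refer back to it. Note the two zero-width `DEDENT` tokens emitted when a single line closes two blocks.

```python
import bisect
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    seq: int    # sequence number = index in TokenFile.tokens
    kind: str
    text: str
    line: int   # 1-based
    col: int    # 1-based


class TokenFile:
    """Append-only token array (hypothetical). Scanner code reports only
    character offsets; line/column bookkeeping lives here, in one place."""

    def __init__(self, source):
        self.source = source
        self.tokens = []
        # Offset at which each line begins, for offset -> (line, col) lookup.
        self._line_starts = [0] + [i + 1 for i, ch in enumerate(source) if ch == "\n"]

    def position(self, offset):
        line = bisect.bisect_right(self._line_starts, offset)
        return line, offset - self._line_starts[line - 1] + 1

    def emit(self, kind, start, end):
        line, col = self.position(start)
        tok = Token(len(self.tokens), kind, self.source[start:end], line, col)
        self.tokens.append(tok)
        return tok

    def __getitem__(self, seq):
        return self.tokens[seq]


TOKEN_RE = re.compile(r"(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[:=])|(?P<SKIP>[ \t]+)")


def scan(source):
    """Toy scanner (hypothetical): emits tokens by offset only."""
    tf = TokenFile(source)
    indents = [0]  # stack of open indentation widths
    offset = 0     # offset of the current line's first character
    for raw in source.splitlines(keepends=True):
        body = raw.rstrip("\r\n")
        width = len(body) - len(body.lstrip(" "))
        if body.strip():  # blank lines do not change indentation
            here = offset + width
            if width > indents[-1]:
                indents.append(width)
                tf.emit("INDENT", offset, here)
            while width < indents[-1]:  # one outdent may close several blocks
                indents.pop()
                tf.emit("DEDENT", here, here)  # zero-width token
            if width != indents[-1]:
                raise IndentationError("line %d: inconsistent dedent" % tf.position(here)[0])
            pos = width
            while pos < len(body):
                m = TOKEN_RE.match(body, pos)
                if m is None:
                    raise SyntaxError("line %d, col %d: unexpected character" % tf.position(offset + pos))
                if m.lastgroup != "SKIP":
                    tf.emit(m.lastgroup, offset + m.start(), offset + m.end())
                pos = m.end()
            tf.emit("NEWLINE", offset + len(body), offset + len(raw))
        offset += len(raw)
    for _ in indents[1:]:  # close any blocks still open at end of input
        tf.emit("DEDENT", offset, offset)
    tf.emit("EOF", offset, offset)
    return tf
```

Because the array is never drained, an error reporter holding only a sequence number can recover the kind, text, and position of any token: after `tf = scan("a:\n  b:\n    c\nd\n")`, `tf[11]` is the second of two `DEDENT` tokens, both at line 4, column 1.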

kjosib commented 4 years ago

It occurred to me that this approach also facilitates a nice GLR implementation. When you go into non-deterministic mode, you place a mark and continue to consume tokens normally while performing a "trial parse", collecting the stream of "correct" decisions at each parse-table inadequacy. Once the parse "settles", it can easily rewind the token stream to the mark and replay the chosen parse "for real", with reduction actions. This, plus a bit of care, gets you a fairly quick implementation for so-called "moderate" amounts of non-determinism.
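A minimal Python sketch of the mark/rewind/replay plumbing, under loud assumptions. The grammar is a hypothetical toy in which a comma-separated list of names is either assignment targets (`a, b = c;`) or an expression (`d;`), and which one it is only settles after the whole list has been read. The trial pass below is a hand-written scan standing in for the non-deterministic (GLR-style) parse; it is not an implementation of one, and for simplicity it marks at the start of input rather than only at an inadequacy. `TokenStream`, `trial_parse`, `replay`, and `parse` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str
    text: str


EOF = Token("EOF", "")


class TokenStream:
    """Cursor over the retained token array: a mark is just an index."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else EOF

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, kind):
        tok = self.advance()
        if tok.kind != kind:
            raise SyntaxError(f"token #{self.pos - 1}: expected {kind}, got {tok.kind}")
        return tok

    def mark(self):
        return self.pos

    def rewind(self, mark):
        self.pos = mark


def name_list(stream, action=None):
    """NAME (',' NAME)*. In trial mode (action=None) nothing is built."""
    built = []
    while True:
        name = stream.expect("NAME").text
        if action is not None:
            built.append(action(name))
        if stream.peek().kind != ",":
            return built
        stream.advance()


def trial_parse(stream):
    """Consume tokens without running actions, recording one decision per
    statement: are the leading names assignment targets or plain loads?"""
    decisions = []
    while stream.peek().kind != "EOF":
        name_list(stream)
        if stream.peek().kind == "=":
            stream.advance()
            stream.expect("NAME")
            decisions.append("assign")
        else:
            decisions.append("expr")
        stream.expect(";")
    return decisions


def replay(stream, decisions):
    """Deterministic pass: each decision is known before the names are
    reduced, so the right action runs on the first try."""
    ast = []
    for decision in decisions:
        if decision == "assign":
            targets = name_list(stream, lambda n: ("store", n))
            stream.expect("=")
            ast.append(("assign", targets, ("load", stream.expect("NAME").text)))
        else:
            ast.append(("expr", name_list(stream, lambda n: ("load", n))))
        stream.expect(";")
    return ast


def parse(tokens):
    stream = TokenStream(tokens)
    start = stream.mark()             # enter trial mode: remember where we were
    decisions = trial_parse(stream)   # run ahead until the choices settle
    stream.rewind(start)              # rewind the token stream to the mark
    return replay(stream, decisions)  # replay "for real" with actions


toks = [Token(k, t) for k, t in [("NAME", "a"), (",", ","), ("NAME", "b"), ("=", "="),
                                 ("NAME", "c"), (";", ";"), ("NAME", "d"), (";", ";")]]
print(parse(toks))
# [('assign', [('store', 'a'), ('store', 'b')], ('load', 'c')), ('expr', [('load', 'd')])]
```

The replay pass never backtracks: because each statement's decision is already recorded, every name is built as a `store` or a `load` the first time it is reduced, a choice that no fixed amount of lookahead could make at the first name, since the list can be arbitrarily long.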