orcmid / miser

The Miser Project is practical demonstration of computation-theoretical aspects of software
https://orcmid.github.io/miser/
Apache License 2.0
3 stars 1 forks source link

Seeking Lexical and Precedence Parsing #25

Closed orcmid closed 3 years ago

orcmid commented 4 years ago

Although it is easy to fashion an top-down (recursive-descent) parser, and that is relatively common these days, the proposal for oFrugal, at least, is with a finite-state lexical parser underneath a precedence language parsing. This takes advantage of the straight-forward semantics that is expressed by oMiser applicative operations. The parser proceeds rapidly through an input text, completing the parsing and evaluations in a linear progression. The idea is to expedite oFrugal input with high-speed parsing not counting the costs of explicit apply operations, which ideally are delayed until the correctness of the input is established.

APPROACH

  1. Rapid lexical parsing scans included data streams as well as console inputs. There are no parser back-up requirements at this level. The lexical rules should be clean and accommodate full Unicode input and output. (The XML-formed load scheme for saved objects is different, since the XML parsing is entirely different and semantically easier.)

  2. Lexical tokens should be simple and eventually contain location information for tying error messages back to locations in the input stream. While most errors might be fatal at this level, there may become a time where there is more delay at the precedence-language operation to provide more-informative failure information in one shot. The lexer also recognizes ligatures (e.g., "::", ":]", and condenses and sometimes removes white-space (e.g., before and after brackets, separators, and operators including ( ) comments and the line continuing "\ ... \". There is a problem about infix "." versus prefix "." and that should be resolved in the lexer if possible. (The feature/method suffixing case is related to the Capsule Hack, #23.)

  3. Rapid precedence-language parsing. This effectively turns a stream of infix- and bracketed material into a reverse-polish-stream form of applicative expression. There are checks that reject certain cases of non-grammatical occurrences, mainly where operands are missing as in "()", mismatches like "[ ... )" and also "," where there is a missing component on one side or the other. We can continue parsing by inserting .NIL in places, but the parsing will ultimately be rejected.

  4. The precedence-language parsing will not proceed at any point where the operand and operation stacks cannot remain properly synchronized. Bracket mismatches tend to be deal-breakers as do unresolvable binding names. In failure cases, it is useful to know where problematic elements appear in the input stream so that error messages can localize relatively well. This is a handy feature of precedence-language parsers along with their considerable speed of operation. Error message output should be such that operating in an IDE can tie the messages to the identified places in the input stream.

  5. Initially, the semantical apply operations resolved in (3) will occur as parsing happens. This can lead to semantic failures (e.g., non-terminating/exploding computations) in the midst of parsing. Production-level parser will defer evaluation of the reverse-polish stream until the parser completes the entire input and there are no grammatical errors. In this context, an unresolvable binding-name is still sudden death. The intermediate form can also be useful in analysis and transformations, going beyond the oFrugal level.

  6. There was a concern for reserved words at the level of oFrugal commands (#24 item 3). This is finessed using "!" prefixes.

The case of (5) is a flavor of mindelayscandoer, a term introduced b Alan Perlis. That is, once the operands of an operation are all in hand, the operation is performed. In a one-pass compiler, instead of performing the operation, code can be emitted. And, either way, it is easy to delay the operations by constructing a reverse-polish stream of the operations that are needed, either emitting code or (as in oFrugal) carrying the operations out once it is determined that there are no errors in the input.

Implementation Note: The production of a reverse-Polish expression of a canonical Ob can be similar, going from recursive descent to emitted reverse-Polish for faster subsequent reloading. This can also be used to retain sharing of elements by providing backward references to elements that occur earlier in the stream. It is not clear how complex this can become and how complexity explosion can be avoided.