vtereshkov / umka-lang

Umka: a statically typed embeddable scripting language
BSD 2-Clause "Simplified" License
1.08k stars 53 forks source link

Parsing expressions using an operator precedence table instead of recursive descent #335

Closed luauser32167 closed 10 months ago

luauser32167 commented 10 months ago

The current expression parsing code is a bit long (using recursive descent for resolving the operators' precedence) and is taking many function calls to parse a single integer literal:

parseExpr
  parseLogicalExpr
    parseLogicalTerm
      parseRelation
        parseRelationTerm
          parseTerm
            parseFactor
              TOK_INTNUMBER

There's a simpler/more compact/efficient way of parsing expressions called "Pratt parsing". Here are some sources that talk about it/explain it.

Making a calculator from scratch - #SoME3 Discussion with Casey Muratori about how easy precedence is... Pratt's original paper -- hard to understand, in my opinion

You can see it being used in Lua here: Lua's operator precedence table

Would rewriting the parser result in a measurable speedup? For huge tables of integers/constants, maybe, otherwise probably not. It would reduce the number of lines of code, for sure though.

vtereshkov commented 10 months ago

I'm not sure it can have any effect, even in terms of lines of code: parsing is interleaved with code generation, which is not uniform (a && b differs much from a & b), the ternary operator probably needs a special treatment, etc.

luauser32167 commented 10 months ago

@vtereshkov here is a modified umka_expr.c that has a #define USE_RECURSIVE_DESCENT_EXPRESSION_PARSER to switch from the current recursive descent expression parser to a pratt expression parser. It seems to produce the same output (diffing tests\expected.log), but that doesn't mean that it is correct :^).