Arguments against the current recursive descent approach:
- It doesn't scale well. For each operator we add, at each precedence level, new parsing functions have to be added, and they need to be called from the top-level expression parser one way or another. As we discovered with `inversion()`, this is not a fool-proof process. With my approach, new operators just need a precedence and associativity specified; no parser code is affected (see the sketch after this list).
- It doesn't group together different kinds of operators. Expressions can naturally be subdivided into binary, unary (prefix/postfix), and unit/atomic parts, but the current approach allows these operators to be parsed arbitrarily and in any order. This also means boilerplate: every unary/binary operator is parsed in the same way, but each one requires its own code.
- It's difficult to implement operator chain verification. With my approach, we can check for and prevent ambiguous expressions like `a + b | c` or `a < b == c`. Supporting this in recursive descent would require duplicate versions of the parser functions for different contexts (e.g. `|` would be parsed differently in the right-hand side of `+` and vice versa), which would increase boilerplate significantly. The sketch after this list includes such a check.
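For concreteness, here is a minimal sketch of what a table-driven scheme along these lines could look like, written as precedence climbing; it is not the actual implementation from this issue, and the operator set, `op_info`, `chain_allowed`, `Expr`, and `Parser` names are hypothetical. Adding an operator touches only the table, the single loop handles every binary operator, and the chain check rejects expressions like `a + b | c` and `a < b == c`.

```rust
// A minimal sketch (not the actual compiler code) of table-driven precedence
// climbing with chain verification. The token machinery, AST, operator set,
// and chain rules below are hypothetical placeholders.

#[derive(Clone, Copy, PartialEq, Eq)]
enum Op { Pipe, Lt, EqEq, Plus, Star }

#[derive(Clone, Copy, PartialEq, Eq)]
enum Assoc { Left, Right }

#[derive(Clone, Copy, PartialEq, Eq)]
enum Family { Or, Cmp, Arith }

// Adding an operator means adding one arm here: precedence (higher binds
// tighter), associativity, and the family used by the chain check below.
fn op_info(op: Op) -> (u8, Assoc, Family) {
    match op {
        Op::Pipe => (1, Assoc::Left, Family::Or),
        Op::Lt   => (2, Assoc::Left, Family::Cmp),
        Op::EqEq => (2, Assoc::Left, Family::Cmp),
        Op::Plus => (3, Assoc::Left, Family::Arith),
        Op::Star => (4, Assoc::Left, Family::Arith),
    }
}

// Which operators may sit next to each other without parentheses. These
// rules are illustrative: comparisons don't chain (`a < b == c`), and `|`
// doesn't mix with arithmetic (`a + b | c`).
fn chain_allowed(prev: Option<Op>, next: Op) -> bool {
    let Some(prev) = prev else { return true };
    match (op_info(prev).2, op_info(next).2) {
        (Family::Cmp, Family::Cmp) => false,
        (Family::Or, Family::Arith) | (Family::Arith, Family::Or) => false,
        _ => true,
    }
}

enum Expr {
    Atom(String),
    Binary(Op, Box<Expr>, Box<Expr>),
}

struct Parser { /* token stream, cursor, ... */ }

impl Parser {
    /// One loop handles every binary operator. `ctx` is the operator
    /// enclosing this subexpression (if any), so the chain check also
    /// catches e.g. `a | b + c`.
    fn parse_expr(&mut self, min_prec: u8, ctx: Option<Op>) -> Result<Expr, String> {
        let mut lhs = self.parse_atom()?;
        let mut last = ctx;

        while let Some(op) = self.peek_op() {
            let (prec, assoc, _) = op_info(op);
            if prec < min_prec {
                break;
            }
            if !chain_allowed(last, op) {
                return Err("ambiguous operator chain; add parentheses".into());
            }
            self.bump(); // consume the operator token
            // For left-associative operators the right-hand side may only
            // use strictly tighter operators.
            let next_min = if assoc == Assoc::Left { prec + 1 } else { prec };
            let rhs = self.parse_expr(next_min, Some(op))?;
            lhs = Expr::Binary(op, Box::new(lhs), Box::new(rhs));
            last = Some(op);
        }
        Ok(lhs)
    }

    // Placeholders for the real token-level machinery.
    fn parse_atom(&mut self) -> Result<Expr, String> { unimplemented!() }
    fn peek_op(&self) -> Option<Op> { unimplemented!() }
    fn bump(&mut self) { unimplemented!() }
}
```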
While my approach looks more complex than recursive descent, it is doing a lot more work, it allows us to add new operators easily, and we can experiment with interesting parsing semantics. It is a lot easier to reason about the precedences of operators (just check `Prec`), and it will better support us in adding to the compiler over time.
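On the `Prec` point: the actual definition isn't shown in this issue, but the idea is that all precedence information lives in one ordered declaration, so the bare `u8` levels in the sketch above would become something like the following (hypothetical) type:

```rust
// Hypothetical shape of a centralized `Prec` type: declaration order is
// precedence order, so the derived `Ord` gives the comparison the parser
// uses, and reading this one item shows how every operator ranks.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Prec {
    Or,             // `|`
    Comparison,     // `<`, `==`, ...
    Additive,       // `+`, `-`
    Multiplicative, // `*`, `/`
}
```

Adding an operator would then mean picking (or inserting) a level here and adding one table entry, with no changes to the parsing loop itself.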
There are three possibly relevant options for precedence parsing (afaik):
Not very high priority, but we could evaluate these a bit and settle on one of them.