Precedence Climbing - Githubissues

soasme commented 3 years ago

Is your feature request related to a problem? Please describe. See #137.

Many programming languages have a deep precedence levels, say Java has twentyish. If we use left recursion and right recursion, the performance would be so bad. Instead, many parsers such as GCC,Clang choose to implement precedence climbing.

Describe the solution you'd like

New expression kind: P4_PrecedenceClimb.
Syntax: S = atom < op1 < op2 < op3 < op4;. From a parsing's perspective, it's similar with S = atom ((op1 / op2 / op3 / op4) atom)*, but the construction of AST is based on the precedence level.
New expression flag: P4_RightAssociativity. When not specified, it's left associativity by default.
Syntax: @right_associativity pow = "^";. It allows operators using right associativity. For example, 1^2^3 should be parsed as 1^(2^3), while 1*2*3 should be parsed as (12)3.

Additional context

asmwarrior commented 3 years ago

https://github.com/yhirose/cpp-peglib

Just a reference, this cpp-peglib project has the precedence climbing support, but it is C++.

mingodad commented 2 years ago

Also for performance reasons having a character class that include ranges seems to cut a lot recursive calls, or maybe an optimization pass that would compress choices of characters to one character class.

char = [\x20-\x21] / [\x23-\x5b] / [\x5d-\U0010ffff] / (
    "\\" (
        "\""
        / "/"
        / "\\"
        / "b"
        / "f"
        / "n"
        / "r"
        / "t"
        / "v"
        / ("x" ~ two_hexdigits)
        / ("u" ~ four_hexdigits)
        / ("U" ~ eight_hexdigits)
    )
);

Becomes:

char = [\x20-\x21\x23-\x5b\x5d-\U0010ffff"/\\\b\f\n\r\t\v];

From pegjs:

CharacterClassMatcher
  = "[" "^"? (ClassCharacterRange / ClassCharacter)* "]" "i"? ;

ClassCharacterRange
  = ClassCharacter "-" ClassCharacter ;

mingodad commented 2 years ago

After writing the previous message I decided to make a test to see if PeppaPEG checks for duplication in choices and it seems that it doesn't, both versions of the rule bellow are parsed without any warning/error.

SingleEscapeCharacter
  = "'"
  / "\""
  / "\\"
  / "b"
  / "f"
  / "n"
  / "r"
  / "t"
  / "v" ;

With repetitions:

SingleEscapeCharacter
  = "'"
  / "\""
  / "\\"
  / "b"
  / "f"
  / "n"
  / "n" #duplicated
  / "n" #duplicated
  / "n" #duplicated
  / "r"
  / "t"
  / "v" ;

soasme / PeppaPEG

Precedence Climbing #138