mingodad closed this 2 years ago
Same example on the chpeg playground (https://meimporta.eu/chpeg/):
Style 1 explicit white space on token usage:
expr <- _ name _ (opcmp _ name _)* !.
_ {I} <- [ \t\n\r]*
opcmp <- ('==' / '~=' / '<=' / '>=' / '<' / '>' )
name <- [a-z]+
AST:
OFFSET  LEN  ID  NC  FLG  IDENT  "DATA"
0       6    0   3   R--  expr   "e == c"
0       1    3   0   ---  name   "e"
2       2    2   0   ---  opcmp  "=="
5       1    3   0   ---  name   "c"
Style 2 white space embedded on tokens:
expr <- _ name (opcmp name)* !.
_ {I} <- [ \t\n\r]*
opcmp <- ('==' / '~=' / '<=' / '>=' / '<' / '>' ) _
name <- [a-z]+ _
AST:
OFFSET  LEN  ID  NC  FLG  IDENT  "DATA"
0       6    0   3   R--  expr   "e == c"
0       2    3   0   ---  name   "e "
2       3    2   0   ---  opcmp  "== "
5       1    3   0   ---  name   "c"
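The difference between the two ASTs comes down to which rule consumes the white space. As a rough illustration only (this is a hand-written tokenizer in Python, not chpeg or cpp-peglib code, and it does not enforce the `name (opcmp name)*` sequence), the hypothetical helper `spans` below reproduces the OFFSET, LEN and DATA columns for both styles:

```python
import re

WS = r"[ \t\n\r]*"            # same character class as the `_` rule
OPCMP = r"==|~=|<=|>=|<|>"    # ordered like the opcmp alternatives
NAME = r"[a-z]+"

def spans(text, embed_ws):
    """Return (offset, length, ident, data) for each token in text.

    embed_ws=False mimics Style 1: white space is skipped outside the token rules.
    embed_ws=True mimics Style 2: each token rule also consumes trailing white space.
    """
    tail = WS if embed_ws else ""
    rules = [
        ("opcmp", re.compile("(?:" + OPCMP + ")" + tail)),
        ("name", re.compile(NAME + tail)),
    ]
    skip = re.compile(WS)
    pos = skip.match(text).end()  # the leading `_` in expr
    out = []
    while pos < len(text):
        for ident, rx in rules:
            m = rx.match(text, pos)
            if m:
                out.append((m.start(), len(m.group(0)), ident, m.group(0)))
                pos = m.end()
                break
        else:
            raise ValueError("no rule matches at offset %d" % pos)
        if not embed_ws:
            pos = skip.match(text, pos).end()  # explicit `_` after each token
    return out
```

Calling `spans("e == c", embed_ws=False)` gives token texts without white space, while `embed_ws=True` gives texts that carry the trailing white space.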
In the 2nd example, the token operator <...> should be used if you want the same result as the 1st one:
expr <- _ name (opcmp name)* !.
opcmp <- < '==' / '~=' / '<=' / '>=' / '<' / '>' > _
name <- < [a-z]+ > _
~_ <- [ \t\n\r]*
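As a loose analogy (Python `re`, not cpp-peglib's implementation), the token operator behaves like a capture group: the rule still consumes the trailing white space, but only the bracketed part becomes the node's text. The helper name `name_token` is made up for this sketch:

```python
import re

# Analogue of: name <- < [a-z]+ > _
# group(1) plays the role of the < ... > token,
# group(0) is everything the rule consumed, white space included.
NAME = re.compile(r"([a-z]+)[ \t\n\r]*")

def name_token(text, pos=0):
    """Return (token_text, consumed_text, next_pos), or None if the rule fails."""
    m = NAME.match(text, pos)
    if m is None:
        return None
    return m.group(1), m.group(0), m.end()
```

For the input `e == c`, the token text is `e` while the consumed text is `e ` (with the trailing space), so parsing continues at offset 2.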
After being bitten by the AST optimizer several times and trying to understand it, I think I found one possible problem/improvement in it. Given a simple grammar in 2 styles:
Style 1 explicit white space on token usage:
AST:
Style 2 white space embedded on tokens:
AST:
It seems that the optimizer decides whether or not to show the node value based on the number of children. I think that it should instead use the number of non-ignorable children, which matters when white space is embedded in tokens; see the pseudo code below.
Pseudo code:
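A minimal Python sketch of the idea, not the optimizer's actual code. The `Node` class, its `ignorable` flag and both helper functions are hypothetical names invented for illustration, and the rule "show the value when the node has no counted children" is an assumption about how the decision is made:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    text: str = ""
    ignorable: bool = False  # hypothetical flag, e.g. set for nodes from `~_`
    children: list = field(default_factory=list)

def counted_children(node, skip_ignorable):
    """Children that take part in the decision."""
    if skip_ignorable:
        return [c for c in node.children if not c.ignorable]
    return list(node.children)

def shows_value(node, skip_ignorable):
    """Assumed rule: a node shows its text when it has no counted children."""
    return len(counted_children(node, skip_ignorable)) == 0

# A `name` node whose only child is the embedded white space.
ws = Node("_", " ", ignorable=True)
name = Node("name", "e ", children=[ws])
```

Counting every child, `name` is treated as an inner node and its value is hidden. Counting only non-ignorable children, it is treated as a leaf and its value is shown.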