Closed mingodad closed 2 years ago
Thanks for the report. Unfortunately, there is no way to do what you want unless you make your own AST optimizer. It's not too difficult as you see in the default optimizer code.
The default optimizer code can specify no_ast_opt
with All
mode. By doing this, we can get a similar result. Is it enought?
It's better than before in my point of view.
With the Lua
grammar/input shown bellow I'm getting an unexpected AST
(nothing about Return
on it):
Lua grammar:
# Adapted from https://github.com/edubart/lpegrex/blob/main/parsers/lua.lua
chunk <- SHEBANG? SKIP Block (!.)
Block <- ( Label / Return / Break / Goto / Do / While / Repeat / If / 'for' WB (ForNum / ForIn)
/ FuncDef / 'local' WB (FuncDecl / VarDecl) / Assign / call / ';' SKIP)*
Label <- '::' NAME '::' SKIP
Return <- 'return' WB exprlist?
Break <- 'break' WB
Goto <- 'goto' WB NAME
Do <- 'do' WB Block 'end' WB
While <- 'while' WB expr 'do' WB Block 'end' WB
Repeat <- 'repeat' WB Block 'until' WB expr
If <- 'if' WB expr 'then' WB Block ('elseif' WB expr 'then' WB Block)* ('else' WB Block)? 'end' WB
ForNum <- Id '=' SKIP expr ',' SKIP expr (',' SKIP expr)? 'do' WB Block 'end' WB
ForIn <- idlist 'in' WB exprlist 'do' WB Block 'end' WB
FuncDef <- 'function' WB funcname funcbody
FuncDecl <- 'function' WB Id funcbody
VarDecl <- iddecllist ('=' SKIP exprlist)?
Assign <- varlist '=' SKIP exprlist
Number <- NUMBER SKIP
String <- STRING SKIP
Boolean <- 'false' WB / 'true' WB
Nil <- 'nil' WB
Varargs <- '...' SKIP
Id <- NAME
IdDecl <- NAME ('<' SKIP NAME '>' SKIP)?
Function <- 'function' WB funcbody
Table <- '{' SKIP (field (fieldsep field)* fieldsep?)? '}' SKIP
Paren <- '(' SKIP expr ')' SKIP
Pair <- '[' SKIP expr ']' SKIP '=' SKIP expr / NAME '=' SKIP expr
Call <- callargs
CallMethod <- ':' SKIP NAME callargs
DotIndex <- '.' SKIP NAME
ColonIndex <- ':' SKIP NAME
KeyIndex <- '[' SKIP expr ']' SKIP
indexsuffix <- DotIndex / KeyIndex
callsuffix <- Call / CallMethod
var <- (exprprimary (callsuffix+ indexsuffix / indexsuffix)+) / Id
call <- (exprprimary (indexsuffix+ callsuffix / callsuffix)+)
exprsuffixed <- (exprprimary (indexsuffix / callsuffix)*)
funcname <- (Id DotIndex* ColonIndex?)
funcbody <- '(' SKIP funcargs ')' SKIP Block 'end' WB
field <- Pair / expr
fieldsep <- ',' SKIP / ';' SKIP
callargs <- '(' SKIP (expr (',' SKIP expr)*)? ')' SKIP / Table / String
idlist <- Id (',' SKIP Id)*
iddecllist <- IdDecl (',' SKIP IdDecl)*
funcargs <- (Id (',' SKIP Id)* (',' SKIP Varargs)? / Varargs)?
exprlist <- expr (',' SKIP expr)*
varlist <- var (',' SKIP var)*
opor <- 'or' WB exprand
opand <- 'and' WB exprcmp
opcmp <- ('==' SKIP / '~=' SKIP / '<=' SKIP / '>=' SKIP / '<' SKIP / '>' SKIP) exprbor
opbor <- '|' SKIP exprbxor
opbxor <- '~' SKIP exprband
opband <- '&' SKIP exprbshift
opbshift <- ('<<' SKIP / '>>' SKIP) exprconcat
opconcat <- '..' SKIP exprconcat
oparit <- ('+' SKIP / '-' SKIP) exprfact
opfact <- ('*' SKIP / '//' SKIP / '/' SKIP / '%' SKIP) exprunary
oppow <- '^' SKIP exprunary
opunary <- ('not' WB / '#' SKIP / '-' SKIP / '~' SKIP) exprunary
expr <- expror
expror <- (exprand opor*)
exprand <- (exprcmp opand*)
exprcmp <- (exprbor opcmp*)
exprbor <- (exprbxor opbor*)
exprbxor <- (exprband opbxor*)
exprband <- (exprbshift opband*)
exprbshift <- (exprconcat opbshift*)
exprconcat <- (exprarit opconcat*)
exprarit <- (exprfact oparit*)
exprfact <- (exprunary opfact*)
exprunary <- opunary / exprpow
exprpow <- (exprsimple oppow*)
exprsimple <- Nil / Boolean / Number / String / Varargs / Function / Table / exprsuffixed
exprprimary <- Id / Paren
STRING <- STRING_SHRT / STRING_LONG
STRING_LONG <-
'[' $lsep<'='*> '[' LINEBREAK? <( ! (']' $lsep ']') . )*> ']' $lsep ']'
STRING_SHRT <-
$quote<["']> <( ESCAPE_SEQ / ( ! ( $quote / LINEBREAK ) . ) )*> $quote
ESCAPE_SEQ <- '\\' ESCAPE
ESCAPE <- [\\'"] /
('n' / 't' / 'r' / 'a' / 'b' / 'v' / 'f') /
('x' HEX_DIGIT HEX_DIGIT) /
('u' '{' HEX_DIGIT HEX_DIGIT+ '}') /
('z' SPACE*) /
(DEC_DIGIT DEC_DIGIT !DEC_DIGIT / [012] DEC_DIGIT DEC_DIGIT) /
(LINEBREAK)
NUMBER <- <HEX_NUMBER / DEC_NUMBER>
HEX_NUMBER <- '0' [xX] HEX_PREFIX ([pP] EXP_DIGITS)?
DEC_NUMBER <- DEC_PREFIX ([eE] EXP_DIGITS)?
HEX_PREFIX <- HEX_DIGIT+ ('.' HEX_DIGIT*)? / '.' HEX_DIGIT+
DEC_PREFIX <- DEC_DIGIT+ ('.' DEC_DIGIT*)? / '.' DEC_DIGIT+
EXP_DIGITS <- [+-]? DEC_DIGIT+
COMMENT <- '--' (COMMENT_LONG / COMMENT_SHRT)
COMMENT_LONG <- STRING_LONG
COMMENT_SHRT <- (!LINEBREAK .)*
NAME <- !KEYWORD <NAME_PREFIX NAME_SUFFIX?> SKIP
NAME_PREFIX <- [_a-zA-Z]
NAME_SUFFIX <- [_a-zA-Z0-9]+
SHEBANG <- '#!' (!LINEBREAK .)* LINEBREAK?
~SKIP <- (SPACE+ / COMMENT)*
~LINEBREAK <- ('\n' '\r'? / '\r' '\n'?)
SPACE <- ' ' / '\t' / LINEBREAK
HEX_DIGIT <- [0-9a-fA-F]
DEC_DIGIT <- [0-9]
#EXTRA_TOKENS <- '[[' '[=' '--' # unused rule, here just to force defining these tokens
~WB <-
!NAME_PREFIX SKIP
KEYWORD <-
('and'
/ 'break'
/ 'do'
/ 'elseif'
/ 'else'
/ 'end'
/ 'false'
/ 'for'
/ 'function'
/ 'goto'
/ 'if'
/ 'in'
/ 'local'
/ 'nil'
/ 'not'
/ 'or'
/ 'repeat'
/ 'return'
/ 'then'
/ 'true'
/ 'until'
/ 'while') WB
Input:
return lpegrex
--[[
The MIT License (MIT)
Copyright (c) 2021 Eduardo Bart
Copyright (c) 2014-2020 Sérgio Medeiros
Copyright (c) 2007-2019 Lua.org, PUC-Rio.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
]]
AST (All):
- chunk/0[NAME] (lpegrex)
AST (Only):
+ chunk
+ Block
+ Return
+ exprlist
+ expr
+ expror
+ exprand
+ exprcmp
+ exprbor
+ exprbxor
+ exprband
+ exprbshift
+ exprconcat
+ exprarit
+ exprfact
+ exprunary/1
+ exprpow
+ exprsimple/7
+ exprsuffixed
+ exprprimary/0
+ Id
- NAME (lpegrex)
But with this input then I'm getting something reasonable:
local a
return lpegrex
--[[
The MIT License (MIT)
Copyright (c) 2021 Eduardo Bart
Copyright (c) 2014-2020 Sérgio Medeiros
Copyright (c) 2007-2019 Lua.org, PUC-Rio.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
]]
AST (All):
+ chunk/0[Block]
- VarDecl/0[NAME] (a)
- Return/0[NAME] (lpegrex)
If a node only contains a child, the child will be hoisted. So if you want to see as below, you have to add { no_ast_opt }
instruction to Block
.
+ chunk/0[Block]
- Return/0[NAME] (lpegrex)
If you want to control the shape of your AST tree completely, the default AST optimizer shouldn't be used, but you have to create your own AST optimizer instead.
Thank you for your help !
Could we have an analyzes pass like in C/C++ -Wextra
to detect thinks like this and tell the user about it ?
IMHO, a tool can't judge hoisting is good or not correctly. That's why I put two AST panes in the playground, and I still think it's a user's responsibility.
I agree with you about user responsibility
and a tool can't judge hoisting is good or not correctly
but if it can point out areas where mistakes (from user/tool) can happen that would help users on non trivial grammars, that's why I mention similar to -Wextra
users often bitten (like me) would appreciate that info.
I'm posting this here because it's a related question: With the bison/yacc grammar shown bellow how could I get an AST like this:
Actual AST output:
input:
bison/yacc grammar: