Closed w4454962 closed 2 years ago
Thanks for feed back, but I simply don't have time for it...
Here is a not complete C parser adapted from mouse-peg project and performance comparison/measurements https://github.com/ChrisHixon/chpeg/issues/2
这是一个不完整的 C 解析器,改编自 mouse-peg 项目和性能比较/测量ChrisHixon/chpeg#2
I found a quick solution. luajit + lpeglabel parses 27mb of about 300000 lines of code, performs syntax detection, and saves the parse tree. It only takes 2 seconds. I think it is fast enough than most peg solutions, and even close to the running speed of handwriting parser.
Can you share the parser code and the test files ?
I have also tested pegtl template parser to parse these test files, but the speed is not good. Their parse tree design may have problems. The running time is 14 seconds, which is not up to the requirements, and it is difficult to optimize
I just extracted the grammar you've provided in jass-parser.zip
and quickly/dirty
converted it to be able to test it with peglib
and it seems that there are missing pieces on that grammar.
When testing the grammar shown bellow on the playground (https://yhirose.github.io/cpp-peglib/) with this line from war3/24/common.j
we get an error due to boolean
not been defined anywhere:
constant boolean FALSE = false
1:14 syntax error, unexpected 'boolean'.
Jass <-
( Nl / Chunk ) * Sp
Chunk <-
( Type / Globals / Native / Function / Action )
Comment <-
'//' [^'\n'] *
~Sp <-
( Comment / (' ' / '\t' / '\r' / '\n') ) * #ExChar ?
#ExChar <-
# & EXCEPTION_CHAR
Nl <-
( Sp '\n' ) +
Ignore <-
[^'\n'] *
~Cut <-
! [a-zA-Z0-9_]
Ed <-
Sp ( & Nl / ! . )
COMMA <-
Sp ','
ASSIGN <-
Sp '='
GLOBALS <-
Sp 'globals' Cut
ENDGLOBALS <-
Sp 'endglobals' Cut
CONSTANT <-
Sp 'constant' Cut
NATIVE <-
Sp 'native' Cut
ARRAY <-
Sp 'array' Cut
AND <-
Sp 'and' Cut
OR <-
Sp 'or' Cut
NOT <-
Sp 'not' Cut
TYPE <-
Sp 'type' Cut
EXTENDS <-
Sp 'extends' Cut
FUNCTION <-
Sp 'function' Cut
ENDFUNCTION <-
Sp 'endfunction' Cut
NOTHING <-
Sp 'nothing' Cut
TAKES <-
Sp 'takes' Cut
RETURNS <-
Sp ( 'returns' / 'return' ) Cut
CALL <-
Sp ( 'call' / 'set' ) Cut
SET <-
Sp ( 'set' / 'call' ) Cut
RETURN <-
Sp 'return' Cut
IF <-
Sp 'if' Cut
THEN <-
Sp 'then' Cut
ENDIF <-
Sp 'endif' Cut
ELSEIF <-
Sp 'elseif' Cut
ELSE <-
Sp 'else' Cut
LOOP <-
Sp 'loop' Cut
ENDLOOP <-
Sp 'endloop' Cut
EXITWHEN <-
Sp 'exitwhen' Cut
LOCAL <-
Sp 'local' Cut
TRUE <-
Sp 'true' Cut
FALSE <-
Sp 'false' Cut
DEBUG <-
Sp 'debug' Cut
Esc <-
'\\' EChar
EChar <-
'b'
/ 't'
/ 'r'
/ 'n'
/ 'f'
/ '"'
/ '\\'
# / ERROR_ESC
Value <-
NULL
/ Boolean
/ String
/ Real
/ Integer
NULL <-
Sp 'null' Cut
Boolean <-
TRUE
/ FALSE
String <-
Sp <'"' ( Esc / '\n' / [^"] ) * '"'>
Real <-
Sp ( '-' ? Sp ( '.' [0-9] + / [0-9] + '.' [0-9] * ) )
Integer <-
Integer16
/ Integer8
/ Integer10
/ Integer256
Integer8 <-
Sp < '-' ? Sp '0' [0-7] * >
Integer10 <-
Sp < '-' ? Sp '0' / ( [1-9][0-9] * ) >
Integer16 <-
Sp < '-' ? Sp ( '$' / '0x' / '0X' ) Char16 >
Char16 <-
[a-fA-F0-9] +
Integer256 <-
Sp < '-' ? Sp C256 >
C256 <-
"'" C256_1 "'"
/ "'" C256_4 C256_4 C256_4 C256_4 "'"
/ "'"
C256_1 <-
Esc
/ '\n'
/ [^']
C256_4 <-
'\n'
/ [^']
Name <-
Sp <[a-zA-Z][a-zA-Z0-9_] *>
GT <-
'>'
GE <-
'>='
LT <-
'<'
LE <-
'<='
EQ <-
'=='
UE <-
'!='
ADD <-
'+'
SUB <-
'-'
MUL <-
'*'
DIV <-
'/'
MOD <-
'%'
PL <-
Sp '('
PR <-
Sp ')'
BL <-
Sp '['
BR <-
Sp ']'
Exp <-
ECheckAnd
ECheckAnd <-
( ECheckOr ( Sp ESAnd ECheckOr ) * )
ECheckOr <-
( ECheckComp ( Sp ESOr ECheckComp ) * )
ECheckComp <-
( ECheckNot ( Sp ESComp ECheckNot ) * )
ECheckNot <-
( Sp ESNot + ECheckAdd )
/ ECheckAdd
ECheckAdd <-
( ECheckMul ( Sp ESAddSub ECheckMul ) * )
ECheckMul <-
( EUnit ( Sp ESMulDiv EUnit ) * )
ESAnd <-
AND
ESOr <-
OR
ESComp <-
UE
/ EQ
/ LE
/ LT
/ GE
/ GT
/ Sp '='
ESNot <-
NOT
ESAddSub <-
ADD
/ SUB
ESMulDiv <-
MUL
/ DIV
/ MOD
EUnit <-
EParen
/ ECode
/ ECall
/ EValue
/ ENeg
EParen <-
PL Exp PR
ECode <-
( FUNCTION Name ( PL ECallArgs ? PR ) ? )
ECall <-
( Name PL ECallArgs ? PR )
ECallArgs <-
Exp ( COMMA Exp ) *
EValue <-
Value
/ EVari
/ EVar
EVari <-
( Name BL Exp BR )
EVar <-
Name
ENeg <-
Sp SUB EUnit
Type <-
( TYPE TChild TExtends TParent )
TChild <-
Name
TExtends <-
EXTENDS
TParent <-
Name
Globals <-
GLOBALS Nl ( Nl / Global ) * ( ENDGLOBALS Ed ) ?
Global <-
! GLOBALS ! FUNCTION ! NATIVE ( CONSTANT ? Name ARRAY ? Name ( ASSIGN Exp ) ? )
Local <-
( LocalDef LocalExp ? )
/ TYPE Ignore
LocalDef <-
( ( CONSTANT ) ? LOCAL Name ARRAY ? Name )
LocalExp <-
ASSIGN Exp
Locals <-
( Local ? Nl ) +
Action <-
( ( ACall / ASet / AReturn / AExit / ALogic / ALoop / AError ) )
Actions <-
( Nl / Action ) *
ACall <-
( DEBUG ? CALL Name PL ACallArgs ? PR )
ACallArgs <-
Exp ( COMMA Exp ) *
ASet <-
( SET Name ( BL Exp BR ) ? ASSIGN Exp )
AReturn <-
RETURN ( ARExp Ed / Sp )
ARExp <-
Ed
/ Exp Ed
AExit <-
EXITWHEN Exp
ALogic <-
( LIf LElseif * LElse ? LEnd )
LIf <-
( IF ( Exp THEN ) Nl ( Actions ) )
LElseif <-
( ELSEIF ( Exp THEN ) Nl ( Actions ) )
LElse <-
( ELSE Nl ( Actions ) )
LEnd <-
( ENDIF Ed ) ?
ALoop <-
( LOOP Nl Actions ( ENDLOOP Ed ) ? )
AError <-
LOCAL Ignore
/ TYPE Ignore
Native <-
( CONSTANT ? NATIVE Name NTakes NReturns )
NTakes <-
TAKES ( NOTHING / NArg ( COMMA NArg ) * )
NArg <-
Name Name
NReturns <-
RETURNS Name
Function <-
FDef Nl ( FLocals Actions ) FEnd
FDef <-
CONSTANT ? FUNCTION ( Name FTakes FReturns )
FTakes <-
TAKES ( NOTHING / NArg ( COMMA NArg ) * / Sp )
FArg <-
Name Name
FReturns <-
RETURNS Name
FLocals <-
Locals
/
FEnd <-
( ENDFUNCTION Ed ) ?
我刚刚提取了您提供的语法
jass-parser.zip
并将quickly/dirty
其转换为能够对其进行测试,peglib
并且该语法似乎缺少部分。当在操场上( https://yhirose.github.io/cpp-peglib/)测试下面显示的语法时,
war3/24/common.j
我们得到一个错误,因为boolean
没有在任何地方定义:constant boolean FALSE = false
The jass language has the following built-in basic types: nothing, null, handle, code, integer, real, boolean, string
Use the type name extends parent
syntax to extend new types
for example
` type unit extends handle
type effect extends handle
`
This is the implementation version of lex/yacc. Using this project to test the above example takes only 0.7 seconds. Of course, it is not as convenient to expand as Lua, and it is difficult to use its parse tree
There are certain requirements for performance. Can you provide a complete example of syntax parser, such as C language, and then parse, as well as the time required to save the syntax tree.
Of course, other types of syntax can also be used, as long as it can reflect its performance