yhirose / cpp-peglib

A single file C++ header-only PEG (Parsing Expression Grammars) library
MIT License
879 stars 112 forks source link

Can you provide some performance test examples or information? #201

Closed w4454962 closed 2 years ago

w4454962 commented 2 years ago

There are certain requirements for performance. Can you provide a complete example of syntax parser, such as C language, and then parse, as well as the time required to save the syntax tree.

Of course, other types of syntax can also be used, as long as it can reflect its performance

yhirose commented 2 years ago

Thanks for feed back, but I simply don't have time for it...

mingodad commented 2 years ago

Here is a not complete C parser adapted from mouse-peg project and performance comparison/measurements https://github.com/ChrisHixon/chpeg/issues/2

w4454962 commented 2 years ago

这是一个不完整的 C 解析器,改编自 mouse-peg 项目和性能比较/测量ChrisHixon/chpeg#2

I found a quick solution. luajit + lpeglabel parses 27mb of about 300000 lines of code, performs syntax detection, and saves the parse tree. It only takes 2 seconds. I think it is fast enough than most peg solutions, and even close to the running speed of handwriting parser.

mingodad commented 2 years ago

Can you share the parser code and the test files ?

w4454962 commented 2 years ago

你能分享解析器代码和测试文件吗?

jass-parser.zip

Run run.bat

w4454962 commented 2 years ago

I have also tested pegtl template parser to parse these test files, but the speed is not good. Their parse tree design may have problems. The running time is 14 seconds, which is not up to the requirements, and it is difficult to optimize

mingodad commented 2 years ago

I just extracted the grammar you've provided in jass-parser.zip and quickly/dirty converted it to be able to test it with peglib and it seems that there are missing pieces on that grammar.

When testing the grammar shown bellow on the playground (https://yhirose.github.io/cpp-peglib/) with this line from war3/24/common.j we get an error due to boolean not been defined anywhere:

    constant boolean            FALSE                           = false
1:14 syntax error, unexpected 'boolean'.
 Jass  <-
     (  Nl  /  Chunk  )  *  Sp

 Chunk  <-
     (  Type  /  Globals  /  Native  /  Function  /  Action  )

 Comment  <-
     '//' [^'\n'] *

 ~Sp  <-
     (  Comment  /  (' ' / '\t' / '\r' / '\n')  )  *  #ExChar  ?

 #ExChar  <-
#    &  EXCEPTION_CHAR

 Nl  <-
     (  Sp  '\n'  )  +

 Ignore  <-
    [^'\n'] *

 ~Cut  <-
     ! [a-zA-Z0-9_]

 Ed  <-
     Sp  (  &  Nl  /  !  .  )

 COMMA  <-
     Sp  ','

 ASSIGN  <-
     Sp  '='

 GLOBALS  <-
     Sp  'globals'  Cut

 ENDGLOBALS  <-
     Sp  'endglobals'  Cut

 CONSTANT  <-
     Sp  'constant'  Cut

 NATIVE  <-
     Sp  'native'  Cut

 ARRAY  <-
     Sp  'array'  Cut

 AND  <-
     Sp  'and'  Cut

 OR  <-
     Sp  'or'  Cut

 NOT  <-
     Sp  'not'  Cut

 TYPE  <-
     Sp  'type'  Cut

 EXTENDS  <-
     Sp  'extends'  Cut

 FUNCTION  <-
     Sp  'function'  Cut

 ENDFUNCTION  <-
     Sp  'endfunction'  Cut

 NOTHING  <-
     Sp  'nothing'  Cut

 TAKES  <-
     Sp  'takes'  Cut

 RETURNS  <-
     Sp  (  'returns'  /  'return'  )  Cut

 CALL  <-
     Sp  (  'call'  /  'set'  )  Cut

 SET  <-
     Sp  (  'set'  /  'call'  )  Cut

 RETURN  <-
     Sp  'return'  Cut

 IF  <-
     Sp  'if'  Cut

 THEN  <-
     Sp  'then'  Cut

 ENDIF  <-
     Sp  'endif'  Cut

 ELSEIF  <-
     Sp  'elseif'  Cut

 ELSE  <-
     Sp  'else'  Cut

 LOOP  <-
     Sp  'loop'  Cut

 ENDLOOP  <-
     Sp  'endloop'  Cut

 EXITWHEN  <-
     Sp  'exitwhen'  Cut

 LOCAL  <-
     Sp  'local'  Cut

 TRUE  <-
     Sp  'true'  Cut

 FALSE  <-
     Sp  'false'  Cut

 DEBUG  <-
     Sp  'debug'  Cut

 Esc  <-
     '\\'  EChar

 EChar  <-
     'b'
    /  't'
    /  'r'
    /  'n'
    /  'f'
    /  '"'
    /  '\\'
#   /  ERROR_ESC

 Value  <-
     NULL
    /  Boolean
    /  String
    /  Real
    /  Integer

 NULL  <-
     Sp  'null'  Cut

 Boolean  <-
     TRUE
    /  FALSE

 String  <-
     Sp  <'"'  (  Esc  /  '\n'  / [^"] )  *  '"'>

 Real  <-
     Sp  (  '-'  ?  Sp  (  '.' [0-9] +  / [0-9] +  '.' [0-9] *  )  )

 Integer  <-
     Integer16
    /  Integer8
    /  Integer10
    /  Integer256

 Integer8  <-
     Sp  <  '-'  ?  Sp  '0' [0-7] *  >

 Integer10  <-
     Sp <  '-'  ?  Sp  '0'  /  ( [1-9][0-9] *  )  >

 Integer16  <-
     Sp  <  '-'  ?  Sp  (  '$'  /  '0x'  /  '0X'  )  Char16  >

 Char16  <-
    [a-fA-F0-9] +

 Integer256  <-
     Sp <  '-'  ?  Sp  C256  >

 C256  <-
     "'"  C256_1  "'"
    /  "'"  C256_4  C256_4  C256_4  C256_4  "'"
    /  "'"

 C256_1  <-
     Esc
    /  '\n'
    / [^']

 C256_4  <-
     '\n'
    / [^']

 Name  <-
     Sp <[a-zA-Z][a-zA-Z0-9_] *>

 GT  <-
     '>'

 GE  <-
     '>='

 LT  <-
     '<'

 LE  <-
     '<='

 EQ  <-
     '=='

 UE  <-
     '!='

 ADD  <-
     '+'

 SUB  <-
     '-'

 MUL  <-
     '*'

 DIV  <-
     '/'

 MOD  <-
     '%'

 PL  <-
     Sp  '('

 PR  <-
     Sp  ')'

 BL  <-
     Sp  '['

 BR  <-
     Sp  ']'

 Exp  <-
     ECheckAnd

 ECheckAnd  <-
     (  ECheckOr  (  Sp  ESAnd  ECheckOr  )  *  )

 ECheckOr  <-
     (  ECheckComp  (  Sp  ESOr  ECheckComp  )  *  )

 ECheckComp  <-
     (  ECheckNot  (  Sp  ESComp  ECheckNot  )  *  )

 ECheckNot  <-
     (  Sp  ESNot  +  ECheckAdd  )
    /  ECheckAdd

 ECheckAdd  <-
     (  ECheckMul  (  Sp  ESAddSub  ECheckMul  )  *  )

 ECheckMul  <-
     (  EUnit  (  Sp  ESMulDiv  EUnit  )  *  )

 ESAnd  <-
     AND

 ESOr  <-
     OR

 ESComp  <-
     UE
    /  EQ
    /  LE
    /  LT
    /  GE
    /  GT
    /  Sp  '='

 ESNot  <-
     NOT

 ESAddSub  <-
     ADD
    /  SUB

 ESMulDiv  <-
     MUL
    /  DIV
    /  MOD

 EUnit  <-
     EParen
    /  ECode
    /  ECall
    /  EValue
    /  ENeg

 EParen  <-
     PL  Exp  PR

 ECode  <-
     (  FUNCTION  Name  (  PL  ECallArgs  ?  PR  )  ?  )

 ECall  <-
     (  Name  PL  ECallArgs  ?  PR  )

 ECallArgs  <-
     Exp  (  COMMA  Exp  )  *

 EValue  <-
     Value
    /  EVari
    /  EVar

 EVari  <-
     (  Name  BL  Exp  BR  )

 EVar  <-
     Name

 ENeg  <-
     Sp  SUB  EUnit

 Type  <-
     (  TYPE  TChild  TExtends  TParent  )

 TChild  <-
     Name

 TExtends  <-
     EXTENDS

 TParent  <-
     Name

 Globals  <-
     GLOBALS  Nl  (  Nl  /  Global  )  *  (  ENDGLOBALS  Ed  )  ?

 Global  <-
     !  GLOBALS  !  FUNCTION  !  NATIVE  (  CONSTANT  ?  Name  ARRAY  ?  Name  (  ASSIGN  Exp  )  ?  )

 Local  <-
     (  LocalDef  LocalExp  ?  )
    /  TYPE  Ignore

 LocalDef  <-
     (  (  CONSTANT  )  ?  LOCAL  Name  ARRAY  ?  Name  )

 LocalExp  <-
     ASSIGN  Exp

 Locals  <-
     (  Local  ?  Nl  )  +

 Action  <-
     (  (  ACall  /  ASet  /  AReturn  /  AExit  /  ALogic  /  ALoop  /  AError  )  )

 Actions  <-
     (  Nl  /  Action  )  *

 ACall  <-
     (  DEBUG  ?  CALL  Name  PL  ACallArgs  ?  PR  )

 ACallArgs  <-
     Exp  (  COMMA  Exp  )  *

 ASet  <-
     (  SET  Name  (  BL  Exp  BR  )  ?  ASSIGN  Exp  )

 AReturn  <-
     RETURN  (  ARExp  Ed  /  Sp  )

 ARExp  <-
     Ed
    /  Exp  Ed

 AExit  <-
     EXITWHEN  Exp

 ALogic  <-
     (  LIf  LElseif  *  LElse  ?  LEnd  )

 LIf  <-
     (  IF  (  Exp  THEN  )  Nl  (  Actions  )  )

 LElseif  <-
     (  ELSEIF  (  Exp  THEN  )  Nl  (  Actions  )  )

 LElse  <-
     (  ELSE  Nl  (  Actions  )  )

 LEnd  <-
     (  ENDIF  Ed  )  ?

 ALoop  <-
     (  LOOP  Nl  Actions  (  ENDLOOP  Ed  )  ?  )

 AError  <-
     LOCAL  Ignore
    /  TYPE  Ignore

 Native  <-
     (  CONSTANT  ?  NATIVE  Name  NTakes  NReturns  )

 NTakes  <-
     TAKES  (  NOTHING  /  NArg  (  COMMA  NArg  )  *  )

 NArg  <-
     Name  Name

 NReturns  <-
     RETURNS  Name

 Function  <-
     FDef  Nl  (  FLocals  Actions  )  FEnd

 FDef  <-
     CONSTANT  ?  FUNCTION  (  Name  FTakes  FReturns  )

 FTakes  <-
     TAKES  (  NOTHING  /  NArg  (  COMMA  NArg  )  *  /  Sp  )

 FArg  <-
     Name  Name

 FReturns  <-
     RETURNS  Name

 FLocals  <-
     Locals
    /

 FEnd  <-
     (  ENDFUNCTION  Ed  )  ?
w4454962 commented 2 years ago

我刚刚提取了您提供的语法jass-parser.zip并将quickly/dirty其转换为能够对其进行测试,peglib并且该语法似乎缺少部分。

当在操场上( https://yhirose.github.io/cpp-peglib/)测试下面显示的语法时,war3/24/common.j我们得到一个错误,因为boolean没有在任何地方定义:


    constant boolean            FALSE                           = false

The jass language has the following built-in basic types: nothing, null, handle, code, integer, real, boolean, string Use the type name extends parent syntax to extend new types

for example

` type unit extends handle

type effect extends handle

`

https://github.com/lep/pjass

This is the implementation version of lex/yacc. Using this project to test the above example takes only 0.7 seconds. Of course, it is not as convenient to expand as Lua, and it is difficult to use its parse tree