zevv / npeg

PEGs for Nim, another take
MIT License
330 stars 22 forks source link

Performance and Complexity #49

Closed vanyle closed 1 year ago

vanyle commented 1 year ago

Hello! I was wondering what is the complexity of npeg. Does it depend on the grammar ? It would be nice to have a section in the documentation about it. For example consider the example in the README.md to parse arithmetic expression.

let parser = peg "line":
  exp      <- term   * *( ('+'|'-') * term)
  term     <- factor * *( ('*'|'/') * factor)
  factor   <- +{'0'..'9'} | ('(' * exp * ')')
  line     <- exp * !1

doAssert parser.match("3*(4+15)+2").ok

Is the match performed in linear time ?

I'm considering building a simple C compiler and I need a parsing library because I'm too lazy to read parser theory to build a parser by hand. I have https://github.com/vanyle/lrparser, but it's an SLR parser and I'm not sure it's powerful enough to parse C (also I haven't implemented any error handling.) Obviously, linear parsing time is a requirement when dealing with tens of thousands of lines of code to parse.

zevv commented 1 year ago

Yes, NPeg basically runs in linear time; the grammar is compiled into a parsing machine IR, which is again compiled to Nim code, which gets compiled to machine code.

If you feel brave you can compile your grammar with -d:npegexpand to see the generated nim code for the parser: this is basically a state machine running in a computed goto case statement. Backtracking might increase your run time, but in practice most grammars do not rely heavily on backtracking.

Adding a section to the documentation is a good idea, I'll leave this ticket open as a reminder.

btw, if you want to parse expressions with operator precedence, I recommend looking at the built in precedence-climbing parser operators in npeg, a complete arithmatic parser example can be found in https://github.com/zevv/npeg/blob/master/tests/precedence.nim