zevv / npeg

PEGs for Nim, another take

AssertionDefect when matching simple example #29

Closed: AlectronikForge closed this issue 3 years ago

AlectronikForge commented 3 years ago

Hi,

I found npeg just now and tried it out with the following example.

```nim
import npeg, strutils, tables

type Dict = Table[string, int]

let parser = peg("program", d: Dict):
  program   <- *statement
  statement <- var_decl
  word      <- +Alpha
  varC      <- "var"
  semi      <- ';'
  var_decl  <- varC * >word * semi

var words: Table[string, int]
discard parser.match("var hello;", words).ok
echo words
```

But it doesn't match - what am I doing wrong? Thanks for any hint!

zevv commented 3 years ago

Debugging parsers is tricky, but NPeg can help you out a bit with this: if you compile your code with -d:npegTrace, NPeg will dump diagnostic traces of the parser running, which show you what it is doing. In your case the trace looks like this:

  0|  0|  0|var hello;              |program        |choice 3                                |
  1|  0|  0|var hello;              |program        | call statement:4                       |*
  4|  0|  0|var hello;              |statement      |jump var_decl:6                         |*
  6|  0|  0|var hello;              |varC           |chr "v"                                 |*
  7|  0|  1|ar hello;               |varC           |chr "a"                                 |*
  8|  0|  2|r hello;                |varC           |chr "r"                                 |*
  9|  0|  3| hello;                 |Alpha          |set {'A'..'Z','a'..'z'}                 |*
 15|  0|  3| hello;                 |               |opFail (backtrack)                      |*
  3|  0|  0|var hello;              |program        |return                                  |

The 4th column shows you a slice of the subject string currently getting parsed, and to the right of that the state of the parser and what it is trying to do.

Here you can see that your parser is able to parse the var string, but then fails when it tries to match Alpha, because the next character in the subject is a space. Looking at your grammar this makes sense, as it says that the var string must be followed directly by one or more Alpha characters.
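The failure point in the trace above can also be reproduced in isolation. The following is only a minimal sketch: the cut-down grammar, the rule name `decl`, and the variable names are made up for illustration and are not part of the original code.

```nim
import npeg

# Cut-down version of the failing grammar: "var" followed directly by letters.
let failing = peg "decl":
  decl <- "var" * +Alpha

let r = failing.match("var hello;")

# The match fails because the character after "var" is a space, not a letter.
echo r.ok

# matchMax reports the furthest position in the subject the parser reached,
# which helps to locate where the grammar and the input disagree.
echo r.matchMax
```

Compiling this file with -d:npegTrace should print a trace similar to the one shown above, ending in a backtrack right after the var string.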

The solution is to make the white space explicit in the grammar. This is inherent to PEGs: they do not have a separate tokenization stage, so they handle white space just like any other character in your subject.

This is your fixed grammar:

let parser = peg("program", d: Dict):
  program   <- *statement
  space     <- +" "
  statement <- var_decl 
  word      <- +Alpha
  varC      <- "var"
  semi      <- ';'
  var_decl  <- varC * space * >word * semi

It adds a rule space that matches one or more space characters, and inserts this rule between varC and >word.
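For completeness, here is a sketch of the whole program with the fixed grammar. Note that the original code declares the `d: Dict` user data but never writes to it, so `words` would stay empty even after a successful match. The code block action on `var_decl` below is an assumption added purely for illustration (it counts how often each name is declared); it is not part of the grammar above.

```nim
import npeg, tables

type Dict = Table[string, int]

let parser = peg("program", d: Dict):
  program   <- *statement
  space     <- +" "
  statement <- var_decl
  word      <- +Alpha
  varC      <- "var"
  semi      <- ';'
  var_decl  <- varC * space * >word * semi:
    # Illustrative action, not in the original grammar:
    # $1 is the text captured by >word; count each declared name.
    d[$1] = d.getOrDefault($1) + 1

var words: Dict
let res = parser.match("var hello;", words)

# ok tells whether the subject matched the grammar.
echo res.ok
# The table now holds whatever the action above stored.
echo words
```

Checking `res.ok` explicitly, rather than discarding it, makes a failed match visible right away.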

Let me know if this solves your problem!

AlectronikForge commented 3 years ago

Thanks! Indeed this was the problem. It is quite a beginner's mistake: I had run into the same problem (forgetting whitespace) before, but forgot about it again because e.g. ANTLR and textX have special rules to 'forget' about whitespace characters and comments.

Maybe something like that could be added as a feature? Unfortunately I'm too new to Nim to be a candidate for writing it yet, but maybe later.

Thanks! The debugging feature indeed is very helpful.

zevv commented 3 years ago

Yeah, I did some experiments with ignoring white space in the past, but I was not really happy with some of the subtle effects it brings. Also, there is a run-time cost to it, because for every possible match the parser also has to check for matching whitespace, which I didn't find worth the gains at that time.

I'll keep this in mind though for future improvements!
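A similar effect can be had with explicit rules, by letting each token consume the optional whitespace that follows it. This is only a sketch of that manual convention, not an NPeg feature: the helper rule name `S`, the `!1` end-of-input check, and the test subject are assumptions chosen for illustration, while `Space` is NPeg's built-in whitespace character class.

```nim
import npeg

let lenient = peg "program":
  S         <- *Space                # optional whitespace, skipped after tokens
  word      <- +Alpha
  varC      <- "var" * +Space        # the keyword needs at least one whitespace after it
  semi      <- ';' * S
  var_decl  <- varC * >word * S * semi
  statement <- var_decl
  program   <- S * *statement * !1   # !1 asserts that the whole subject was consumed

let res = lenient.match("  var hello ;\nvar world;")

echo res.ok
# The captures hold the text matched by each >word.
echo res.captures
```

The whitespace handling stays visible in the grammar, so its cost is only paid at the places where `S` is written out.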