yhirose / cpp-peglib

A single file C++ header-only PEG (Parsing Expression Grammars) library
MIT License
880 stars 112 forks source link

Follow-up on recovery with syntax errors #242

Closed rwtolbert closed 2 years ago

rwtolbert commented 2 years ago

I think I have a more complete example of my problem. Now maybe this is impossible to fix, but I'm trying to understand it.

if I take this initial grammar

Primary 
    <- PROPEXP

PROPEXP <- PROPERTY _ FLOAT _ TO _ FLOAT

PROPERTY <- < 'charge'i >

TO <- < 'to'i >

FLOAT
    <- < '-'[0-9]+'.'[0-9]* >
    /  < '-'[0-9]*'.'[0-9]+ >
    /  < [0-9]+'.'[0-9]* >
    /  < [0-9]*'.'[0-9]+ >
    /  INTEGER

NUMBER <- < [0-9]+ >

INTEGER
    <- < '-'NUMBER >
    /  NUMBER

~_ <- [ \t\r\n]*

and run it via peglint as

peglint.exe .\recovery_test.peg --ast --source "charge 1.9 to"
[commandline]:1:14: syntax error, expecting '-', <FLOAT>.

In this case, the syntax error is close to useful, but I think the '-' option is confusing. And note there is no AST output.

So I add a recovery condition as explained in the docs, where the empty condition should fail the parse.

Primary 
    <- PROPEXP

PROPEXP <- PROPERTY _ FLOAT^first _ TO _ FLOAT^second

first <- '' { message "Missing first FLOAT value"}
second <- '' { message "Missing second FLOAT value" }

PROPERTY <- < 'charge'i >

TO <- < 'to'i >

FLOAT
    <- < '-'[0-9]+'.'[0-9]* >
    /  < '-'[0-9]*'.'[0-9]+ >
    /  < [0-9]+'.'[0-9]* >
    /  < [0-9]*'.'[0-9]+ >
    /  INTEGER

NUMBER <- < [0-9]+ >

INTEGER
    <- < '-'NUMBER >
    /  NUMBER

~_ <- [ \t\r\n]*

When I run this via peglint

peglint.exe .\recovery_test.peg --ast --source "charge 1.9 to"
[commandline]:1:14: Missing second FLOAT value
+ Primary
  + PROPEXP
    - PROPERTY (charge)
    - FLOAT/2 (1.9)
    - TO (to)

note that I do get the new custom syntax error BUT I also now see the AST printed out. My expectation would be the with this custom error, I'd get a parse failure and no AST.

In practice, in my app (not peglint) since it tries to build the AST anyway, it visits my lambdas with incomplete/invalid data, rather than just returning false like the default case.

rwtolbert commented 2 years ago

Another way to see this in practice is to modify the first calc example, calc.cc to use this grammar

  // (2) Make a parser
  parser parser(R"(
        # Grammar for Calculator...
        Additive    <- Multitive '+' Additive / Multitive
        Multitive   <- Primary '*' Multitive^cond / Primary
        Primary     <- '(' Additive ')' / Number
        Number      <- < [0-9]+ >
        %whitespace <- [ \t]*
        cond <- '' { message "missing multitative" }
    )");

then try to parse a broken expression like parser.parse(" (1 + 2) * ", val);

and you'll end up with a crash in the Multitative lambda.

yhirose commented 2 years ago

@rwtolbert, thanks for the report! It's a bug, and I have just fixed it.

rwtolbert commented 2 years ago

@yhirose, nice fix. This works really well and is a very welcome upgrade.