yhirose / cpp-peglib

A single file C++ header-only PEG (Parsing Expression Grammars) library
MIT License

[Question] how to make parser continue after syntax error? #34

Closed LagMeester4000 closed 4 years ago

LagMeester4000 commented 6 years ago

I am trying to write a scripting language using this library. In the compilers/interpreters of other scripting languages I've used, you can get multiple errors from a single file/class/function. I know that I can check for a lot of errors after the parser (from this lib) has finished (accessing a private var, calling a function that doesn't exist, etc.), but I can't find a way to report multiple syntax errors, because the parser stops after encountering the first one. Is there a way to skip a rule if the parser finds a syntax error in my input?

yhirose commented 6 years ago

@LagMeester4000, thanks for the feedback. As you noticed, the parser unfortunately detects and reports only the very first error at this point. Sorry about that...

I actually researched 'error handling/recovery in PEG' using some technical papers available on the Internet, but I realized it wasn't an easy task and I didn't have time to work on it. I'll hopefully come back to this subject and try to find a reasonable way to implement it, so that this library can be more useful for users. Thanks for bringing up this matter!

LagMeester4000 commented 6 years ago

Yes, I thought about it too, and it doesn't seem easy to implement on a generic parser like this, because finding the end of an expression could be wildly different for each grammar. Then again, I don't know that much about this stuff.

Thanks for the quick reply!

mqnc commented 6 years ago

The way I'm dealing with it is by making a syntaxerror rule that accepts anything and prints a warning when discovered.

expression <- assignment / branch / loop / function / syntaxerror
syntaxerror <- (!'\n' .)* '\n'

The ordered prioritization of PEG makes this very nice: only if the input matches none of the known constructs is it treated as a syntax error. The error rule consumes the line (or a single character, or whatever unit you want to skip in the error case), and parsing moves on.

I am aware that this is far from bulletproof. Consider this:

error <- .
number <- binary / decimal
binary <- [0-1] / error
decimal <- [0-9] / error

Now, this is a stupid example, since you could fix it by simply swapping binary and decimal in the number rule. My point is that the danger of this approach is that you might raise an error from within a branch you're not supposed to be in. Not only will you get an incorrect warning, but the parser will also continue down this wrong branch, because the error rule consumes anything and so the branch appears to match.

So when you use this, you have to be sure that "at this point, there is no way I could be in another branch where this input would be correct." There are probably more pitfalls to consider, but so far it has served me well. With this approach, I can write a giant file that I want the parser to handle, then build up the grammar step by step while the parser parses everything it already understands.

yhirose commented 4 years ago

Closing this for now.