Closed rishavs closed 1 year ago
Hi Rishav,
Yes, error handling in PEGs is not trivial, and I do not have any complete examples at hand. Basically, there are two options to handle errors in a PEG parser:
The simple solution is just to let it fail: NPeg will report this by setting the ok field of the match result to false. It will also fill in matchLen and matchMax for you; the latter is a pretty good indication of where your error occurred, since it marks how far the parser got before it was unable to continue. From that you could generate a generic error message telling the user there is a syntax error, and provide a line and column number and a little snippet of the subject string to show the user where the error was.
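As a rough illustration of turning such an offset into a line, a column and a snippet, here is a minimal sketch in plain Nim. The names ErrorPos and locate are hypothetical helpers, not part of NPeg, and treating matchMax as a 0-based offset into the subject string is an assumption worth checking against your NPeg version.

```nim
import std/strutils

type ErrorPos = tuple[line, col: int, snippet: string]

proc locate(subject: string, offset: int): ErrorPos =
  ## Convert a 0-based offset (assumed here to be what matchMax holds)
  ## into a 1-based line and column, plus the text of that line.
  let pos = clamp(offset, 0, subject.len)
  var line = 1
  var lineStart = 0
  # Count the newlines before the offset and remember where the line starts.
  for i in 0 ..< pos:
    if subject[i] == '\n':
      inc line
      lineStart = i + 1
  # The snippet runs up to the next newline, or to the end of the subject.
  var lineEnd = subject.find('\n', lineStart)
  if lineEnd < 0:
    lineEnd = subject.len
  result = (line, pos - lineStart + 1, subject[lineStart ..< lineEnd])

let src = "var a = 1\nvar b = ?\n"
let where = locate(src, 18)
echo "syntax error at line ", where.line, ", column ", where.col, ": ", where.snippet
```

In a real parser the offset would come from the match result of a failed parse rather than from a hard-coded number.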
There is another mechanism allowing a bit more control, and likely better error messages, but it requires more work when writing the grammar: for this you can use the E atom as described in the manual at https://github.com/zevv/npeg#parsing-error-handling. The E atom will abort parsing with an exception and pass you the string as the error message; this is typically used in conjunction with the ordered choice operator |, for example:
number <- +{'0'..'9'} | E"number"
The above rule says: match one or more occurrences of any character from the set 0..9, or, when this fails to match, generate an error saying Parsing error at #14: expected "number".
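A minimal sketch of how that rule might be driven from Nim code follows. The exception type name NPegParseError is taken from the NPeg manual as I recall it and should be treated as an assumption for your version; check is a hypothetical helper, and the exact wording of the message is not guaranteed.

```nim
import npeg

let numberParser = peg "number":
  number <- +{'0'..'9'} | E"number"

proc check(input: string): string =
  ## Returns "ok" on success, otherwise the message carried by the exception.
  try:
    let r = numberParser.match(input)
    result = if r.ok: "ok" else: "no match"
  except NPegParseError as e:
    # Assumed exception type raised by the E atom.
    result = e.msg

echo check("123")
echo check("abc")
```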
The E atom allows you to generate error messages that are more helpful and can hint to your users what is wrong in their text: saying "expected a number" is nicer than "syntax error at line 30 char 22".
I'm afraid the above does not really add anything to the manual; let me know if this is helpful and if there is any way we can improve the documentation on this.
People much smarter than me have also been thinking very hard about this problem:
The E atom allows you to generate error messages that are more helpful and can hint your users what is wrong in their text: saying "expected a number" is nicer than "syntax error at line 30 char 22".
I understand how to use the E atom; my struggle is simply not being able to understand how to structure my grammar so that I can use the E atom effectively.
For example, this is my grammar (it just handles var declarations and type declarations):
let parser = peg "program":

  # Tokens
  tkTypeOperator <- ':' * *Space
  tkEqualsOperator <- '=' * *Space
  tkCommaOperator <- ',' * *Space
  tkDotOperator <- '.'
  tkOptionsOperator <- '|' * *Space
  tkDummyOperator <- '_'

  # Keywords
  kwVar <- "var " * *Space
  kwConst <- "const " * *Space
  kwPrimaryTypes <- ("Num" | "Bool" | "Void" | "Any") * *Space
  keywords <- kwVar | kwConst | kwPrimaryTypes

  # Literals
  litInteger <- Digit * *(Digit | tkDummyOperator)
  litDecimal <- litInteger * tkDotOperator * litInteger
  litNumber <- litDecimal | litInteger
  litBooleanValues <- "true" | "false"
  literal <- litNumber | litBooleanValues

  # Expressions
  expression <- literal * *Space

  # Types
  typeOptions <- kwPrimaryTypes * *(tkOptionsOperator * kwPrimaryTypes)
  typeDef <- typeOptions

  identifier <- !keywords * (+Lower * *(Alnum | tkDummyOperator)) * *Space
  typeIdentifier <- +Upper * *(Alnum | tkDummyOperator) * *Space
  identDeclaration <- identifier * ?(tkTypeOperator * typeDef)

  varDeclarationList <- kwVar * identDeclaration * *(tkCommaOperator * identDeclaration)
  constDeclarationList <- kwConst * identDeclaration * *(tkCommaOperator * identDeclaration)
  assignmentList <- identifier * *(tkCommaOperator * identifier) * tkEqualsOperator * expression * *(tkCommaOperator * expression)
  varDeclareAndAssignList <- varDeclarationList * tkEqualsOperator * expression * *(tkCommaOperator * expression)
  constDeclareAndAssignList <- constDeclarationList * tkEqualsOperator * expression * *(tkCommaOperator * expression)

  eof <- !1
  statement <- varDeclareAndAssignList | constDeclareAndAssignList | varDeclarationList | constDeclarationList | assignmentList
  program <- *Space * +statement * *Space * eof
Clearly, adding the error atom on statement will not be effective. Instead, I should probably look at adding it at a "literal"-like level, but the way I have my literals right now, I am not sure it will be much help either.
I assume that I should be looking at creating option chains (x | y | z) in the parts where I want the error to be handled. But I just don't know how to restructure my grammar for that without having to rewrite the entire thing every time I discover that my hierarchy definition is not effective.
Essentially, what would help me is understanding what the best practices are for structuring the grammar, or seeing some examples of well-done grammars with error handling to understand how I should go about it.
Anyway, thank you for weighing in. Maybe the only solution is to just jump in with both feet.
Hmm, I think I see your problem: for example, you want to get a literal, which is defined as litNumber | litBooleanValues. If you were to add an error to litNumber, like litDecimal | litInteger | E"number", this error would be thrown before you get a chance to try a boolean.
In this case, your error would need to be at the literal level; you could add an intermediate rule that implements the error without having to rewrite every call site of the literal rule:
literal2 <- litNumber | litBooleanValues
literal <- literal2 | E"literal"
I'm not good at naming things, thus the literal2.
Would something like that work for you?
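To make the wrapper idea concrete, here is a small sketch on a cut-down grammar. It is not the grammar from this thread: the assignment rule and the describe helper are hypothetical, and the NPegParseError type name is assumed from the NPeg manual. The point it illustrates is that the E atom sits at a place where the parser has already committed (after the = sign, only a literal can follow), so raising there does not cut off any remaining alternative.

```nim
import npeg

let assignParser = peg "assignment":
  litNumber  <- +Digit
  litBoolean <- "true" | "false"
  # literal2 holds the real alternatives; literal adds the error label,
  # so every existing use of literal picks up the message unchanged.
  literal2   <- litNumber | litBoolean
  literal    <- literal2 | E"literal"
  identifier <- +Lower * *Space
  assignment <- identifier * '=' * *Space * literal * !1

proc describe(input: string): string =
  ## Returns "ok" on success, otherwise a short failure description.
  try:
    let r = assignParser.match(input)
    result = if r.ok: "ok" else: "syntax error"
  except NPegParseError as e:
    # Assumed exception type raised by the E atom.
    result = e.msg

echo describe("x = 42")
echo describe("x = true")
echo describe("x = ?")
```

The first two inputs match; the third reaches the literal rule with nothing matching and should surface the labelled error instead of a plain failed match.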
Closing this because of inactivity, feel free to reopen if needed.
Hi
I am really not able to wrap my head around error reporting. I understand how to use it in small fragments. But I am not sure how to structure the grammar so that the error reporting is effective. Any longer example code here or general tips would be helpful.
thanks Rishav