Open fabianfreyer opened 5 years ago
Hello,
Constructions (or rules) are defined as ConstructionName := Term Term Term {pin=1}. In those constructions, terms have a number, starting at 0.
pin=1 means that the parser will never backtrack across the given term. The value is the number of the term in the construction. If the next rule doesn't match, a parse error is triggered immediately, or the parser tries to recover.
recoverUntil: after "pinning", if the input is invalid, the parser reads characters until it can recover from the error. The value of recoverUntil is the terminal construction at which recovery starts.
fragment means the parser will not treat the construction as a term in the parent construction. Instead, it is a fragment of a bigger construction, and all of its children are injected into the parent construction, not into the current one. Example:
FunctionDecl := FunKeyword Name Parameters
FunKeyword := 'function'
Name := [A-Z][a-z]*
Parameters := Parameter+
Parameter := Name ' ' {fragment=true}
is the same as
FunctionDecl := FunKeyword Name Parameters
FunKeyword := 'function'
Name := [A-Z][a-z]*
Parameters := (Name ' ')+
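One way to picture fragment=true is as a flattening step on the parse tree: the fragment node disappears and its children are attached to its parent. The following is only a sketch of that idea in Python, not the library's implementation. The Node class, the FRAGMENT_RULES set and the inline_fragments function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical parse-tree node, not the library's own node type.
    type: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


# Assumption: names of the rules declared with {fragment=true}.
FRAGMENT_RULES = {"Parameter"}


def inline_fragments(node: Node) -> Node:
    """Return a copy of the tree where fragment nodes are replaced by their children."""
    new_children = []
    for child in node.children:
        child = inline_fragments(child)
        if child.type in FRAGMENT_RULES:
            # The fragment itself vanishes; its children move up to the parent.
            new_children.extend(child.children)
        else:
            new_children.append(child)
    return Node(node.type, node.text, new_children)


# Tree for the parameters "Foo Bar " before fragments are inlined.
params = Node("Parameters", "Foo Bar ", [
    Node("Parameter", "Foo ", [Node("Name", "Foo")]),
    Node("Parameter", "Bar ", [Node("Name", "Bar")]),
])
flat = inline_fragments(params)
child_types = [c.type for c in flat.children]
```

After flattening, Parameters holds the two Name nodes directly, which is the shape the second grammar above would produce.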
simplifyWhenOneChildren simplifies a construction when it matches only one of its terms: instead of a wrapper node, it returns the matched term.
Example:
Expression := MulExpression
MulExpression := AddExpression (('*' | '/') AddExpression)* {simplifyWhenOneChildren=true}
AddExpression := NominalExpression (('+' | '-') NominalExpression)* {simplifyWhenOneChildren=true}
NominalExpression := Number | VariableName {fragment=true}
So, here are some parsing examples
1
^ Number
1 + 4
^ Number
    ^ Number
^^^^^ AddExpression
1 + Abc * 5 + 1
    ^^^ VariableName
          ^ Number
^ Number
    ^^^^^^^ MulExpression
              ^ Number
^^^^^^^^^^^^^^^ AddExpression
See how a lot of nodes are simplified because they only have one child?
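The simplification can be pictured as a bottom-up pass that replaces a flagged node with its only child. This is a rough Python sketch of the idea under that assumption, not the library's code. Node, SIMPLIFY_RULES and simplify are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical parse-tree node, not the library's own node type.
    type: str
    children: List["Node"] = field(default_factory=list)


# Assumption: names of the rules declared with {simplifyWhenOneChildren=true}.
SIMPLIFY_RULES = {"MulExpression", "AddExpression"}


def simplify(node: Node) -> Node:
    """Collapse flagged nodes that ended up with exactly one child."""
    children = [simplify(c) for c in node.children]
    if node.type in SIMPLIFY_RULES and len(children) == 1:
        return children[0]
    return Node(node.type, children)


# "1": every wrapper has a single child, so only Number survives.
single = simplify(Node("MulExpression", [Node("AddExpression", [Node("Number")])]))

# "1 + 4": AddExpression has two children and is kept; its wrapper is dropped.
addition = simplify(
    Node("MulExpression", [Node("AddExpression", [Node("Number"), Node("Number")])])
)
```

Here single is a bare Number node and addition is an AddExpression with two Number children, matching the first two parsing examples.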
I hope I made myself clear, thanks for your interest in this library :)
Thank you very much for your explanations! I'm not yet sure how to understand the pin and recoverUntil attributes, though. Do you have an example where they could be used, similar to the one you showed with simplifyWhenOneChildren?
Also, are there any other incompatibilities between attributes? I'm having trouble combining ws=implicit and simplifyWhenOneChildren=true in a case like the following:
Statement ::= Expression ';' {ws=implicit,simplifyWhenOneChildren=true}
Expression ::= ...
This example parses a JSON file with error recovery and pinning:
{ ws=implicit }
/* JSON WITH ERROR RECOVERY https://www.ietf.org/rfc/rfc4627.txt */
value ::= false | null | true | object | number | string | array
BEGIN_ARRAY ::= #x5B /* [ left square bracket */
BEGIN_OBJECT ::= #x7B /* { left curly bracket */
END_ARRAY ::= #x5D /* ] right square bracket */
END_OBJECT ::= #x7D /* } right curly bracket */
NAME_SEPARATOR ::= #x3A /* : colon */
VALUE_SEPARATOR ::= #x2C /* , comma */
WS ::= [#x20#x09#x0A#x0D]+
false ::= "false"
null ::= "null"
true ::= "true"
object ::= BEGIN_OBJECT object_content? END_OBJECT { pin=1 }
object_content ::= (member (object_n)*) { recoverUntil=OBJECT_RECOVERY }
object_n ::= VALUE_SEPARATOR member { recoverUntil=OBJECT_RECOVERY,fragment=true, pin=1 }
Key ::= &'"' string { recoverUntil=VALUE_SEPARATOR, pin=1 }
OBJECT_RECOVERY ::= END_OBJECT | VALUE_SEPARATOR
ARRAY_RECOVERY ::= END_ARRAY | VALUE_SEPARATOR
MEMBER_RECOVERY ::= '"' | NAME_SEPARATOR | OBJECT_RECOVERY | VALUE_SEPARATOR
member ::= Key NAME_SEPARATOR value { recoverUntil=MEMBER_RECOVERY, pin=1 }
array ::= BEGIN_ARRAY array_content? END_ARRAY { pin=1 }
array_content ::= array_value (VALUE_SEPARATOR array_value)* { recoverUntil=ARRAY_RECOVERY,fragment=true }
array_value ::= value { recoverUntil=ARRAY_RECOVERY, fragment=true }
number ::= "-"? ("0" | [1-9] [0-9]*) ("." [0-9]+)? (("e" | "E") ( "-" | "+" )? ("0" | [1-9] [0-9]*))? { pin=2, ws=explicit }
/* STRINGS */
string ::= ~'"' (([#x20-#x21] | [#x23-#x5B] | [#x5D-#xFFFF]) | #x5C (#x22 | #x5C | #x2F | #x62 | #x66 | #x6E | #x72 | #x74 | #x75 HEXDIG HEXDIG HEXDIG HEXDIG))* '"' { ws=explicit }
HEXDIG ::= [a-fA-F0-9] { ws=explicit }
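To make the recovery idea more concrete, here is a small hand-written Python sketch of what recoverUntil describes: when an item is invalid, skip characters until a recovery terminal is found, record the error, and carry on. It is a toy parser for bracketed lists of integers, not the library's algorithm. The names RECOVERY, NUMBER and parse_items are made up for this example, and RECOVERY only plays the role of ARRAY_RECOVERY.

```python
import re

# Stand-in for ARRAY_RECOVERY ::= END_ARRAY | VALUE_SEPARATOR
RECOVERY = re.compile(r"[,\]]")
NUMBER = re.compile(r"\d+")


def parse_items(text):
    """Parse input such as '[1,2,3]', skipping over invalid items.

    Returns (items, errors), where errors holds the skipped text.
    """
    if not text.startswith("["):
        raise ValueError("expected '['")
    pos, items, errors = 1, [], []
    while pos < len(text) and text[pos] != "]":
        match = NUMBER.match(text, pos)
        if match:
            items.append(int(match.group()))
            pos = match.end()
        else:
            # Recovery: read characters up to the next ',' or ']'.
            found = RECOVERY.search(text, pos)
            end = found.start() if found else len(text)
            errors.append(text[pos:end])
            pos = end
        if pos < len(text) and text[pos] == ",":
            pos += 1  # consume the separator and continue with the next item
    return items, errors


items, errors = parse_items("[1,x?,3]")
```

For the input [1,x?,3] the valid items 1 and 3 are kept and the skipped text x? is reported as an error, instead of the whole parse failing.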
This one is the same but without error recovery:
/* https://www.ietf.org/rfc/rfc4627.txt */
value ::= false | null | true | object | array | number | string
BEGIN_ARRAY ::= WS* #x5B WS* /* [ left square bracket */
BEGIN_OBJECT ::= WS* #x7B WS* /* { left curly bracket */
END_ARRAY ::= WS* #x5D WS* /* ] right square bracket */
END_OBJECT ::= WS* #x7D WS* /* } right curly bracket */
NAME_SEPARATOR ::= WS* #x3A WS* /* : colon */
VALUE_SEPARATOR ::= WS* #x2C WS* /* , comma */
WS ::= [#x20#x09#x0A#x0D]+ /* Space | Tab | \n | \r */
false ::= "false"
null ::= "null"
true ::= "true"
object ::= BEGIN_OBJECT (member (VALUE_SEPARATOR member)*)? END_OBJECT
member ::= string NAME_SEPARATOR value
array ::= BEGIN_ARRAY (value (VALUE_SEPARATOR value)*)? END_ARRAY
number ::= "-"? ("0" | [1-9] [0-9]*) ("." [0-9]+)? (("e" | "E") ( "-" | "+" )? ("0" | [1-9] [0-9]*))?
/* STRINGS */
string ::= '"' (([#x20-#x21] | [#x23-#x5B] | [#x5D-#xFFFF]) | #x5C (#x22 | #x5C | #x2F | #x62 | #x66 | #x6E | #x72 | #x74 | #x75 HEXDIG HEXDIG HEXDIG HEXDIG))* '"'
HEXDIG ::= [a-fA-F0-9]
You can test those examples online here: https://menduz.com/ebnf-highlighter/
The pin property is used several times here because the grammar is strict enough to support it. For example, only objects start with {, so if we need to read a value and we detect a {, we know that what follows cannot be anything other than an object. The parser will not backtrack across that pin in case of failure.
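The difference between backtracking and failing at a pin can be sketched in a few lines of Python. This is an illustration only, not how the library is implemented. It assumes that pin=n means "once the first n terms have matched, a later mismatch is an error", which is how pin=1 behaves in the object rule above. match_sequence, ParseError and the simplified OBJECT terms are hypothetical.

```python
import re


class ParseError(Exception):
    pass


def match_sequence(terms, text, pos=0, pin=None):
    """Match regex terms in order, starting at pos.

    Returns the end position on success, or None so the caller can
    backtrack and try another alternative. Once `pin` terms have
    matched, a mismatch raises ParseError instead of returning None.
    """
    for index, term in enumerate(terms):
        match = re.compile(term).match(text, pos)
        if match is None:
            if pin is not None and index >= pin:
                raise ParseError(f"expected {term!r} at position {pos}")
            return None
        pos = match.end()
    return pos


# Simplified stand-in for: BEGIN_OBJECT object_content? END_OBJECT
OBJECT = [r"\{", r"[^{}]*", r"\}"]

not_an_object = match_sequence(OBJECT, "[1]", pin=1)    # no '{' seen: backtrack
complete = match_sequence(OBJECT, "{a}", pin=1)         # full match
unpinned = match_sequence(OBJECT, "{a")                 # no pin: backtrack

try:
    match_sequence(OBJECT, "{a", pin=1)                 # '{' seen, '}' missing
    pinned_failure = False
except ParseError:
    pinned_failure = True
```

Without the pin, the unterminated input {a silently fails and the parser would try the next alternative of value. With the pin, the same input is reported as a broken object.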
Also, are there any other incompatibilities between attributes? I'm having trouble adding ws=implicit and simplifyWhenOneChildren=true, in a case like the following
It is possible that simplifyWhenOneChildren produces inconsistent results with implicit WS children. As a rule of thumb, avoid ws=implicit when possible: it makes parsing slower and makes the grammar harder to write.
If you want to take a look at a real-world grammar built with this package, you can refer to the Lys Grammar.
I think I understand what {ws=implicit} and {ws=explicit} do, but what do the following attributes mean?
pin
recoverUntil
fragment
simplifyWhenOneChildren
I'd be happy to open a PR to document these somewhere as soon as I understand what they do.