document attributes - Githubissues

fabianfreyer commented 5 years ago

I think I understand what {ws=implicit} and {ws=explicit} do, but what do the following attributes mean?

pin
recoverUntil
fragment
simplifyWhenOneChildren

I'd be happy to open a PR to document these somewhere as soon as I understand what they do.

menduz commented 5 years ago

Hello,

So, grammars are made of constructions (or rules) such as ConstructionName := Term Term Term {pin=1}. Within a construction, the terms are numbered starting at 0.

pin=1: the value is a term number. Once the parser has matched the terms before that position (for pin=1, term 0), it commits to the construction and never backtracks across it. If a later term doesn't match, a parse error is triggered immediately, or the parser tries to recover.
recoverUntil: after a construction has been pinned, if the input is invalid the parser skips characters until it can recover from the error. The value of recoverUntil names the construction that marks the recovery point: skipping stops when that construction matches, and parsing resumes from there.
fragment: the parser does not treat the construction as a term of its own in the parent construction. Instead it is a fragment of a bigger construction, and all of its children are injected into the parent construction rather than into the current one.

Example:

  FunctionDecl := FunKeyword Name Parameters
  FunKeyword := 'function'
  Name := [A-Z][a-z]*
  Parameters := Parameter+
  Parameter := Name ' ' {fragment=true}

is the same as

  FunctionDecl := FunKeyword Name Parameters
  FunKeyword := 'function'
  Name := [A-Z][a-z]*
  Parameters := (Name ' ')+
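To make the difference concrete, here is a minimal TypeScript sketch of the flattening that fragment performs. It assumes a hypothetical AstNode shape (type, text, children) instead of node-ebnf's real token type, and it applies fragment as a separate pass over a finished tree, whereas the library may do it while parsing. inlineFragments and leaf are made-up names for this illustration.

```typescript
// Hypothetical AST shape used only for this sketch.
interface AstNode {
  type: string;
  text: string;
  children: AstNode[];
}

// Replace every child whose rule is marked {fragment=true} with that child's
// own children, so they become direct children of the parent construction.
function inlineFragments(node: AstNode, fragments: Set<string>): AstNode {
  const children: AstNode[] = [];
  for (const child of node.children) {
    const processed = inlineFragments(child, fragments);
    if (fragments.has(processed.type)) {
      children.push(...processed.children);
    } else {
      children.push(processed);
    }
  }
  return { ...node, children };
}

const leaf = (type: string, text: string): AstNode => ({ type, text, children: [] });

// Tree for "Ab Cd " matched by Parameters := Parameter+, before inlining.
const parameters: AstNode = {
  type: "Parameters",
  text: "Ab Cd ",
  children: [
    { type: "Parameter", text: "Ab ", children: [leaf("Name", "Ab")] },
    { type: "Parameter", text: "Cd ", children: [leaf("Name", "Cd")] },
  ],
};

const flat = inlineFragments(parameters, new Set(["Parameter"]));
console.log(flat.children.map((c) => `${c.type}:${c.text}`).join(", "));
// prints "Name:Ab, Name:Cd"
```

Children are processed before their parent is checked, so a fragment nested inside another fragment is flattened all the way up to the nearest non-fragment construction.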

simplifyWhenOneChildren: when a construction ends up with exactly one child, the construction is simplified away and the matched child is returned in its place. Example:

Expression := AddExpression
AddExpression := MulExpression (('+' | '-') MulExpression)* {simplifyWhenOneChildren=true}
MulExpression := NominalExpression (('*' | '/') NominalExpression)* {simplifyWhenOneChildren=true}
NominalExpression := Number | VariableName {fragment=true}

So, here are some parsing examples:

1
^ Number

1 + 4
^     Number
    ^ Number
^^^^^ AddExpression

1 + Abc * 5 + 1
^               Number
    ^^^         VariableName
          ^     Number
    ^^^^^^^     MulExpression
              ^ Number
^^^^^^^^^^^^^^^ AddExpression

See how many nodes are simplified away because they have only one child?
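The same rule can be sketched as a post-order pass in TypeScript. This is an illustration under assumptions, not the library's code: AstNode, simplify and mkNode are hypothetical, and a plain set of rule names stands in for the {simplifyWhenOneChildren=true} attribute.

```typescript
// Hypothetical AST shape used only for this sketch.
interface AstNode {
  type: string;
  text: string;
  children: AstNode[];
}

// Post-order pass: simplify the children first, then replace this node by its
// only child if its rule is marked {simplifyWhenOneChildren=true}.
function simplify(node: AstNode, rules: Set<string>): AstNode {
  const children = node.children.map((c) => simplify(c, rules));
  if (rules.has(node.type) && children.length === 1) {
    return children[0];
  }
  return { ...node, children };
}

const mkNode = (type: string, text: string, children: AstNode[] = []): AstNode => ({
  type,
  text,
  children,
});

// Rules carrying {simplifyWhenOneChildren=true} in the grammar above.
const rules = new Set(["AddExpression", "MulExpression"]);

// "1": AddExpression > MulExpression > Number collapses to the Number.
const one = simplify(
  mkNode("AddExpression", "1", [mkNode("MulExpression", "1", [mkNode("Number", "1")])]),
  rules,
);

// "1 + 4": AddExpression keeps two children, each MulExpression collapses.
const sum = simplify(
  mkNode("AddExpression", "1 + 4", [
    mkNode("MulExpression", "1", [mkNode("Number", "1")]),
    mkNode("MulExpression", "4", [mkNode("Number", "4")]),
  ]),
  rules,
);

console.log(one.type, sum.type, sum.children.map((c) => c.type).join(", "));
// prints "Number AddExpression Number, Number"
```

Because children are simplified first, a chain of single-child nodes such as AddExpression, MulExpression, Number collapses all the way down to the Number, as in the 1 example.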

I hope I made myself clear, thanks for your interest in this library :)

fabianfreyer commented 5 years ago

Thank you very much for your explanations! I'm not yet sure how to understand the pin and recoverUntil attributes though. Do you have an example, where this could be used, similar to the one you showed with simplifyWhenOneChildren?

Also, are there any other incompatibilities between attributes? I'm having trouble adding ws=implicit and simplifyWhenOneChildren=true, in a case like the following:

Statement ::= Expression ';' {ws=implicit,simplifyWhenOneChildren=true}
Expression ::= ...

menduz commented 5 years ago

This example parses a JSON file with error recovery and pinning:

{ ws=implicit }
/* JSON WITH ERROR RECOVERY https://www.ietf.org/rfc/rfc4627.txt */
value                ::= false | null | true | object | number | string | array
BEGIN_ARRAY          ::= #x5B /* [ left square bracket */
BEGIN_OBJECT         ::= #x7B /* { left curly bracket */
END_ARRAY            ::= #x5D /* ] right square bracket */
END_OBJECT           ::= #x7D /* } right curly bracket */
NAME_SEPARATOR       ::= #x3A /* : colon */
VALUE_SEPARATOR      ::= #x2C /* , comma */
WS                   ::= [#x20#x09#x0A#x0D]+
false                ::= "false"
null                 ::= "null"
true                 ::= "true"
object               ::= BEGIN_OBJECT object_content? END_OBJECT { pin=1 }
object_content       ::= (member (object_n)*) { recoverUntil=OBJECT_RECOVERY }
object_n             ::= VALUE_SEPARATOR member { recoverUntil=OBJECT_RECOVERY,fragment=true, pin=1 }
Key                  ::= &'"' string { recoverUntil=VALUE_SEPARATOR, pin=1 }
OBJECT_RECOVERY      ::= END_OBJECT | VALUE_SEPARATOR
ARRAY_RECOVERY       ::= END_ARRAY | VALUE_SEPARATOR
MEMBER_RECOVERY      ::= '"' | NAME_SEPARATOR | OBJECT_RECOVERY | VALUE_SEPARATOR
member               ::= Key NAME_SEPARATOR value { recoverUntil=MEMBER_RECOVERY, pin=1 }
array                ::= BEGIN_ARRAY array_content? END_ARRAY { pin=1 }
array_content        ::= array_value (VALUE_SEPARATOR array_value)* { recoverUntil=ARRAY_RECOVERY,fragment=true }
array_value          ::= value { recoverUntil=ARRAY_RECOVERY, fragment=true }

number               ::= "-"? ("0" | [1-9] [0-9]*) ("." [0-9]+)? (("e" | "E") ( "-" | "+" )? ("0" | [1-9] [0-9]*))? { pin=2, ws=explicit }

/* STRINGS */

string                ::= ~'"' (([#x20-#x21] | [#x23-#x5B] | [#x5D-#xFFFF]) | #x5C (#x22 | #x5C | #x2F | #x62 | #x66 | #x6E | #x72 | #x74 | #x75 HEXDIG HEXDIG HEXDIG HEXDIG))* '"' { ws=explicit }
HEXDIG                ::= [a-fA-F0-9] { ws=explicit }
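To illustrate what recoverUntil does in the grammar above, here is a hand-written TypeScript toy for a simplified array of lowercase words. It follows the same idea as array and array_value: once [ has matched (pin=1), a bad element is recorded as an error instead of failing the whole parse, and the parser skips input until something from ARRAY_RECOVERY (, or ]) comes next. parseArray, ArrayResult and the error strings are hypothetical; node-ebnf's parser reports errors in its own way.

```typescript
interface ArrayResult {
  items: string[];
  errors: string[];
  end: number; // offset just past the parsed array
}

// Toy, hand-written version of:
//   array          ::= '[' (item (',' item)*)? ']' { pin=1 }
//   item           ::= [a-z]+ { recoverUntil=ARRAY_RECOVERY }
//   ARRAY_RECOVERY ::= ']' | ','
function parseArray(input: string): ArrayResult | null {
  // Term 0 failed: not pinned yet, so the caller is free to try another rule.
  if (input[0] !== "[") return null;

  let pos = 1;
  const items: string[] = [];
  const errors: string[] = [];
  const isRecovery = (ch: string | undefined) => ch === "," || ch === "]";

  const parseItem = (): void => {
    const m = /^[a-z]+/.exec(input.slice(pos));
    if (m !== null) {
      items.push(m[0]);
      pos += m[0].length;
      return;
    }
    // Pinned: record the error instead of backtracking, then skip input
    // until something from ARRAY_RECOVERY is next.
    errors.push(`unexpected input at offset ${pos}`);
    while (pos < input.length && !isRecovery(input[pos])) pos++;
  };

  if (input[pos] !== "]") {
    parseItem();
    while (input[pos] === ",") {
      pos++;
      parseItem();
    }
  }
  if (input[pos] === "]") {
    pos++;
  } else {
    errors.push(`expected ']' at offset ${pos}`);
  }
  return { items, errors, end: pos };
}

// items: ["a", "c"], one error at offset 3 (the "?b" element), end: 8
console.log(parseArray("[a,?b,c]"));
```

In this sketch the skip loop stops in front of the , or ] instead of consuming it, so the normal separator and closing-bracket handling picks up from there and the rest of the array is still parsed.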

This one is the same but without error recovery:


/* https://www.ietf.org/rfc/rfc4627.txt */
value                ::= false | null | true | object | array | number | string
BEGIN_ARRAY          ::= WS* #x5B WS*  /* [ left square bracket */
BEGIN_OBJECT         ::= WS* #x7B WS*  /* { left curly bracket */
END_ARRAY            ::= WS* #x5D WS*  /* ] right square bracket */
END_OBJECT           ::= WS* #x7D WS*  /* } right curly bracket */
NAME_SEPARATOR       ::= WS* #x3A WS*  /* : colon */
VALUE_SEPARATOR      ::= WS* #x2C WS*  /* , comma */
WS                   ::= [#x20#x09#x0A#x0D]+   /* Space | Tab | \n | \r */
false                ::= "false"
null                 ::= "null"
true                 ::= "true"
object               ::= BEGIN_OBJECT (member (VALUE_SEPARATOR member)*)? END_OBJECT
member               ::= string NAME_SEPARATOR value
array                ::= BEGIN_ARRAY (value (VALUE_SEPARATOR value)*)? END_ARRAY

number                ::= "-"? ("0" | [1-9] [0-9]*) ("." [0-9]+)? (("e" | "E") ( "-" | "+" )? ("0" | [1-9] [0-9]*))?

/* STRINGS */

string                ::= '"' (([#x20-#x21] | [#x23-#x5B] | [#x5D-#xFFFF]) | #x5C (#x22 | #x5C | #x2F | #x62 | #x66 | #x6E | #x72 | #x74 | #x75 HEXDIG HEXDIG HEXDIG HEXDIG))* '"'
HEXDIG                ::= [a-fA-F0-9]

You can test those examples online here: https://menduz.com/ebnf-highlighter/

The pin attribute is used several times here because the grammar is strict enough to support it. For example, only objects start with {, so if we need to read a value and we see a {, we can be sure that whatever follows can only be an object. If something fails after that point, the parser will not backtrack past the pin.
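The effect of pin on backtracking can be modelled with a few parser combinators in TypeScript. This is a toy model under assumptions, not node-ebnf's implementation: lit, word, seq, alt and ParseError are hypothetical helpers, and seq reads pin=N as "once terms 0..N-1 have matched, any further failure is an error". The grammar mirrors the object case: value ::= object | word with object ::= '{' word '}' { pin=1 }.

```typescript
// A matcher returns the offset after a successful match, or null on failure.
type Matcher = (input: string, pos: number) => number | null;

class ParseError extends Error {}

// Match a literal string.
const lit = (s: string): Matcher => (input, pos) =>
  input.startsWith(s, pos) ? pos + s.length : null;

// Match one or more lowercase letters.
const word: Matcher = (input, pos) => {
  let end = pos;
  while (end < input.length && /[a-z]/.test(input[end])) end++;
  return end > pos ? end : null;
};

// Sequence whose terms are numbered from 0. Once the parser reaches term
// `pin` (terms 0..pin-1 matched), a failing term throws instead of returning
// null, so callers can no longer backtrack into another alternative.
const seq = (terms: Matcher[], pin?: number): Matcher => (input, pos) => {
  let cur = pos;
  for (let i = 0; i < terms.length; i++) {
    const next = terms[i](input, cur);
    if (next === null) {
      if (pin !== undefined && i >= pin) {
        throw new ParseError(`expected term ${i} at offset ${cur}`);
      }
      return null; // not pinned yet: let the caller try something else
    }
    cur = next;
  }
  return cur;
};

// Ordered choice: the first alternative that matches wins.
const alt = (...alts: Matcher[]): Matcher => (input, pos) => {
  for (const a of alts) {
    const r = a(input, pos);
    if (r !== null) return r;
  }
  return null;
};

// value ::= object | word,  object ::= '{' word '}' { pin=1 }
const pinnedValue = alt(seq([lit("{"), word, lit("}")], 1), word);
// The same grammar without pin.
const plainValue = alt(seq([lit("{"), word, lit("}")]), word);

console.log(pinnedValue("{abc}", 0), pinnedValue("abc", 0), plainValue("{abc", 0));
// prints "5 3 null"
```

With pin, "{abc" throws as soon as the closing } is missing, because after { nothing but an object can follow. Without pin, the object alternative just returns null and alt goes on to try word, which fails too, so the caller only learns that no alternative matched.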

> Also, are there any other incompatibilities between attributes? I'm having trouble adding ws=implicit and simplifyWhenOneChildren=true, in a case like the following

It is possible: simplifyWhenOneChildren can produce inconsistent results when combined with implicit WS children. As a rule of thumb, avoid ws=implicit when possible; it makes parsing slower and makes the grammar harder to write.
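To see what ws=implicit buys and what it costs, here is a toy TypeScript comparison. The assumption, based on the two JSON grammars above (the first declares { ws=implicit } and never mentions WS between tokens, the second writes WS* around every separator), is that implicit mode lets optional whitespace appear before each term. seqExplicit, seqImplicit and skipWs are hypothetical helpers, and node-ebnf's exact placement of implicit whitespace may differ.

```typescript
// A matcher returns the offset after a successful match, or null on failure.
type Matcher = (input: string, pos: number) => number | null;

const lit = (s: string): Matcher => (input, pos) =>
  input.startsWith(s, pos) ? pos + s.length : null;

// Skip the characters accepted by WS above: space, tab, \n and \r.
function skipWs(input: string, pos: number): number {
  while (pos < input.length && " \t\n\r".includes(input[pos])) pos++;
  return pos;
}

// ws=explicit: terms must be adjacent unless the grammar spells out WS.
const seqExplicit = (terms: Matcher[]): Matcher => (input, pos) => {
  let cur = pos;
  for (const t of terms) {
    const next = t(input, cur);
    if (next === null) return null;
    cur = next;
  }
  return cur;
};

// ws=implicit (as assumed here): optional whitespace is skipped before every
// term, which costs an extra scan per term.
const seqImplicit = (terms: Matcher[]): Matcher => (input, pos) => {
  let cur = pos;
  for (const t of terms) {
    const next = t(input, skipWs(input, cur));
    if (next === null) return null;
    cur = next;
  }
  return cur;
};

// Toy member ::= '"a"' ':' 'true'
const terms = [lit('"a"'), lit(":"), lit("true")];
const memberExplicit = seqExplicit(terms);
const memberImplicit = seqImplicit(terms);

console.log(memberExplicit('"a" : true', 0), memberImplicit('"a" : true', 0));
// prints "null 10"
```

The extra skipWs call before every term is a plausible source of the slowdown mentioned above. It also suggests why number and string in the first grammar switch back to ws=explicit: inside a single token, whitespace should not be accepted silently.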

If you want to look at a real-world grammar built with this package, refer to the Lys Grammar.

lys-lang / node-ebnf

document attributes #11