lys-lang / node-ebnf

Create AST PEG Parsers from formal grammars in JavaScript
https://menduz.com/ebnf-highlighter/
MIT License

document attributes #11

Open fabianfreyer opened 5 years ago

fabianfreyer commented 5 years ago

I think I understand what {ws=implicit} and {ws=explicit} do, but what do the following attributes mean?

I'd be happy to open a PR to document these somewhere as soon as I understand what they do.

menduz commented 5 years ago

Hello,

So there are Constructions or Rules, defined as ConstructionName ::= Term Term Term {pin=1}. In those constructions, terms are numbered, starting at 0.

Example:

  FunctionDecl ::= FunKeyword Name Parameters
  FunKeyword ::= 'function'
  Name ::= [A-Z][a-z]*
  Parameters ::= Parameter+
  Parameter ::= Name ' ' {fragment=true}

is the same as

  FunctionDecl ::= FunKeyword Name Parameters
  FunKeyword ::= 'function'
  Name ::= [A-Z][a-z]*
  Parameters ::= (Name ' ')+
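
One way to picture the equivalence above is as a tree transformation: a rule marked as a fragment leaves no node of its own, and its children end up directly under the parent. The JavaScript below is only a sketch of that idea and is not taken from the library; the node shape ({ type, fragment, children }) and the name inlineFragments are assumptions made for illustration.

```javascript
// Hypothetical AST node shape for this sketch: { type, fragment?, children }.
// This is NOT the library's internal representation.
function inlineFragments(node) {
  const children = [];
  for (const child of node.children) {
    const processed = inlineFragments(child);
    if (processed.fragment) {
      // A fragment node disappears; its children take its place.
      children.push(...processed.children);
    } else {
      children.push(processed);
    }
  }
  return { type: node.type, fragment: Boolean(node.fragment), children };
}

// Tree as it would look if Parameter produced its own node.
const withFragments = {
  type: 'Parameters',
  children: [
    { type: 'Parameter', fragment: true, children: [{ type: 'Name', children: [] }] },
    { type: 'Parameter', fragment: true, children: [{ type: 'Name', children: [] }] },
  ],
};

const flattened = inlineFragments(withFragments);
const childTypes = flattened.children.map((c) => c.type);
console.log(childTypes);
```

After the pass, Parameters holds the Name nodes directly, which is the shape the second grammar describes.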

I hope I made myself clear, thanks for your interest in this library :)

fabianfreyer commented 5 years ago

Thank you very much for your explanations! I'm not yet sure how to understand the pin and recoverUntil attributes, though. Do you have an example where these could be used, similar to the one you showed with simplifyWhenOneChildren?

Also, are there any other incompatibilities between attributes? I'm having trouble combining ws=implicit and simplifyWhenOneChildren=true in a case like the following:

Statement ::= Expression ';' {ws=implicit,simplifyWhenOneChildren=true}
Expression ::= ...

menduz commented 5 years ago

This example parses a JSON file with error recovery and pinning:

{ ws=implicit }
/* JSON WITH ERROR RECOVERY https://www.ietf.org/rfc/rfc4627.txt */
value                ::= false | null | true | object | number | string | array
BEGIN_ARRAY          ::= #x5B /* [ left square bracket */
BEGIN_OBJECT         ::= #x7B /* { left curly bracket */
END_ARRAY            ::= #x5D /* ] right square bracket */
END_OBJECT           ::= #x7D /* } right curly bracket */
NAME_SEPARATOR       ::= #x3A /* : colon */
VALUE_SEPARATOR      ::= #x2C /* , comma */
WS                   ::= [#x20#x09#x0A#x0D]+
false                ::= "false"
null                 ::= "null"
true                 ::= "true"
object               ::= BEGIN_OBJECT object_content? END_OBJECT { pin=1 }
object_content       ::= (member (object_n)*) { recoverUntil=OBJECT_RECOVERY }
object_n             ::= VALUE_SEPARATOR member { recoverUntil=OBJECT_RECOVERY,fragment=true, pin=1 }
Key                  ::= &'"' string { recoverUntil=VALUE_SEPARATOR, pin=1 }
OBJECT_RECOVERY      ::= END_OBJECT | VALUE_SEPARATOR
ARRAY_RECOVERY       ::= END_ARRAY | VALUE_SEPARATOR
MEMBER_RECOVERY      ::= '"' | NAME_SEPARATOR | OBJECT_RECOVERY | VALUE_SEPARATOR
member               ::= Key NAME_SEPARATOR value { recoverUntil=MEMBER_RECOVERY, pin=1 }
array                ::= BEGIN_ARRAY array_content? END_ARRAY { pin=1 }
array_content        ::= array_value (VALUE_SEPARATOR array_value)* { recoverUntil=ARRAY_RECOVERY,fragment=true }
array_value          ::= value { recoverUntil=ARRAY_RECOVERY, fragment=true }

number               ::= "-"? ("0" | [1-9] [0-9]*) ("." [0-9]+)? (("e" | "E") ( "-" | "+" )? ("0" | [1-9] [0-9]*))? { pin=2, ws=explicit }

/* STRINGS */

string                ::= ~'"' (([#x20-#x21] | [#x23-#x5B] | [#x5D-#xFFFF]) | #x5C (#x22 | #x5C | #x2F | #x62 | #x66 | #x6E | #x72 | #x74 | #x75 HEXDIG HEXDIG HEXDIG HEXDIG))* '"' { ws=explicit }
HEXDIG                ::= [a-fA-F0-9] { ws=explicit }

This one is the same but without error recovery:


/* https://www.ietf.org/rfc/rfc4627.txt */
value                ::= false | null | true | object | array | number | string
BEGIN_ARRAY          ::= WS* #x5B WS*  /* [ left square bracket */
BEGIN_OBJECT         ::= WS* #x7B WS*  /* { left curly bracket */
END_ARRAY            ::= WS* #x5D WS*  /* ] right square bracket */
END_OBJECT           ::= WS* #x7D WS*  /* } right curly bracket */
NAME_SEPARATOR       ::= WS* #x3A WS*  /* : colon */
VALUE_SEPARATOR      ::= WS* #x2C WS*  /* , comma */
WS                   ::= [#x20#x09#x0A#x0D]+   /* Space | Tab | \n | \r */
false                ::= "false"
null                 ::= "null"
true                 ::= "true"
object               ::= BEGIN_OBJECT (member (VALUE_SEPARATOR member)*)? END_OBJECT
member               ::= string NAME_SEPARATOR value
array                ::= BEGIN_ARRAY (value (VALUE_SEPARATOR value)*)? END_ARRAY

number                ::= "-"? ("0" | [1-9] [0-9]*) ("." [0-9]+)? (("e" | "E") ( "-" | "+" )? ("0" | [1-9] [0-9]*))?

/* STRINGS */

string                ::= '"' (([#x20-#x21] | [#x23-#x5B] | [#x5D-#xFFFF]) | #x5C (#x22 | #x5C | #x2F | #x62 | #x66 | #x6E | #x72 | #x74 | #x75 HEXDIG HEXDIG HEXDIG HEXDIG))* '"'
HEXDIG                ::= [a-fA-F0-9]

You can test those examples online here: https://menduz.com/ebnf-highlighter/

The pin property is used several times here because the grammar is strict enough to support it. For example, only objects start with {, so if we need to read a value and we see a {, we know that whatever follows can only be an object. Once a rule is pinned, the parser will not backtrack past that pin in case of failure.
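
To make the "no backtracking after the pin" idea concrete, here is a small hand-written JavaScript sketch. It does not use the library and does not claim to mirror its implementation. parseSequence, skipUntil and the pinCount parameter (the number of leading terms that must match before the rule commits) are hypothetical names, and the recovery step only reflects what the recoverUntil attribute appears to do in the grammar above: skip input until a recovery token is found.

```javascript
// Hypothetical helper: match a sequence of literal terms with a pin.
// `pinCount` is an assumed convention for this sketch only.
function parseSequence(name, terms, pinCount, input, pos) {
  let cursor = pos;
  for (let i = 0; i < terms.length; i++) {
    if (input.startsWith(terms[i], cursor)) {
      cursor += terms[i].length;
      continue;
    }
    if (i >= pinCount) {
      // Past the pin: do not backtrack, report an error inside this rule.
      return { type: name, error: `expected '${terms[i]}' at ${cursor}`, end: cursor };
    }
    // Before the pin: fail quietly so the caller can try another alternative.
    return null;
  }
  return { type: name, error: null, end: cursor };
}

// Hypothetical recovery step: skip ahead to the nearest recovery token.
function skipUntil(input, pos, recoveryTokens) {
  let cursor = pos;
  while (cursor < input.length && !recoveryTokens.some((t) => input.startsWith(t, cursor))) {
    cursor++;
  }
  return cursor;
}

const objectTerms = ['{', '}'];
// No '{' at all: the rule fails before the pin, so other alternatives can be tried.
const notAnObject = parseSequence('object', objectTerms, 1, '[1]', 0);
// '{' matched, then something unexpected: the rule is committed and reports an error.
const brokenObject = parseSequence('object', objectTerms, 1, '{1}', 0);
// Recovery: skip the unexpected input until '}' or ',' shows up.
const resumeAt = skipUntil('{1}', brokenObject.end, ['}', ',']);
const emptyObject = parseSequence('object', objectTerms, 1, '{}', 0);
console.log(notAnObject, brokenObject, resumeAt, emptyObject);
```

The design point is that a failure before the pin stays silent, while a failure after the pin becomes an error attached to the pinned rule, which gives error recovery a known place to resume from.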


"Also, are there any other incompatibilities between attributes? I'm having trouble adding ws=implicit and simplifyWhenOneChildren=true, in a case like the following"

Yes, that is possible: simplifyWhenOneChildren can produce inconsistent results when implicit WS children are present. As a rule of thumb, avoid ws=implicit when possible; it makes parsing slower and makes grammars harder to write.
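
A rough way to see why the two attributes can clash is sketched below in JavaScript. It assumes that simplification means "replace a node by its only child" and that implicit whitespace shows up as an extra WS child; both are assumptions for illustration and may not match the library's actual behaviour. The name simplifyWhenOneChild and the node shape are hypothetical.

```javascript
// Hypothetical simplification pass: replace a node by its only child.
function simplifyWhenOneChild(node) {
  return node.children.length === 1 ? node.children[0] : node;
}

const expression = { type: 'Expression', children: [] };
const ws = { type: 'WS', children: [] };

// Input without surrounding whitespace: Statement has a single child.
const tight = simplifyWhenOneChild({ type: 'Statement', children: [expression] });
// Input with whitespace: the implicit WS node adds a second child,
// so the same rule is no longer simplified.
const spaced = simplifyWhenOneChild({ type: 'Statement', children: [ws, expression] });
console.log(tight.type, spaced.type);
```

Under these assumptions the same rule yields an Expression node for one input and a Statement node for another, depending only on whitespace, which is the kind of inconsistency described above.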

If you want to take a look at a real-world grammar built with this package, you can refer to Lys Grammar