Closed mingodad closed 2 years ago
If I move `Class` and `NegatedClass` to the top of `Primary`, then I can parse https://github.com/yhirose/culebra/blob/master/misc/culebra.peg (after adding a newline at the end of it), but when the grammar tries to parse itself I get:
68:4 syntax error, unexpected '\', expecting <Char>.
In the end I want to generate EBNF from any grammar, including the `peglib` grammar itself, something like what I did at https://github.com/mingodad/lalr-parser-test for bison/byacc/lemon and at https://github.com/mingodad/peg.
Ideally I would iterate over the ordered list of rules and output them almost as they were written, only changing `<-` to `::=` and `/` to `|`, and replacing some tokens that need a different representation.
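The substitution described above can be sketched on plain grammar text. This is only a rough illustration in Python, not how `peglib` does anything: the function name `peg_to_ebnf` and the regular expressions are assumptions made for this example. It copies quoted literals and `[...]` classes verbatim so that a `/` or `!` inside them is left alone, and rewrites the operators everywhere else.

```python
import re

# Spans copied verbatim: quoted literals and [...] classes. Backslash escapes
# are skipped so an escaped quote or an escaped ']' does not end the span early.
_VERBATIM = r"""'(?:\\.|[^'\\])*'|"(?:\\.|[^"\\])*"|\[(?:\\.|[^\]\\])*\]"""
_TOKEN = re.compile(
    rf"(?P<verbatim>{_VERBATIM})|(?P<arrow><-|←)|(?P<op>[/!&])", re.DOTALL
)

# Operators rewritten outside literals and classes. _NOT_ and _AND_ are the
# helper tokens used in the EBNF listing of this thread.
_REPLACEMENTS = {"/": "|", "!": "_NOT_ ", "&": "_AND_ "}


def peg_to_ebnf(peg: str) -> str:
    """Rewrite PEG operators to EBNF ones, leaving literals and classes alone."""

    def swap(match: re.Match) -> str:
        if match.group("verbatim"):
            return match.group("verbatim")
        if match.group("arrow"):
            return "::="
        return _REPLACEMENTS[match.group("op")]

    return _TOKEN.sub(swap, peg)


print(peg_to_ebnf("Range <- (Char '-' Char) / Char"))
print(peg_to_ebnf("SLASH <- '/' Spacing"))
```

The sketch does not know about `#` comments, so an operator inside a comment would also be rewritten, and it does not translate character escapes into the form the railroad tool expects. Working from the parsed rule list, as described above, avoids both problems.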
Something like this one for the `peglib` grammar, which is what I've got so far.
Copy and paste the EBNF shown below into https://www.bottlecaps.de/rr/ui on the Edit Grammar tab, then click the View Diagram tab.
//To be viewed at https://www.bottlecaps.de/rr/ui
Grammar ::=
Spacing Definition+ EndOfFile
Spacing ::=
( Space | Comment )*
Definition ::=
( Ignore IdentCont Parameters LEFTARROW Expression Instruction? )
| ( Ignore Identifier LEFTARROW Expression Instruction? )
EndOfFile ::=
_NOT_ .
Ignore ::=
IGNORE?
IdentCont ::=
IdentStart IdentRest*
Parameters ::=
OPEN Identifier ( COMMA Identifier )* CLOSE
LEFTARROW ::=
( "<-" | "←" ) Spacing
Expression ::=
Sequence ( SLASH Sequence )*
Instruction ::=
BeginBlacket ( InstructionItem ( InstructionItemSeparator InstructionItem )* )? EndBlacket
Identifier ::=
IdentCont Spacing
Sequence ::=
( CUT | Prefix )*
SLASH ::=
'/' Spacing
CUT ::=
"↑" Spacing
Prefix ::=
( AND | NOT )? SuffixWithLabel
AND ::=
'&' Spacing
NOT ::=
'!' Spacing
SuffixWithLabel ::=
Suffix ( LABEL Identifier )?
Suffix ::=
Primary Loop?
LABEL ::=
( '^' | "⇑" ) Spacing
Primary ::=
NegatedClass
| Class
| ( Ignore IdentCont Arguments _NOT_ LEFTARROW )
| ( Ignore Identifier _NOT_ ( Parameters? LEFTARROW ) )
| ( OPEN Expression CLOSE )
| ( BeginTok Expression EndTok )
| ( BeginCapScope Expression EndCapScope )
| ( BeginCap Expression EndCap )
| BackRef
| LiteralI
| Dictionary
| Literal
| DOT
Loop ::=
QUESTION
| STAR
| PLUS
| Repetition
QUESTION ::=
'?' Spacing
STAR ::=
'*' Spacing
PLUS ::=
'+' Spacing
Repetition ::=
BeginBlacket RepetitionRange EndBlacket
NegatedClass ::=
"[^" ( _NOT_ ']' Range )+ ']' Spacing
Class ::=
'[' _NOT_ '^' ( _NOT_ ']' Range )+ ']' Spacing
Arguments ::=
OPEN Expression ( COMMA Expression )* CLOSE
OPEN ::=
'(' Spacing
CLOSE ::=
')' Spacing
BeginTok ::=
'<' Spacing
EndTok ::=
'>' Spacing
BeginCapScope ::=
'$' '(' Spacing
EndCapScope ::=
')' Spacing
BeginCap ::=
'$' IdentCont '<' Spacing
EndCap ::=
'>' Spacing
BackRef ::=
'$' IdentCont Spacing
LiteralI ::=
( ['] ( _NOT_ ['] Char )* "'i" Spacing )
| ( ["] ( _NOT_ ["] Char )* '"i' Spacing )
Dictionary ::=
LiteralD ( PIPE LiteralD )+
Literal ::=
lit_ope
DOT ::=
'.' Spacing
IdentStart ::=
_NOT_ ( "↑" | "⇑" ) ( [a-zA-Z_%] | [\x0080-\xFFFF] )
IdentRest ::=
IdentStart
| [0-9]
LiteralD ::=
lit_ope
PIPE ::=
'|' Spacing
lit_ope ::=
( ['] ( _NOT_ ['] Char )* ['] Spacing )
| ( ["] ( _NOT_ ["] Char )* ["] Spacing )
Char ::=
( '\\' [nrt'"#x1b#x1d\\^] )
| ( '\\' [0-3] [0-7] [0-7] )
| ( '\\' [0-7] [0-7]? )
| ( "\\x" [0-9a-fA-F] [0-9a-fA-F]? )
| ( "\\u" ( ( '0' [0-9a-fA-F] ) | "10" ) [0-9a-fA-F] )
| ( [0-9a-fA-F] )
| ( _NOT_ '\\' . )
Range ::=
( Char '-' Char )
| Char
BeginBlacket ::=
'{' Spacing
RepetitionRange ::=
( Number COMMA Number )
| ( Number COMMA )
| Number
| ( COMMA Number )
EndBlacket ::=
'}' Spacing
Number ::=
[0-9]+ Spacing
COMMA ::=
',' Spacing
Space ::=
' '
| '\t'
| EndOfLine
Comment ::=
'#' ( _NOT_ EndOfLine . )* EndOfLine
EndOfLine ::=
"\r\n"
| '\n'
| '\r'
IGNORE ::=
'~'
InstructionItem ::=
PrecedenceClimbing
| ErrorMessage
| NoAstOpt
InstructionItemSeparator ::=
';' Spacing
PrecedenceClimbing ::=
"precedence" SpacesOom PrecedenceInfo ( SpacesOom PrecedenceInfo )* SpacesZom
ErrorMessage ::=
"message" SpacesOom LiteralD SpacesZom
NoAstOpt ::=
"no_ast_opt" SpacesZom
SpacesZom ::=
Space*
SpacesOom ::=
Space+
PrecedenceInfo ::=
PrecedenceAssoc ( SpacesOom PrecedenceOpe )+
PrecedenceAssoc ::=
[LR]
PrecedenceOpe ::=
( ['] ( _NOT_ ( Space | ['] ) Char )* ['] )
| ( ["] ( _NOT_ ( Space | ["] ) Char )* ["] )
| ( _NOT_ ( PrecedenceAssoc | Space | '}' ) . )+
//Added tokens for railroad generation
_NOT_ ::= '!'
_AND_ ::= '&'
@mingodad, I quickly found at least three problems in your translated PEG grammar. Here are my corrections.
IdentStart <- !"↑" !"⇑" ([a-zA-Z_%] / [\u0080-\uFFFF])
Range <- (Char '-' Char) / Char
Char <-
'\\' [nrt'"[\]\\^]
/ '\\' [0-3] [0-7] [0-7]
/ '\\' [0-7] [0-7]?
/ "\\x" [0-9a-fA-F] [0-9a-fA-F]?
/ "\\u" (('0' [0-9a-fA-F]) / "10") [0-9a-fA-F]{4, 4} / [0-9a-fA-F]{4, 5}
/ !'\\' .
There seem to be more mistakes in the grammar, and I feel it's pretty dangerous to translate the C++ parser combinator code to the PEG format by hand. You should carefully check that the translated grammar is really valid by inspecting the generated AST. Hope it helps.
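To see why the `\x` to `\u` change in `IdentStart` matters, here is a small analogy using Python's `re` module. It is not `peglib` itself, and the variable names are made up for this example. The `Char` rule in this thread accepts at most two hex digits after `\x`, and Python's `\x` escape likewise consumes exactly two, so `\x0080` is read as U+0000 followed by the literal characters `8` and `0`, and the class no longer describes the intended range.

```python
import re

# "\x" takes exactly two hex digits in Python, so this class is really:
# U+0000, "8", the range "0".."\xFF", "F", "F".
two_digit_escape = re.compile(r"[\x0080-\xFFFF]")

# "\u" takes four hex digits, so this is the intended range U+0080..U+FFFF.
four_digit_escape = re.compile(r"[\u0080-\uFFFF]")

# Print whether each class accepts an ASCII letter and the euro sign (U+20AC).
for ch in ("A", "\u20ac"):
    print(
        repr(ch),
        bool(two_digit_escape.fullmatch(ch)),
        bool(four_digit_escape.fullmatch(ch)),
    )
```

With the two-digit escape the class accepts the ASCII letter `A` and rejects the euro sign, which is the opposite of what a "non-ASCII identifier start" class is meant to do.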
I'm trying to manually extract the `peglib` grammar (see below). I've got a grammar that the online playground says is valid, but when I try to parse it with itself I get an error recognizing a character class. Is there a working `peglib` grammar somewhere (other than the one programmed in `peglib.h`)?
The error when trying to parse it with itself:
The manually extracted `peglib` grammar: