yhirose / cpp-peglib

A single file C++ header-only PEG (Parsing Expression Grammars) library
MIT License

[Question] Multiline object with matched deliminators #277

Closed: LordAro closed this issue 1 year ago

LordAro commented 1 year ago

I'm trying to construct a grammar that matches the following:

OBJECT : flibble
=====
foobar
=====

Where "foobar" is a multiline string that may contain anything, delimited by an arbitrary (but matching) number of equals signs.

This is the best I've come up with, but it requires putting < > around the entire string, so the AST ends up containing the delimiters as well as the string body. If I remove the < > from MULTILINE_STRING, it no longer matches.

OBJECT <- 'OBJECT' COLON ALPHANUM MULTILINE_STRING
ALPHANUM <- < [A-Za-z][A-Za-z0-9]* >
MULTILINE_STRING <- < $eq<EQ+> MULTILINE_STRING_DATA $eq >
MULTILINE_STRING_DATA <- < (!$eq .)* >
EQ <- '='
COLON <- ':'
%whitespace <- [ \t\r\n]*

Is there something better I can do?

mingodad commented 1 year ago

Playing around in the playground, I came up with this:

OBJECT <- 'OBJECT' COLON ALPHANUM MULTILINE_STRING_HEAD MULTILINE_STRING
ALPHANUM <- < [A-Za-z][A-Za-z0-9]* >
~MULTILINE_STRING_HEAD <- < $eq<EQ+> >
MULTILINE_STRING <- <MULTILINE_STRING_DATA> $eq 
MULTILINE_STRING_DATA <- (!$eq .)* 
EQ <- '='
COLON <- ':'
%whitespace <- [ \t\r\n]*

Output:

+ OBJECT
  - COLON (:)
  - ALPHANUM (flibble)
  - MULTILINE_STRING (foobar
)

LordAro commented 1 year ago

Nice, thanks! That seems to work great, even when combined with:

STRING <- MULTILINE_STRING_HEAD MULTILINE_STRING / SINGLE_STR / DOUBLE_STR
SINGLE_STR <- "'" < [^']* > "'"
DOUBLE_STR <- '"' < [^"]* > '"'