Closed mingodad closed 2 years ago
@mingodad, could you put the smallest possible PEG grammar here, so that I can reproduce it on my machine easily? Thanks!
I'm seeing various corruption in the error message with this grammar on the playground:
ROOT <- CONTENT !.
CONTENT <- (ELEMENT / TEXT)*
ELEMENT <- $(STAG CONTENT ETAG)
STAG <- '<' < $tag<TAGNAME> > '>'
ETAG <- '</' < $tag > '>'
TAGNAME <- 'a' / 'b'i
TEXT <- (![<] .)+
Input: <a>foo</A>
On Firefox, the error I'm currently seeing with the above grammar/input:
1:9 syntax error, unexpected 'A', expecting 'd tota % success fail definition 13 4 '.
It seems more apt to happen if i is added to the literals in TAGNAME, but I've seen corruption in simpler cases. Minor edits to TAGNAME change the corruption, even things like altering the number of spaces. I see corruption in both Chromium and Firefox, even after refreshing, clearing cookies and local data, etc.
The command line lint seems to always show the error I believe is the proper error (with lots of variations on the TAGNAME): 1:9: syntax error, unexpected 'A', expecting 'a'.
I'll see if I can narrow it down to simpler grammar any...
This is about as simple as I can get it and still see consistent corruption:
ROOT <- CONTENT !.
CONTENT <- (ELEMENT / TEXT)*
ELEMENT <- $(STAG CONTENT ETAG)
STAG <- '<' < $tag<"a"> > '>'
ETAG <- '</' < $tag > '>'
TEXT <- (![<] .)+
Input: <a>foo</A>
Most of the time the error is: 1:9 syntax error, unexpected 'A', expecting 'd '.
Occasionally: 1:9 syntax error, unexpected 'A', expecting 's) i'.
@ChrisHixon, thanks for the problem report. I fixed it at 3c2a53c79b7642a547127b31e102526be72206e5.
@mingodad, I would like to make sure I understand what you are mentioning here.
The current cpp-peglib backreference behavior is an 'exact match' against the captured string, the same as backreferences in regular expressions.
If you are suggesting that this example should succeed, I am not sure that is correct. Could you explain more clearly?
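For comparison, here is a minimal sketch of the regular-expression behavior referred to above, using std::regex from the C++ standard library rather than cpp-peglib itself. The helper name matches_same_tag and the pattern are illustrative assumptions, not code from the project. Without a case-insensitive flag, the backreference \1 only matches the exact text captured by the opening tag.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of cpp-peglib): returns true when `input`
// has the form <tag>text</tag> and the closing tag repeats the opening tag
// character for character.
bool matches_same_tag(const std::string& input) {
    // Group 1 captures the opening tag name; \1 must then match that
    // exact captured text in the closing tag.
    static const std::regex pattern(R"(<([A-Za-z]+)>[^<]*</\1>)");
    return std::regex_match(input, pattern);
}
```

Under these assumptions, matches_same_tag("<a>foo</a>") succeeds and matches_same_tag("<a>foo</A>") fails, which mirrors the 'exact match' rule described for $tag.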
After you showed it with regex, I can see your point.
Also, on the same topic, it would be nice to have case-insensitive character classes, [a-z]i, for grammars where identifiers are case insensitive (SQL, Pascal, ...).
Here is an example on the peggy playground https://peggyjs.org/online.html (also implemented here https://github.com/mingodad/peg):
start = name_char+
name_char =
[a-z0-9$_]i* [ \t\n]
Input:
one
Two
One
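To make the requested semantics concrete, here is a rough sketch in plain C++ of what a case-insensitive class such as [a-z0-9$_]i would accept. Both helper names are hypothetical, and this is only an illustration of the idea, not how cpp-peglib or peggy implement it: the character is folded to lower case before it is tested against the class.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: does `c` belong to the class [a-z0-9$_]i ?
bool in_name_class_icase(char c) {
    // Fold to lower case first so that 'A'..'Z' fall into the a-z range.
    const int lower = std::tolower(static_cast<unsigned char>(c));
    return (lower >= 'a' && lower <= 'z') ||
           (lower >= '0' && lower <= '9') ||
           lower == '$' || lower == '_';
}

// Hypothetical helper: length of the longest prefix of `s` whose characters
// all belong to the class, i.e. what [a-z0-9$_]i* would consume.
std::size_t name_prefix_length(const std::string& s) {
    std::size_t n = 0;
    while (n < s.size() && in_name_class_icase(s[n])) {
        ++n;
    }
    return n;
}
```

With this sketch, "one", "Two" and "One" are each consumed in full, whereas a case-sensitive [a-z0-9$_] would stop at the leading capital letter.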
@mingodad, thanks for the response. I'll close this issue. Could you make a separate issue for the [...]i operator?
See the discussion here and the examples tested on the cpp-peglib playground.