yhirose / cpp-peglib

A single file C++ header-only PEG (Parsing Expression Grammars) library
MIT License
879 stars 112 forks source link

Whitespace rule can spoil error messages #292

Closed kfsone closed 4 months ago

kfsone commented 5 months ago

I've been trying to improve the quality of errors reported by some of my earlier peglib grammars, but when I simplified the grammars the error messaging changed, eventually I realized that peglib is currently somewhat variable about what it selects to report as expecting.

File <- Compound
Value <- [0-9]+
Field <- [a-z]+

### VVVVV changes here VVVVV ###
%whitespace <- [ \t\r\n]*
#%whitespace <- SPACE*
#~SPACE <- [ \t\r\n]

Compound <- Unit / Array / Object
#Compound <- Array / Object
### ^^^^^ changes here ^^^^^ ###

Unit <- '{' '}'
Array <- '{' Object+ '}'
Object <- '{' ObjectMember+ '}'
ObjectMember <- Field '=' Value ','?

given the input {{1}} or {{a=1,2}} I get various different errors depending on what is commented in/out

Closest to what I expect: image

But if I use a production in whitespace (which I do for comments), suddenly it tells the user it was expecting a space: image

Simple whitespace again but without unit, it now gives me an accurate report, but not the one I was expecting: image (I expected the error to be that it was expecting <Object>)

I suspect something else is also going wrong here, because the %whitespace change doesn't affect this 3rd case: image

but at the same time, I get a low-quality error from the second match case: image (it should be expecting <ObjectMember>, '}')

whereas the original combination gives me excessive expectations: image

and this slight variation (introducing an extra component from the original grammar) confuses me the most:

image

File <- Compound
Value <- [0-9]+
Field <- [a-z]+

%whitespace <- [ \t\r\n]*

Compound <- Unit / Array / Object

Unit <- '{' '}'
Array <- '{' (Unit/Object)+ '}'
Object <- '{' ObjectMember+ '}'
ObjectMember <- Field '=' Value ','?

The first '{' could be either Unit, Array, or Object. the next token, '{' eliminates unit and Object, only array allows '{' next. So surely when it reaches the 1 it is expecting either '}' or ObjectMember->Field?

yhirose commented 5 months ago
image
yhirose commented 5 months ago

@kfsone by the way, could you split this issue to separate issues? It's pretty hard for me to follow...

yhirose commented 5 months ago

One issue at a time. :)

kfsone commented 5 months ago

It's one issue, the extra information is to show what I tried that also didn't work

kfsone commented 5 months ago

I see - I need the <>s around the field/value regex, but the main fix is to prefix the space token with _? I couldn't find a description of _ in the README, what does it do?

yhirose commented 5 months ago

@kfsone, yes, rules which start with _ won't show up in error messages. I forgot to specify it in the README... I'll update the README.

https://github.com/yhirose/cpp-peglib/blob/d9d38ec122f450930bef3c4e7abdb60db4af20ca/peglib.h#L2598-L2599

kfsone commented 4 months ago

Excellent - thank you - I can close this, you've already tackled the issue :)