peggyjs / peggy

Peggy: Parser generator for JavaScript
https://peggyjs.org/
MIT License
906 stars 64 forks source link

Predicates shouldn't be wrapped in SILENT_FAILS #455

Open markw65 opened 8 months ago

markw65 commented 8 months ago

If a predicate fails, its failure needs to be recorded:

Before:

% echo 'start = &[b] "bb"' | node bin/peggy.js -t "a" 
Error running test
Error: Expected , or undefined but "a" found.
 --> command line:1:1
  |
1 | a
  | ^

% echo 'start = &[b] "bb" / "c"' | node bin/peggy.js -t "a"
Error running test
Error: Expected "c" but "a" found.
 --> command line:1:1
  |
1 | a
  | ^

Note the first one doesn't have a failure explanation at all, and the second says that 'c' was expected, when 'b' or 'c' would have been acceptable.

After:

% echo 'start = &[b] "bb"' | node bin/peggy.js -t "a"      
Error running test
Error: Expected [b] but "a" found.
 --> command line:1:1
  |
1 | a
  | ^
mwilliams@MacBook-Pro peggy % echo 'start = &[b] "bb" / "c"' | node bin/peggy.js -t "a"
Error running test
Error: Expected "c" or [b] but "a" found.
 --> command line:1:1
  |
1 | a
  | ^

There were tests for the second case, but they were testing the actual behavior, rather than what (I think) it should be doing.

Mingun commented 8 months ago

The problem is not so simple and reporting expected from predicates can create many false-positives. This need to be carefully checked. In your examples the symbol is checked on the first position, but them actually can be deeper. Even in the test that you changed the new behaviour do report actually impossible input:

start = 'a' / &'b' / 'c'

That grammar does not accept b. The second alternative actually cannot match anything because it is contradictory.

Expectations from predicates are difficult and from negative predicates are especially difficult. Probably the current behavior the best balance between accurate and complete reporting.

markw65 commented 8 months ago

That grammar does not accept b. The second alternative actually cannot match anything because it is contradictory

That's true, but you don't take account of what might happen later in the grammar anywhere else. eg

start = ("a" / "b" / "c") []

Peggy reports '[a-c]' expected, even though the grammar doesn't match anything, in much the same way.

So I think it makes sense to report that "a" or "b" or "c" was expected in the 'a' / &'b' / 'c' - because in any real grammar there's going to be a follow up. eg

start = ('a' / &'b' / 'c') .?

Now saying that it expects 'a' or 'c' is definitely wrong - it really does accept 'a', 'b', or 'c'.

Its probably just a failure of imagination on my part, but I can't think of a case where its wrong to include the predicate failure... can you provide examples?

[edit: I guess in the negative cases, its going to be wrong because it will report that it was expecting something, when in fact it was expecting anything but something - so I can see that that part of the pull request is wrong. Not sure yet if its fixable]

Mingun commented 8 months ago

That's true, but you don't take account of what might happen later in the grammar anywhere else. eg

It still makes sense to report expectations in this case, since the first character really has to be a, b or c. The error will be reported only at the second character, and actually such grammar even shouldn't be compiled. It is better to move in this direction -- detect contradictory grammars at compile time rather that fixing error reporting for them. This situation is the same as here:

start = 'a' / 'b' 'c'

This grammar accepts only a and bc, but if you enter, say, f, the error would be Expected "a" or "b" but "f" found. even when just b is not accepted input for the whole grammar.

Now saying that it expects 'a' or 'c' is definitely wrong - it really does accept 'a', 'b', or 'c'.

At least it does not say anything that wouldn't be true. Some valid inputs are missed in the message, but the message never contains invalid inputs. I think, that this property is more important. Believe, I tried to implement better error handling taking expectations from predicates and even created negative expectations, but I remember that in real cases that does not look well. Although I never tried to work only with positive predicates.

markw65 commented 8 months ago

I'm still having a hard time understanding your argument here.

It still makes sense to report expectations in this case, since the first character really has to be a, b or c.

Which is also the case for 'a' / &'b' / 'c'. The first character has to be a, b or c, and whether the rule matches depends on how the rule is embedded in the grammar (which isn't checked in either case).

The error will be reported only at the second character

No, it will be reported whether or not there's a second character.

You seem to be arguing that because the grammar might be self contradictory, we shouldn't report the &'b' because it might be impossible to match when providing a 'b'. But you're also arguing that we should reject self contradictory grammars - and if you were able to do that, then there would be even more reason to report the &'b', since now we know that there's a valid match somewhere.

At least it does not say anything that wouldn't be true

You still haven't provided an example where reporting (positive) matches would do that though...

markw65 commented 8 months ago

As noted above, its definitely not right for negative predicates.