Open markw65 opened 8 months ago
The problem is not so simple and reporting expected from predicates can create many false-positives. This need to be carefully checked. In your examples the symbol is checked on the first position, but them actually can be deeper. Even in the test that you changed the new behaviour do report actually impossible input:
start = 'a' / &'b' / 'c'
That grammar does not accept b
. The second alternative actually cannot match anything because it is contradictory.
Expectations from predicates are difficult and from negative predicates are especially difficult. Probably the current behavior the best balance between accurate and complete reporting.
That grammar does not accept b. The second alternative actually cannot match anything because it is contradictory
That's true, but you don't take account of what might happen later in the grammar anywhere else. eg
start = ("a" / "b" / "c") []
Peggy reports '[a-c]' expected, even though the grammar doesn't match anything, in much the same way.
So I think it makes sense to report that "a" or "b" or "c"
was expected in the 'a' / &'b' / 'c'
- because in any real grammar there's going to be a follow up. eg
start = ('a' / &'b' / 'c') .?
Now saying that it expects 'a' or 'c' is definitely wrong - it really does accept 'a', 'b', or 'c'.
Its probably just a failure of imagination on my part, but I can't think of a case where its wrong to include the predicate failure... can you provide examples?
[edit: I guess in the negative cases, its going to be wrong because it will report that it was expecting something, when in fact it was expecting anything but something - so I can see that that part of the pull request is wrong. Not sure yet if its fixable]
That's true, but you don't take account of what might happen later in the grammar anywhere else. eg
It still makes sense to report expectations in this case, since the first character really has to be a
, b
or c
. The error will be reported only at the second character, and actually such grammar even shouldn't be compiled. It is better to move in this direction -- detect contradictory grammars at compile time rather that fixing error reporting for them. This situation is the same as here:
start = 'a' / 'b' 'c'
This grammar accepts only a
and bc
, but if you enter, say, f
, the error would be Expected "a" or "b" but "f" found.
even when just b
is not accepted input for the whole grammar.
Now saying that it expects 'a' or 'c' is definitely wrong - it really does accept 'a', 'b', or 'c'.
At least it does not say anything that wouldn't be true. Some valid inputs are missed in the message, but the message never contains invalid inputs. I think, that this property is more important. Believe, I tried to implement better error handling taking expectations from predicates and even created negative expectations, but I remember that in real cases that does not look well. Although I never tried to work only with positive predicates.
I'm still having a hard time understanding your argument here.
It still makes sense to report expectations in this case, since the first character really has to be a, b or c.
Which is also the case for 'a' / &'b' / 'c'
. The first character has to be a, b or c, and whether the rule matches depends on how the rule is embedded in the grammar (which isn't checked in either case).
The error will be reported only at the second character
No, it will be reported whether or not there's a second character.
You seem to be arguing that because the grammar might be self contradictory, we shouldn't report the &'b'
because it might be impossible to match when providing a 'b'. But you're also arguing that we should reject self contradictory grammars - and if you were able to do that, then there would be even more reason to report the &'b'
, since now we know that there's a valid match somewhere.
At least it does not say anything that wouldn't be true
You still haven't provided an example where reporting (positive) matches would do that though...
As noted above, its definitely not right for negative predicates.
If a predicate fails, its failure needs to be recorded:
Before:
Note the first one doesn't have a failure explanation at all, and the second says that 'c' was expected, when 'b' or 'c' would have been acceptable.
After:
There were tests for the second case, but they were testing the actual behavior, rather than what (I think) it should be doing.