Closed paulp closed 9 years ago
Yeah, I did something exactly along these lines last week.
There is now an ErrorFormatter
abstraction which encapsulates all the error formatting logic and makes it easy to replaces parts of it with custom stuff.
I also refactored the error analysis logic into proper separation of phases which are now modelled by an ErrorAnalysisPhase
abstraction which roughly corresponds to your ParserState
idea.
The goal is to have the parser produce proper error messages without the need add any helpers whatsoever in the grammar (like cut
markers). I hope that by the end of this week I'll this bit completed.
My refactorings are now in master. I've also published a new 2.1.0-SNAPSHOT build to sonatype.
With these latest improvements to the pb2 parse error analysis logic I think it should now be possible to shape the grammar into a form that results in good error messages for parse errors, without the need to add any "cut" markers to the grammar.
Assuming we have a good "raw" pb2 grammar for Scala the error analysis logic will produce parse errors that
The reason for (1) is that the parser has no way to know by itself that certain lexical-level elements should not generate traces. For example it is not helpful for a user to be informed that instead of an illegal character the grammar also allows whitespace or a block comment. Whitespace and comments can appear pretty much anywhere and should therefore not be reported as "expected".
The way we fix this problem is by wrapping certain rules with the new quiet
helper.
Mismatches in quiet rules do not generate traces and do not produce "expected" elements unless the grammar does not allow any non-quiet rule to reach the error character. (For example, once you are in a block comment the parser will report a missing */
closing marker.)
We shouldn't need too many quiet
markers in the Scala grammar, a few at the lowest-level (WL
, WS
, CommentLine
, NewLine
, and maybe a few more) should do.
The fix for (2) consists of marking the "right" rules as atomic
, which prevents the parser form reporting "expected" elements below the marked rule. The previous implementation of atomic
was still buggy, it should now do the right thing.
The task now is to find the right level in our Scala grammar rule tree to apply atomic
markers to. Once we have gotten this right the produced error messages should be "correct", i.e. relevant and at the right level of detail.
@paulp As you appear to have the progressed quite significantly beyond @lihaoyi's work (which was the base for the scalaparser example in the pb2 repo) I will suspend my work on the example and simply switch over to your fork once you consider it "done enough for now".
Looks like @paulp hasn't touched his parser on github in 5 days, which is the longest uninterrupted stretch of not-touched-ness ever since he forked it from mine. I guess that means it's "done"?
Is that what five days means? I'm working on something else.
I'm participating in the marathon, not the sprint.
I am not sure that my repository is likely to be very usable in any case from the perspective of this project.
What's the problem with the usability of your fork? Wouldn't your grammar fixes also apply to a general Scala parser that is not specialized to your particular downstream processing needs?
I have dependencies you surely won't want, and I'm parsing a significant superset of scala without any near-term designs on being bug-compatible with it.
For instance I am parsing traits, objects, and classes with the same rule. That means nothing in the parser stops you from writing a "case trait" with multiple type parameter lists and so on.
Ok, I see. You've have relaxed things quite a bit. Of course we'll have to find a trade-off between putting weight on the parser and putting weight on subsequent stages as well, but we might well come to a different conclusion. Thanks anyway!
@sirthias any thoughts on publishing a new stable version of parboiled? I'd like to start being able to publish libraries which depend on the parser, and having to make everyone resolve snapshots blocks that
I'd like to make sure that we have a good state of the Scala parser first, with the major issues in error reporting etc. resolved. If I publish 2.1.0 now and we discover that we need to change things around a bit more then I'd have to go to 2.2.0 for breaking changes.
Ok makes sense
On Fri, Dec 19, 2014 at 2:23 AM, Mathias notifications@github.com wrote:
I'd like to make sure that we have a good state of the Scala parser first, with the major issues in error reporting etc. resolved. If I publish 2.1.0 now and we discover that we need to change things around a bit more then I'd have to go to 2.2.0 for breaking changes.
— Reply to this email directly or view it on GitHub https://github.com/sirthias/parboiled2/issues/121#issuecomment-67621231.
It would be wonderful from a not-having-to-fork standpoint if the footprint of Parser could be reduced. There are at least two apparently straightforward encapsulations available: All the state manipulation should be manipulating an instance of ParserState which contains all that state, and the formatting methods should go into some kind of error message display class because they're super-specific to one notion of error display.
Here is a strawman set of interfaces, only intended to be broadly suggestive - basically reverse engineered from the current implementation without addition ambition. Structuring along these lines could be done transparently for anyone for whom these matters aren't issues while greatly increasing the appeal to those of us who often can't use libraries off-the-shelf.