sirthias / parboiled2

A macro-based PEG parser generator for Scala 2.10+
Other
717 stars 86 forks source link

Parser defines more than is necessary or appropriate #121

Closed paulp closed 9 years ago

paulp commented 9 years ago

It would be wonderful from a not-having-to-fork standpoint if the footprint of Parser could be reduced. There are at least two apparently straightforward encapsulations available: All the state manipulation should be manipulating an instance of ParserState which contains all that state, and the formatting methods should go into some kind of error message display class because they're super-specific to one notion of error display.

Here is a strawman set of interfaces, only intended to be broadly suggestive - basically reverse engineered from the current implementation without addition ambition. Structuring along these lines could be done transparently for anyone for whom these matters aren't issues while greatly increasing the appeal to those of us who often can't use libraries off-the-shelf.

trait ParserConfig {
  def initialValueStackSize: Int
  def maxValueStackSize: Int
  def errorTraceCollectionLimit: Int
}
trait Parser {
  def config: ParserConfig
  def formatter: ParserShow
  def input: ParserInput
  def state: ParserState
}
trait ParserShowConfig {
  def showExpected: Boolean
  def showPosition: Boolean
  def showLine: Boolean
  def showTraces: Boolean
}
trait ParserShow {
  def formatError(error: ParseError, config: ParserShowConfig): String
  def formatErrorProblem(error: ParseError): String
  def formatErrorLine(error: ParseError): String
}
trait ParserState {
  def cursorChar: Char
  def cursor: Int
  def valueStack: ValueStack
  def maxCursor: Int
  def mismatchesAtErrorCursor: Int
  def currentErrorRuleStackIx: Int
}
sirthias commented 9 years ago

Yeah, I did something exactly along these lines last week. There is now an ErrorFormatter abstraction which encapsulates all the error formatting logic and makes it easy to replaces parts of it with custom stuff.

I also refactored the error analysis logic into proper separation of phases which are now modelled by an ErrorAnalysisPhase abstraction which roughly corresponds to your ParserState idea.

The goal is to have the parser produce proper error messages without the need add any helpers whatsoever in the grammar (like cut markers). I hope that by the end of this week I'll this bit completed.

sirthias commented 9 years ago

My refactorings are now in master. I've also published a new 2.1.0-SNAPSHOT build to sonatype.

With these latest improvements to the pb2 parse error analysis logic I think it should now be possible to shape the grammar into a form that results in good error messages for parse errors, without the need to add any "cut" markers to the grammar.

Assuming we have a good "raw" pb2 grammar for Scala the error analysis logic will produce parse errors that

  1. contain too many traces
  2. report "expected" elements at too deep a level

The reason for (1) is that the parser has no way to know by itself that certain lexical-level elements should not generate traces. For example it is not helpful for a user to be informed that instead of an illegal character the grammar also allows whitespace or a block comment. Whitespace and comments can appear pretty much anywhere and should therefore not be reported as "expected". The way we fix this problem is by wrapping certain rules with the new quiet helper. Mismatches in quiet rules do not generate traces and do not produce "expected" elements unless the grammar does not allow any non-quiet rule to reach the error character. (For example, once you are in a block comment the parser will report a missing */ closing marker.) We shouldn't need too many quiet markers in the Scala grammar, a few at the lowest-level (WL, WS, CommentLine, NewLine, and maybe a few more) should do.

The fix for (2) consists of marking the "right" rules as atomic, which prevents the parser form reporting "expected" elements below the marked rule. The previous implementation of atomic was still buggy, it should now do the right thing. The task now is to find the right level in our Scala grammar rule tree to apply atomic markers to. Once we have gotten this right the produced error messages should be "correct", i.e. relevant and at the right level of detail.

@paulp As you appear to have the progressed quite significantly beyond @lihaoyi's work (which was the base for the scalaparser example in the pb2 repo) I will suspend my work on the example and simply switch over to your fork once you consider it "done enough for now".

lihaoyi commented 9 years ago

Looks like @paulp hasn't touched his parser on github in 5 days, which is the longest uninterrupted stretch of not-touched-ness ever since he forked it from mine. I guess that means it's "done"?

paulp commented 9 years ago

Is that what five days means? I'm working on something else.

I'm participating in the marathon, not the sprint.

I am not sure that my repository is likely to be very usable in any case from the perspective of this project.

sirthias commented 9 years ago

What's the problem with the usability of your fork? Wouldn't your grammar fixes also apply to a general Scala parser that is not specialized to your particular downstream processing needs?

paulp commented 9 years ago

I have dependencies you surely won't want, and I'm parsing a significant superset of scala without any near-term designs on being bug-compatible with it.

For instance I am parsing traits, objects, and classes with the same rule. That means nothing in the parser stops you from writing a "case trait" with multiple type parameter lists and so on.

sirthias commented 9 years ago

Ok, I see. You've have relaxed things quite a bit. Of course we'll have to find a trade-off between putting weight on the parser and putting weight on subsequent stages as well, but we might well come to a different conclusion. Thanks anyway!

lihaoyi commented 9 years ago

@sirthias any thoughts on publishing a new stable version of parboiled? I'd like to start being able to publish libraries which depend on the parser, and having to make everyone resolve snapshots blocks that

sirthias commented 9 years ago

I'd like to make sure that we have a good state of the Scala parser first, with the major issues in error reporting etc. resolved. If I publish 2.1.0 now and we discover that we need to change things around a bit more then I'd have to go to 2.2.0 for breaking changes.

lihaoyi commented 9 years ago

Ok makes sense

On Fri, Dec 19, 2014 at 2:23 AM, Mathias notifications@github.com wrote:

I'd like to make sure that we have a good state of the Scala parser first, with the major issues in error reporting etc. resolved. If I publish 2.1.0 now and we discover that we need to change things around a bit more then I'd have to go to 2.2.0 for breaking changes.

— Reply to this email directly or view it on GitHub https://github.com/sirthias/parboiled2/issues/121#issuecomment-67621231.