Show character-by-character verbose output

kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.

kschiess.github.com/parslet

MIT License


Show character-by-character verbose output #116

Closed aardvarkk closed 7 years ago

aardvarkk commented 10 years ago

I'm brand new to writing parsers, and I'm having problems with every parser I approach. I've tried Treetop and Citrus and was hoping I'd hit gold with Parslet because of the focus on error reporting. The problem is, it's still not good enough for me! What I'm seeing as error output is a giant tree saying things like:

 `- Failed to match sequence (COMMAND_AND_COMMENT NEWLINE) at line 2 char 113.
    `- Failed to match sequence (COMMAND_DETAIL COMMENT_TO_EOL) at line 2 char 113.
       `- Expected one of [ASSIGNMENT, EXPR] at line 2 char 113.
          |- Failed to match sequence (IDENTIFIER WS{0, } ':=' WS{0, } EXPR) at line 2 char 113.
          |  `- Failed to match sequence (WS{0, } var:([a-zA-Z0-9]{1, }) WS{0, }) at line 2 char 113.
          |     `- Expected at least 1 of [a-zA-Z0-9] at line 2 char 113.
          |        `- Failed to match [a-zA-Z0-9] at line 2 char 113.

... and so on. Everything seems to be complaining about the same character. But how did I get to this character with these expectations? I'm really looking for a step-by-step decision output saying something like "Looking at line 1 character 1. Match 'a' fails so parsing as 'b'." That kind of output would really help me figure out where my logic is failing. As it stands, the output doesn't help me identify exactly where things are going wrong.
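The kind of step-by-step trace described above can be sketched in plain Ruby. This is a toy matcher, not Parslet's API: `TracingScanner`, `str`, and `alt` are hypothetical names chosen for illustration, and the sketch only assumes that every match attempt is logged with the position it was tried at.

```ruby
# Toy tracing matcher (NOT Parslet): every attempt is logged with its position.
class TracingScanner
  attr_reader :log, :pos

  def initialize(input)
    @input = input
    @pos = 0
    @log = []
  end

  # Convert the current offset into a 1-based "line L char C" label.
  def location
    consumed = @input[0, @pos]
    line = consumed.count("\n") + 1
    col = @pos - (consumed.rindex("\n") || -1)
    "line #{line} char #{col}"
  end

  # Try a literal string; advance on success, stay put on failure.
  def str(literal)
    if @input[@pos, literal.length] == literal
      @log << "#{location}: matched #{literal.inspect}"
      @pos += literal.length
      true
    else
      got = @input[@pos]
      got_text = got ? got.inspect : "end of input"
      @log << "#{location}: expected #{literal.inspect}, got #{got_text}"
      false
    end
  end

  # Ordered choice: try each alternative in turn, rewinding after a failure.
  def alt(*alternatives)
    alternatives.any? do |alternative|
      start = @pos
      matched = alternative.call
      @pos = start unless matched
      matched
    end
  end
end

scanner = TracingScanner.new("b")
scanner.alt(-> { scanner.str("a") }, -> { scanner.str("b") })
puts scanner.log
```

Reading the log top to bottom shows which alternative was tried first, why it was abandoned, and which one eventually consumed the character.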

aardvarkk commented 10 years ago

I think it also makes sense to show something like a "partial tree" of what's been found so far so I can tell where the error started initially. Perhaps this would be the simplest and most straightforward to begin with. All I get is a 'nil' output if the parsing fails at all, but wouldn't it be possible to generate some kind of tree representing what has been found "so far" at the point of failure?

kschiess commented 10 years ago

I feel for you. I don't have an answer right now, but what you're hinting at is definitely a direction for parslet to go. I am letting this stand as a feature request, ok?

rubydesign commented 10 years ago

I'm also a beginner and ran into this a lot. If you post the parser and the grammar on the mailing list, someone could take a look. I often got these errors when there was unconsumed input after the grammar had finished (from my point of view).
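The "unconsumed input" situation can be illustrated without any parser library. The sketch below uses Ruby's standard `StringScanner`; `parse_fully` is a hypothetical helper, and the regular expression merely stands in for a grammar's root rule.

```ruby
require 'strscan'

# Hypothetical helper: match `pattern` at the start of `input`, then report
# whether anything is left over once the "grammar" has finished.
def parse_fully(input, pattern)
  scanner = StringScanner.new(input)
  matched = scanner.scan(pattern)
  return "no match at char 1" if matched.nil?
  return "ok" if scanner.eos?

  "grammar finished at char #{scanner.pos + 1}, but #{scanner.rest.inspect} is left over"
end

puts parse_fully("x := 1", /\w+ := \d+/)
puts parse_fully("x := 1 # note", /\w+ := \d+/)
```

In the second call the pattern itself matches, yet the overall parse is incomplete because the trailing comment is never consumed. That is the case where an error points at a character the grammar author considered already handled.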

dragostis commented 9 years ago

@kschiess There is a very simple case where the error reporting is not doing so well.

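require 'parslet'

class Mine < Parslet::Parser
  root(:enclosed)

  # A number wrapped in parentheses, e.g. "(3)" or "(3.)".
  rule(:enclosed) { str('(') >> number >> str(')') }
  # The digit 3, optionally followed by a dot.
  rule(:number) { str('3') >> str('.').maybe }
end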

If you parse "(3f)", you get:

Failed to match sequence ('(' NUMBER ')') at line 1 char 3.
`- Expected ")", but got "f" at line 1 char 3.

This seems normal because number gets parsed and enclosed continues and fails, but the tree should also contain the fact that the str('.').maybe failed. If I had this in the tree, it would be fairly easy to show a decent error like Expected "." ....

As it stands, in any language with enclosing delimiters, pretty much the only output you're going to get is that you should have ended the clause sooner, no matter what kind of erroneous input you feed inside the enclosed rule.
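One common way to get the kind of message asked for here is to remember everything that was expected at the furthest position any attempt reached, including optional parts that did not match. The sketch below is a toy matcher in plain Ruby, not Parslet's implementation; `FurthestFailure`, `maybe`, and `message` are hypothetical names.

```ruby
# Toy matcher (NOT Parslet) that records what was expected at the furthest
# position reached, even when the miss came from an optional element.
class FurthestFailure
  attr_reader :furthest, :expected

  def initialize(input)
    @input = input
    @pos = 0
    @furthest = 0
    @expected = []
  end

  # Match a literal; on failure, record it as an expectation.
  def str(literal)
    if @input[@pos, literal.length] == literal
      @pos += literal.length
      return true
    end
    record(literal)
    false
  end

  # Optional match: never fails, but a miss is still recorded by #str.
  def maybe(literal)
    str(literal)
    true
  end

  def message
    got = @input[@furthest]
    got_text = got ? got.inspect : "end of input"
    wanted = @expected.map(&:inspect).join(" or ")
    "Expected #{wanted}, but got #{got_text} at char #{@furthest + 1}."
  end

  private

  # Keep only the expectations that belong to the furthest failure position.
  def record(literal)
    if @pos > @furthest
      @furthest = @pos
      @expected = []
    end
    @expected << literal if @pos == @furthest && !@expected.include?(literal)
  end
end

# Same shape as the grammar above: '(' '3' '.'? ')'
parser = FurthestFailure.new("(3f)")
ok = parser.str("(") && parser.str("3") && parser.maybe(".") && parser.str(")")
puts parser.message unless ok
```

For the input "(3f)" the message names both "." and ")" as acceptable at char 3, because the optional dot and the closing parenthesis were both tried at that position.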

kschiess commented 9 years ago

@dragostis This is by design. If you tag an atom as .maybe, by definition, it's not a problem if that goes missing. So the parse succeeds without it; it only fails later because the closing paren is not in the right spot.

I guess the deepest parse error reporter (see https://github.com/kschiess/parslet/blob/master/example/deepest_errors.rb) would report differently on this.

Philosophically, there's never just one error tree - but a whole forest of them. The failing maybe is in a different tree from the one that's being reported as the last parse failure. No one wants to look at all the different things a parser tries before giving up... So to "better" report on this (for some definition of better) is really hard. Especially because better tends to be defined anecdotally.

kschiess commented 9 years ago

If anyone wants to contribute PRs that extend existing reporting, I would welcome the effort and help it along.

dragostis commented 9 years ago

@kschiess Thanks a lot for the fast reply. When I first tried the Deepest error reporting it seemed to produce the same type of results, but it looks like I was missing something. The Deepest one gives much better results.

I'm currently finishing the grammar for a language and I'll give the error reporting a better look and see if I can contribute anything. Great job with the project! :+1:

kschiess commented 8 years ago

Thank you ;)