kach / nearley

📜🔜🌲 Simple, fast, powerful parser toolkit for JavaScript.
https://nearley.js.org
MIT License

Error recovery #157

Closed Turbo87 closed 7 years ago

Turbo87 commented 7 years ago

Is it possible to configure the parser/algorithm in some way so that it can recover from syntax errors instead of throwing directly?

bates64 commented 7 years ago

I'd guess a viable solution is to feed each token/character separately and check for errors as you go? Then, if an error is found, you could try clearing the fed tokens and feeding 'fix' characters, I guess.

For example, to implement JS-like automatic semicolon insertion, the pseudocode would be:

while <there are no errors>
  feed(next token)
end

feed(semicolon)

if <there are errors>
  # normal error handler
  unfeed(semicolon)
  print("whoops syntax error")
else
  continue while loop
end
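
The pseudocode above can be sketched in JavaScript roughly as follows. This is a minimal illustration, not nearley's API: `ToyParser` is a hypothetical stand-in that accepts statements of the form `IDENT ;` and only mimics a streaming parser with `feed`, `save`, and `restore` methods. `parseWithSemicolonInsertion` is likewise an invented name. The "unfeed" step is modelled by restoring a checkpoint taken before each token.

```javascript
// Hypothetical stand-in for a streaming parser; grammar: (IDENT ";")*
class ToyParser {
  constructor() {
    this.pending = null;   // identifier waiting for its semicolon
    this.statements = [];  // completed statements
  }
  save() {
    return { pending: this.pending, statements: [...this.statements] };
  }
  restore(state) {
    this.pending = state.pending;
    this.statements = [...state.statements];
  }
  feed(token) {
    if (this.pending !== null) {
      if (token !== ";") throw new Error(`unexpected ${token}, expected ;`);
      this.statements.push(this.pending);
      this.pending = null;
    } else {
      if (token === ";") throw new Error("unexpected ;, expected identifier");
      this.pending = token;
    }
  }
}

// Feed tokens one at a time. On an error, rewind, try a semicolon,
// then retry the token. If that also fails, record the error and skip it.
function parseWithSemicolonInsertion(parser, tokens) {
  const errors = [];
  tokens.forEach((token, index) => {
    const checkpoint = parser.save();
    try {
      parser.feed(token);
    } catch (err) {
      parser.restore(checkpoint);
      try {
        parser.feed(";");
        parser.feed(token);
      } catch (_) {
        parser.restore(checkpoint); // "unfeed" the inserted semicolon
        errors.push({ index, token, message: err.message });
      }
    }
  });
  return errors;
}

const asiParser = new ToyParser();
const asiErrors = parseWithSemicolonInsertion(asiParser, ["a", "b", ";", "c", ";"]);
console.log(asiParser.statements, asiErrors); // statements a, b, c and no errors
```

The sketch does not handle a statement left unterminated at the end of the input; a fuller version would try one more inserted semicolon after the last token.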

Turbo87 commented 7 years ago

To be a bit more concrete: we're working on VSCode support for Ember.js templates, including code completion inside those templates. Obviously we have to parse the document, but while the user is in the middle of typing, parts of the template may contain invalid syntax.

Example:

<div class="entry">
  <h1>{{title}}</h1>
  <div class="body">
    {{body}}

    {{}}
  </div>
</div>

(see https://astexplorer.net/#/gist/7730e6615bcac03b90dd74f32431abb6/00fdb1e3d4895e9e282a6af7fd6ea8dcfe6d5c12)

I'd love to get the AST of everything except the problematic section of the code and additionally get an array of all the parsing errors that were found. Do you think this is something that could be handled by nearley?
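
To make the requested result shape concrete (usable nodes plus an array of errors, instead of a thrown exception), here is a deliberately tiny sketch. It is not a Handlebars or Ember parser and does not use nearley: `scanMustaches` is a hypothetical helper that only looks at `{{...}}` expressions, and the node and error shapes are assumptions chosen for illustration.

```javascript
// Toy scanner: collects {{path}} expressions and reports malformed ones
// in an errors array rather than throwing on the first problem.
function scanMustaches(source) {
  const nodes = [];
  const errors = [];
  const pattern = /\{\{(.*?)\}\}/g;
  let match;
  while ((match = pattern.exec(source)) !== null) {
    const path = match[1].trim();
    if (/^[A-Za-z_][\w.-]*$/.test(path)) {
      nodes.push({ type: "Mustache", path, start: match.index });
    } else {
      errors.push({ message: `invalid expression "${match[0]}"`, start: match.index });
    }
  }
  return { nodes, errors };
}

const template = [
  '<div class="entry">',
  "  <h1>{{title}}</h1>",
  '  <div class="body">',
  "    {{body}}",
  "",
  "    {{}}",
  "  </div>",
  "</div>",
].join("\n");

const scanResult = scanMustaches(template);
console.log(scanResult.nodes.map((n) => n.path), scanResult.errors.length);
```

For the template above, the two valid expressions come back as nodes and the empty `{{}}` is reported as a single error, which is the behaviour an editor integration would want while the user is still typing.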

tjvr commented 7 years ago

> Do you think this is something that could be handled by nearley?

Error-correcting parsing is a very difficult problem (I've done some research in this area).

nearley has no specific support for this. I don't know of any parser toolkit which does... Marpa has something a little bit like this ("ruby slippers"), but it's Perl and though it sounds similar, I don't think the feature is quite what you want.

@nanalan's suggestion is probably your best bet: pick a parser toolkit (e.g. nearley), and when it throws an error, try and "fix" the situation by looking at the tokens around the error.

Good luck! :)

bd82 commented 7 years ago

> I don't know of any parser toolkit which does...

ANTLR 4 does, and it has a JavaScript runtime as well: http://www.antlr.org/api/Java/org/antlr/v4/runtime/DefaultErrorStrategy.html

And so does Chevrotain. https://github.com/SAP/chevrotain

> @nanalan's suggestion is probably your best bet: pick a parser toolkit (e.g. nearley), and when it throws an error, try and "fix" the situation by looking at the tokens around the error.

I'm afraid I have to disagree.

Having actually implemented error recovery capabilities in a parsing toolkit (Chevrotain), I would not recommend trying to implement this manually on top of an existing toolkit as an end user.

To do so one would need information on the grammar (at runtime) that is probably not exposed by most parsing toolkits and hooks into the parsing toolkit engine.

I am sure a few common errors (e.g. missing semicolons) could be handled "manually" by the average grammar author, but the more general capabilities of fault tolerance and error recovery may be very hard to implement without "help" from the parsing toolkit.

tjvr commented 7 years ago

> To do so one would need information on the grammar (at runtime) that is probably not exposed by most parsing toolkits

Actually, Earley parsers expose an awful lot of useful info at runtime.

bd82 commented 7 years ago

> Actually, Earley parsers expose an awful lot of useful info at runtime.

That's good to know.

Is the entire grammar exposed at runtime? Let's say I'm in some position inside rule A. Is there enough information present to identify what possible tokens can follow my current position?

tjvr commented 7 years ago

> Is the entire grammar exposed at runtime?

Of course!
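
To illustrate why an Earley chart answers the "what can follow my current position?" question, here is a small sketch. An Earley item is a rule plus a dot position, and the terminals that may come next are the terminal symbols sitting directly after a dot in the current column. The data shapes below (nonterminals as strings, terminals as `{ literal }` objects) and the name `expectedTerminals` are assumptions made for this example, not a description of nearley's internals.

```javascript
// Given the Earley items in the current column, list the terminals
// that appear immediately after a dot, i.e. the tokens accepted next.
function expectedTerminals(column) {
  const expected = new Set();
  for (const { rule, dot } of column) {
    const next = rule.symbols[dot];
    // undefined means the item is complete; strings are nonterminals
    if (next !== undefined && typeof next !== "string") {
      expected.add(next.literal);
    }
  }
  return [...expected].sort();
}

// Hand-written column for the position after an `expr` has been parsed.
const earleyColumn = [
  { rule: { name: "stmt", symbols: ["expr", { literal: ";" }] }, dot: 1 },
  { rule: { name: "expr", symbols: ["expr", { literal: "+" }, "term"] }, dot: 1 },
  { rule: { name: "expr", symbols: ["term"] }, dot: 1 }, // complete item
];

console.log(expectedTerminals(earleyColumn)); // [ '+', ';' ]
```

A recovery routine could use such a set to decide which token to insert, or which tokens to skip to, when the real input does not match.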

bd82 commented 7 years ago

> Of course!

Right, so it is more likely that an end user could add these generic recovery algorithms to nearley.

A good reference for that could be "The Definitive ANTLR 4 Reference" (unfortunately it costs $$$...).

Under the section: Error Reporting and Recovery --> Automatic Error Recovery Strategy
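
As a rough idea of what such a generic strategy involves: ANTLR's default error strategy is documented as trying single-token insertion or deletion first and otherwise resynchronizing by consuming tokens until one that can legally follow is found. The sketch below shows only that resynchronization step, under simplifying assumptions: tokens are plain strings, the set of synchronization tokens is supplied by the caller, and `resync` is a hypothetical name. It is not ANTLR's code.

```javascript
// Panic-mode resynchronization: starting at the offending token, skip
// input until a token from the sync set appears, then resume there.
function resync(tokens, errorIndex, syncSet) {
  const skipped = [];
  let i = errorIndex;
  while (i < tokens.length && !syncSet.has(tokens[i])) {
    skipped.push(tokens[i]);
    i++;
  }
  return { resumeAt: i, skipped };
}

const recovery = resync(["x", "=", "+", "*", ";", "y"], 2, new Set([";", "}"]));
console.log(recovery); // resumes at index 4 after skipping "+" and "*"
```

In a real implementation the sync set would be computed from the grammar (the tokens that can follow the rules currently being parsed), which is why runtime access to the grammar matters for this approach.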

deltaidea commented 7 years ago

Is it even theoretically possible to implement basic error recovery in nearley? I wonder why this is closed.