Closed Turbo87 closed 7 years ago
I'd guess a viable solution is to feed each token/character separately and check for errors as you go? Then, if an error is found, you could try clearing the fed tokens and feeding 'fix' characters, I guess.
For example, to implement JS-like automatic semicolon insertion, the pseudocode would be:
while <there are no errors>
  feed(next token)
end
feed(semicolon)
if <there are errors>
  # normal error handler
  unfeed(semicolon)
  print("whoops syntax error")
else
  continue while loop
end
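To make the feed/unfeed idea a bit more tangible, here is a minimal Python sketch. Everything in it is hypothetical: `ToyParser` is a stand-in for an incremental parser (it is not nearley's API) and only accepts `name = num ;` statements, and `save`/`restore` are assumed snapshot helpers that play the role of "unfeed". One difference from the pseudocode: the offending token is rolled back first, then the semicolon is tried, then the token is fed again.

```python
class ParseError(Exception):
    """Raised by the toy parser when a token does not fit."""


def classify(token):
    # Map a raw token to the terminal kinds the toy grammar knows about.
    if token in ("=", ";"):
        return token
    return "num" if token.isdigit() else "name"


class ToyParser:
    """Hypothetical incremental parser for `name = num ;` statements."""

    EXPECT = ["name", "=", "num", ";"]

    def __init__(self):
        self.pos = 0        # index into EXPECT within the current statement
        self.accepted = []  # every token accepted so far

    def save(self):
        # Snapshot that restore() can roll back to (the "unfeed" step).
        return (self.pos, len(self.accepted))

    def restore(self, snapshot):
        self.pos, count = snapshot
        del self.accepted[count:]

    def feed(self, token):
        wanted = self.EXPECT[self.pos]
        if classify(token) != wanted:
            raise ParseError(f"unexpected {token!r}, expected {wanted}")
        self.accepted.append(token)
        self.pos = (self.pos + 1) % len(self.EXPECT)


def parse_with_semicolon_insertion(tokens):
    parser = ToyParser()
    for token in tokens:
        snapshot = parser.save()
        try:
            parser.feed(token)
        except ParseError:
            # Roll back, try the "fix" token, then retry the original token.
            parser.restore(snapshot)
            try:
                parser.feed(";")
                parser.feed(token)
            except ParseError:
                # The fix did not help: undo it and report a real error.
                parser.restore(snapshot)
                raise
    return parser.accepted
```

With this sketch, `parse_with_semicolon_insertion(["a", "=", "1", "b", "=", "2", ";"])` accepts the input with a semicolon inserted before `b`, while `["a", "=", "="]` still raises `ParseError`. A missing semicolon at the very end of the input is not handled here.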
To be a bit more concrete: we're working on VSCode support for Ember.js templates, including code completion in those templates. Obviously we'd have to parse the document, but while the user is in the middle of typing, some parts of the template may contain incorrect syntax.
Example:
<div class="entry">
  <h1>{{title}}</h1>
  <div class="body">
    {{body}}
    {{}}
  </div>
</div>
I'd love to get the AST of everything except the problematic section of the code and additionally get an array of all the parsing errors that were found. Do you think this is something that could be handled by nearley?
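To illustrate the shape of the result I have in mind (nodes for everything that parses, plus a list of errors), here is a rough Python sketch. It is not a real Handlebars/Ember parser and has nothing to do with nearley: `parse_template` is a hypothetical helper that splits the source into text and `{{...}}` segments with a regex, treats a simple dotted path as the only valid expression, and records anything else as an error instead of throwing.

```python
import re

# Assumption for this sketch: a mustache is `{{ ... }}` and the only valid
# expression inside it is a simple dotted path such as `title` or `post.body`.
MUSTACHE = re.compile(r"\{\{(.*?)\}\}", re.S)
PATH = re.compile(r"[A-Za-z_][\w.]*")


def parse_template(source):
    """Return (nodes, errors); invalid segments are skipped, not fatal."""
    nodes, errors = [], []
    pos = 0
    for match in MUSTACHE.finditer(source):
        if match.start() > pos:
            nodes.append(("text", source[pos:match.start()]))
        expr = match.group(1).strip()
        if PATH.fullmatch(expr):
            nodes.append(("mustache", expr))
        else:
            # Record the problem and carry on with the rest of the document.
            errors.append({
                "offset": match.start(),
                "message": f"invalid expression {expr!r}",
            })
        pos = match.end()
    if pos < len(source):
        nodes.append(("text", source[pos:]))
    return nodes, errors
```

Run on the template above, this would yield mustache nodes for `title` and `body` and a single error pointing at the `{{}}`. A real implementation would need recovery inside the grammar itself rather than per-segment isolation like this.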
> Do you think this is something that could be handled by nearley?
Error-correcting parsing is a very difficult problem (I've done some research in this area).
nearley has no specific support for this. I don't know of any parser toolkit which does... Marpa has something a little bit like this ("ruby slippers"), but it's Perl and though it sounds similar, I don't think the feature is quite what you want.
@nanalan's suggestion is probably your best bet: pick a parser toolkit (e.g. nearley), and when it throws an error, try and "fix" the situation by looking at the tokens around the error.
Good luck! :)
> I don't know of any parser toolkit which does...
Antlr4 does and has a JavaScript runtime as well. http://www.antlr.org/api/Java/org/antlr/v4/runtime/DefaultErrorStrategy.html
And so does Chevrotain. https://github.com/SAP/chevrotain
> @nanalan's suggestion is probably your best bet: pick a parser toolkit (e.g. nearley), and when it throws an error, try and "fix" the situation by looking at the tokens around the error.
I'm afraid I have to disagree.
Having actually implemented error recovery capabilities in a parsing toolkit (Chevrotain), this is not something I would recommend trying to implement manually on top of an existing toolkit as an end user.
To do so one would need information on the grammar (at runtime) that is probably not exposed by most parsing toolkits and hooks into the parsing toolkit engine.
I am sure a few common errors (e.g. missing semicolons) could be "manually" handled by the average grammar author, but the more general capabilities of fault tolerance and error recovery may be very hard to implement without "help" by the parsing toolkit.
> To do so one would need information on the grammar (at runtime) that is probably not exposed by most parsing toolkits
Actually, Earley parsers expose an awful lot of useful info at runtime.
> Actually, Earley parsers expose an awful lot of useful info at runtime.
That's good to know.
Is the entire grammar exposed at runtime? Let's say I'm in some position inside rule A. Is there enough information present to identify what possible tokens can follow my current position?
> Is the entire grammar exposed at runtime?
Of course!
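To show why an Earley chart makes the "what can follow here?" question easy, here is a small self-contained Python sketch. It is a toy recognizer written for illustration, not nearley's internals: the grammar, the item layout `(lhs, rhs, dot, origin)`, and the helper names `start`, `feed`, and `expected` are all assumptions, and the grammar must have no empty rules. The expected tokens are simply the terminals that sit right after the dot in the latest chart column.

```python
# Toy grammar (assumption): sums of numbers with parentheses, no empty rules.
GRAMMAR = {
    "S": [["E"]],
    "E": [["E", "+", "T"], ["T"]],
    "T": [["num"], ["(", "E", ")"]],
}


def close(chart, k):
    """Apply the predict and complete steps to column k until it is stable."""
    column = chart[k]
    queue = list(column)
    while queue:
        lhs, rhs, dot, origin = queue.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            # Predict: expand the nonterminal after the dot.
            new = [(rhs[dot], tuple(alt), 0, k) for alt in GRAMMAR[rhs[dot]]]
        elif dot == len(rhs):
            # Complete: advance every item that was waiting for `lhs`.
            new = [(l, r, d + 1, o) for (l, r, d, o) in chart[origin]
                   if d < len(r) and r[d] == lhs]
        else:
            new = []
        for item in new:
            if item not in column:
                column.add(item)
                queue.append(item)


def start():
    chart = [{("S", tuple(alt), 0, 0) for alt in GRAMMAR["S"]}]
    close(chart, 0)
    return chart


def expected(chart):
    """Terminals that may legally come next at the current position."""
    return {r[d] for (_, r, d, _) in chart[-1]
            if d < len(r) and r[d] not in GRAMMAR}


def feed(chart, token):
    # Scan: keep only the items whose next symbol is this token.
    scanned = {(l, r, d + 1, o) for (l, r, d, o) in chart[-1]
               if d < len(r) and r[d] == token}
    if not scanned:
        raise ValueError(f"unexpected {token!r}, expected one of {sorted(expected(chart))}")
    chart.append(scanned)
    close(chart, len(chart) - 1)
```

After `chart = start()`, calling `expected(chart)` gives the tokens that can begin an expression, and after each `feed(chart, token)` it gives the tokens that can follow the current position. A recovery strategy could use that set to decide which token to insert or which to skip.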
> Of course!
Right, so it is more likely to be possible to add these generic recovery algorithms to nearley as an end user.
A good reference for that could be "The Definitive ANTLR 4 Reference". (unfortunately costs $$$...)
Under the section: Error Reporting and Recovery --> Automatic Error Recovery Strategy
Is it even theoretically possible to implement basic error recovery in nearley? I wonder why this is closed.
Is it possible to configure the parser/algorithm in some way so that it can recover from syntax errors instead of throwing directly?