Open j-mie6 opened 1 year ago
Could do both actually: partial results might be desirable to continue working, whereas multiple errors on their own is also good.
Would require two extra run loops anyway, because we don't want to ruin the tight loop in the real run function. The major change here is just how these should be reflected in the Result
type... Perhaps we just make two more result types for now, and Success
can inhabit all three? Then Partial
and MultiFail
constructors alongside?
This might make it non major...
I would very much love to see a nice parser combinator library with error recovery 🥺
Would it be in scope to produce multiple errors AND a result value? Thinking in the context of a language server.
So we actually had a Masters project by @DanKirwan that implemented this this year (way harder than we anticipated!).
It was able to generate a final result as well as multiple error messages with a manual combinator that can specify how errors might be recovered at a certain point in the parse, with a dummy value to fill for that.
In fact, Dan turned an existing parser+compiler written with parsley into a language server for vscode compiling it to scala-js with the new combinators.
Very exciting! Is this available to view somewhere online? :)
The language server in question no, that was technically an adapted coursework solution.
right, thought as much... can you tell me more about the combinator for recovery, though?
Sure, The combinator Dan introduced basically has this shape:
extension [A](p: Parsley[A]) def recoverWith[B >: A](hdl: Parsley[B]): Parsley[B]
The idea is that you run p
normally, it potentially fails, and you will travel out of the recoverWith
combinator without executing hdl
. If unwinding of the regular error context eventually results in a fatal error, where no backtracking or regular recovery was performed, you then return to the state of the world when hdl
was crossed and execute it from the point p
failed (this turns out to have been really rather difficult to achieve!). The hdl
parser's job is to consume input until some safe place and then return the "dummy" result that will take p
's place in the wider parser. The aim overall is to perform recovery from the parser that failed deepest.
This combinator is pretty general, but it can be used to make a bunch of handy recovery strategies and patterns:
extension [A](p: Parsley[A])
def panic[B >: A](FOLLOW: Parsley[?], dummy: B) = p.recoverWith(many(item - FOLLOW).as(dummy))
...
The panic
combinator here is a "panic-mode recovery", which drops characters until a known valid following character is found. This could be used something like stmt.panic(';', BrokenStmt)
: if stmt
fatally fails, you continue to read characters until you hit the next ;
, then return a BrokenStmt
and continue parsing from there (having not consumed the ;
). This was broadly the strategy he used in the parser adapted into the language server.
Is your feature request related to a problem? Please describe. Occasionally, after having generated one syntax error, it may be possible to resume the parse and generate further syntax errors. As these go on, they may become less and less useful, but this is up for the end-user to appreciate for themselves.
Describe the solution you'd like It would be nice if Parsley could produce multiple syntax errors from a parser. To accomplish this, recovery points need to be added into parsing flow, to indicate where a catastrophically failed parser could resume from. Such recovery points are places in the grammar that the parse can resume, and must know first predicate of that grammar position. This can be done via a combinator, which marks a specific parser as a continuation point, and tells it the set of characters it may resume from: these are provided explicitly. When a failure occurs, the parse will try and resume afresh from the next recovery point, dropping input until one of the provided character set is reached.
Describe alternatives you've considered It is possible to establish the first predicate of a parser via static analysis: this, however, would be prohibitively expensive, so a user-directed approach is favourable.
Additional context Recovery combinators either need to provide dummy values to allow for the parser to continue to construct values, or need to force the parser to run in some cheaper error generating only mode. The former is easier but requires more user annotation, the latter requires extra evaluation semantics for each instruction, which may be annoying to implement.