j-mie6 / parsley

A fast and modern parser combinator library for Scala
https://j-mie6.github.io/parsley/
BSD 3-Clause "New" or "Revised" License
173 stars 15 forks source link

Multiple Error Messages #139

Open j-mie6 opened 1 year ago

j-mie6 commented 1 year ago

Is your feature request related to a problem? Please describe. Occasionally, after having generated one syntax error, it may be possible to resume the parse and generate further syntax errors. As these go on, they may become less and less useful, but this is up for the end-user to appreciate for themselves.

Describe the solution you'd like It would be nice if Parsley could produce multiple syntax errors from a parser. To accomplish this, recovery points need to be added into parsing flow, to indicate where a catastrophically failed parser could resume from. Such recovery points are places in the grammar that the parse can resume, and must know first predicate of that grammar position. This can be done via a combinator, which marks a specific parser as a continuation point, and tells it the set of characters it may resume from: these are provided explicitly. When a failure occurs, the parse will try and resume afresh from the next recovery point, dropping input until one of the provided character set is reached.

Describe alternatives you've considered It is possible to establish the first predicate of a parser via static analysis: this, however, would be prohibitively expensive, so a user-directed approach is favourable.

Additional context Recovery combinators either need to provide dummy values to allow for the parser to continue to construct values, or need to force the parser to run in some cheaper error generating only mode. The former is easier but requires more user annotation, the latter requires extra evaluation semantics for each instruction, which may be annoying to implement.

j-mie6 commented 1 year ago

Could do both actually: partial results might be desirable to continue working, whereas multiple errors on their own is also good.

Would require two extra run loops anyway, because we don't want to ruin the tight loop in the real run function. The major change here is just how these should be reflected in the Result type... Perhaps we just make two more result types for now, and Success can inhabit all three? Then Partial and MultiFail constructors alongside?

This might make it non major...

kubukoz commented 1 month ago

I would very much love to see a nice parser combinator library with error recovery 🥺

Would it be in scope to produce multiple errors AND a result value? Thinking in the context of a language server.

j-mie6 commented 1 month ago

So we actually had a Masters project by @DanKirwan that implemented this this year (way harder than we anticipated!).

It was able to generate a final result as well as multiple error messages with a manual combinator that can specify how errors might be recovered at a certain point in the parse, with a dummy value to fill for that.

In fact, Dan turned an existing parser+compiler written with parsley into a language server for vscode compiling it to scala-js with the new combinators.

kubukoz commented 1 month ago

Very exciting! Is this available to view somewhere online? :)

j-mie6 commented 1 month ago

The language server in question no, that was technically an adapted coursework solution.

kubukoz commented 1 month ago

right, thought as much... can you tell me more about the combinator for recovery, though?

j-mie6 commented 1 month ago

Sure, The combinator Dan introduced basically has this shape:

extension [A](p: Parsley[A]) def recoverWith[B >: A](hdl: Parsley[B]): Parsley[B]

The idea is that you run p normally, it potentially fails, and you will travel out of the recoverWith combinator without executing hdl. If unwinding of the regular error context eventually results in a fatal error, where no backtracking or regular recovery was performed, you then return to the state of the world when hdl was crossed and execute it from the point p failed (this turns out to have been really rather difficult to achieve!). The hdl parser's job is to consume input until some safe place and then return the "dummy" result that will take p's place in the wider parser. The aim overall is to perform recovery from the parser that failed deepest.

This combinator is pretty general, but it can be used to make a bunch of handy recovery strategies and patterns:

extension [A](p: Parsley[A])
    def panic[B >: A](FOLLOW: Parsley[?], dummy: B) = p.recoverWith(many(item - FOLLOW).as(dummy)) 
    ...

The panic combinator here is a "panic-mode recovery", which drops characters until a known valid following character is found. This could be used something like stmt.panic(';', BrokenStmt): if stmt fatally fails, you continue to read characters until you hit the next ;, then return a BrokenStmt and continue parsing from there (having not consumed the ;). This was broadly the strategy he used in the parser adapted into the language server.