janestreet / parsexp

S-expression parsing library
MIT License

Error recovery when handling bad inputs #3

Open cpitclaudel opened 5 years ago

cpitclaudel commented 5 years ago

Hi there,

Thanks for this cool library. I'm using it to parse programs written in a lisp-like language (the stack goes like this: read sexps → parse into an AST → resolve names → typecheck).

I've added error recovery mechanisms to all phases except the first one. This way I can report multiple input errors instead of stopping at the first one, but only as long as the input is well-parenthesized.

Now I'd like to make the sexp-reading phase a bit more robust. I've already changed to an eager parser, which gives me a nice way to parse, resolve, and typecheck all top-level sexps until the first parse error. I'd like to do more, by handling the following cases:

I've got the first one to work by skipping characters every time I get a parse error, but numbers 2 and 3 seem tricky to do without access to the contents of the parser's stack. Am I going about this the right way? Is there an easier approach that I missed?

Thanks!
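For readers following along, here is a rough sketch of the "skip characters on every parse error and resume" loop described above. It does not use Parsexp: `sexp`, `Parse_error`, `read_one`, and `read_all_with_recovery` are hypothetical names for a toy reader (atoms and parentheses only, no strings or comments) that stands in for the real parser, so only the shape of the loop should be taken from it.

```ocaml
(* Toy s-expression type and reader, standing in for a real parser. *)
type sexp = Atom of string | List of sexp list

(* Carries the offset of the offending character. *)
exception Parse_error of int

let is_space c = c = ' ' || c = '\n' || c = '\t' || c = '\r'

let rec skip_ws s i =
  if i < String.length s && is_space s.[i] then skip_ws s (i + 1) else i

(* Read one sexp starting at offset [i]; return it with the next offset. *)
let rec read_one s i =
  let i = skip_ws s i in
  if i >= String.length s then raise (Parse_error i)
  else
    match s.[i] with
    | ')' -> raise (Parse_error i) (* unmatched closing paren *)
    | '(' -> read_list s (i + 1) []
    | _ ->
      let j = ref i in
      while
        !j < String.length s
        && (not (is_space s.[!j]))
        && s.[!j] <> '(' && s.[!j] <> ')'
      do
        incr j
      done;
      (Atom (String.sub s i (!j - i)), !j)

and read_list s i acc =
  let i = skip_ws s i in
  if i >= String.length s then raise (Parse_error i) (* unclosed paren *)
  else if s.[i] = ')' then (List (List.rev acc), i + 1)
  else
    let x, i = read_one s i in
    read_list s i (x :: acc)

(* Read all top-level sexps. On an error, record its offset, skip one
   character past it, and resume reading from there. *)
let read_all_with_recovery s =
  let rec loop i sexps errors =
    let i = skip_ws s i in
    if i >= String.length s then (List.rev sexps, List.rev errors)
    else
      match read_one s i with
      | x, next -> loop next (x :: sexps) errors
      | exception Parse_error pos -> loop (pos + 1) sexps (pos :: errors)
  in
  loop 0 [] []
```

On `"(a b) ) (c)"` this yields both well-formed lists plus one error at offset 6. On `"(a (b"` it yields no sexps at all: the partially read forms are thrown away with the exception, which is the kind of case where access to the parser's stack would be needed.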

aalekseyev commented 4 years ago

We are working on a related project where we use parsexp to parse a sexp prefix so that we can propose sexp completions in an editor.

The way we're approaching it is we expose the parser stack type and work with it directly.

In addition to the stack, we sometimes need to inspect the parser state. This is stored in the automaton_state field in Automaton_state.t. We use the library parsexp_symbolic_automaton to help interpret those values.

Sorry I can't offer anything higher-level at this point, but if you have any ideas then I'm curious.

aalekseyev commented 4 years ago

By the way, your third bullet point is actually not error handling, from the Parsexp point of view. You're taking previously valid s-expressions, for example:

("foo
 "bar
)

and changing their interpretation.

Parsexp parses it as ("foo\n " "bar"), but you presumably want it to parse as (foo bar).

This means that you need to inspect and potentially modify parsexp parser state after every newline character, not just on every error.
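To make the "check after every newline" idea concrete, here is a small standalone sketch. It does not touch Parsexp's parser state: `lines_ending_inside_string` is a hypothetical helper with its own two-flag state machine (it ignores comments and anything else Parsexp handles), and it only reports which lines end while a double-quoted string is still open. What to do at those points is a separate question.

```ocaml
(* Return the 1-based numbers of the lines that end while a double-quoted
   string is still open. Toy scanner: comments are not handled. *)
let lines_ending_inside_string (s : string) : int list =
  let in_string = ref false and escaped = ref false in
  let line = ref 1 and hits = ref [] in
  String.iter
    (fun c ->
      (* Update the string/escape state for this character. *)
      if !escaped then escaped := false
      else if !in_string && c = '\\' then escaped := true
      else if c = '"' then in_string := not !in_string;
      (* At each newline, inspect the state before moving to the next line. *)
      if c = '\n' then begin
        if !in_string then hits := !line :: !hits;
        incr line
      end)
    s;
  List.rev !hits
```

For the three-line example above, `lines_ending_inside_string "(\"foo\n \"bar\n)"` reports line 1 only: the quote on the second line closes the string that was opened on the first, so the input is valid even though that is presumably not what its author meant.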

aalekseyev commented 4 years ago

It was brought to my attention that I may not have made this clear above.

cc @dwang20151005

We have an ongoing project internally that uses parsexp to do partial parses. When this project takes shape we can see what public API makes sense to expose and we'll keep this use case in mind as something that would be useful to support.