winnow-rs / winnow

Making parsing a breeze
https://docs.rs/winnow
Other
525 stars 40 forks source link

Location aware error context #409

Open urso opened 9 months ago

urso commented 9 months ago

Please complete the following tasks

winnow version

0.5.31

Describe your use case

Using ContextError in combination with backtracking via alt can lead to confusing error reports. This is because ContextError basically discards the current context on or.

In presence of backtracking ideally I would wish for a context from the longest matching parser only. If 2 parsers did error at the same position the contexts should be merged.

Describe the solution you'd like

Provide a LocationContextError that I can use in case of ContextError. The LocationContextError would track the longest parsed location. This requires that the input implements Location. When merging contexts via AddContext or when adding some context to the error, the LocationContextError should discard contexts at lower positions.

Alternatives, if applicable

Encourage users to implement their own location aware context error. I have 2 implementations by now, but it would be nice to share the code in order to make it more generally available.

Additional Context

No response

epage commented 9 months ago

I feel there is something missing in this idea because of the focus on other location types.

Location is intended to give an absolute position for the current location and isn't sufficient on its own to make a span as you need two locations.

I'm also concerned this couples the ideas of Location tracking in errors and backtracking behavior.

As one alternative, we could have an Error decorator that tracks a Ord Checkpoint and handles the backtracking case.

urso commented 9 months ago

I feel there is something missing in this idea because of the focus on other location types.

Location is intended to give an absolute position for the current location and isn't sufficient on its own to make a span as you need two locations.

Sounds fair. Indeed it was not my intention to add location tracking (although that would might a nice addition as well). Still in presence of backtracking I find the current ContextError lacking.

For example in a parser like this (SQL dialect):

alt((
  parse_create_table,
  parse_create_view,
  parse_drop_table
))

With context error (without splicing in cut_err or using dispatch) the ContextError will focus on parse_drop_table, which leads to confusing error messages.

As one alternative, we could have an Error decorator that tracks a Ord Checkpoint and handles the backtracking case.

I wonder how this will look like.

Related to #410: The Checkpoint already has a constraint on Offset and offset_from gives us an usize that we could use to track the absolute position. I wonder if in this case I really need Stream to be Location. Using checkpoints might gives us enough details to reason about the position.

Still, the goal is not position tracking. The goal is to choose and eventually merge 2 contextual errors.

Back to the SQL example. I'm assuming that we add a context for each keyword to be parsed. Something like this:

fn kw(kw: &'static str) -> impl Parser<....> {
    |input| 
        tag(kw)
            .context(StrContext::Expected(StrContextValue::StringLiteral(kw)))
            .parse_next(input)
}

Let's assume an input string like CREATE SOMETHING. In that case parse_create_table and parse_create_view, both will fail at the same position. The error context of parse_create_table will contain StrContext::Expected(StrContextValue::StringLiteral("TABLE")) and the error context of parse_create_view will contain StrContext::Expected(StrContextValue::StringLiteral("VIEW")).

In that case I want both error contexts to be merged, telling users that we did expect TABLE or VIEW as next symbol after CREATE.

In this sense I'm "abusing" backtracking + error context handling to collect a list of all allowed literals right from the parsers.

I'm new to winnow, so I might be missing something here. But I don't see how an Error decorator will solve the context-merge problem.

epage commented 9 months ago

Sounds fair. Indeed it was not my intention to add location tracking (although that would might a nice addition as well). Still in presence of backtracking I find the current ContextError lacking.

For context (har har), combine does something similar to this and I had sub-par results and it slowed down performance. I found it was much better to have a sane case as the last (example) or to hand-craft things with a fail (example).

This doesn't mean I'm against giving the option for it but its why the design favors the current approach.

The Checkpoint already has a constraint on Offset and offset_from gives us an usize that we could use to track the absolute position. I wonder if in this case I really need Stream to be Location. Using checkpoints might gives us enough details to reason about the position.

Checkpoints require having an initial point to refer back to and atm you can't get offsets between two checkpoints without knowing which one is earlier.

However, if we make them Ord, offset_froms behavior doesn't matter.

In that case I want both error contexts to be merged, telling users that we did expect TABLE or VIEW as next symbol after CREATE.

While we can't do it with the existing infrastructure and a new wrapper type, we could extend AddContext (though we'd not want to break compatibility ... yet) or we could add a new trait (MergeContext?).

urso commented 9 months ago

Checkpoints require having an initial point to refer back to and atm you can't get offsets between two checkpoints without knowing which one is earlier.

However, if we make them Ord, offset_froms behavior doesn't matter.

Yeah. I'm not sure if requiring Checkpoints to implement Ord in general would mean a breaking change. I guess if we just require Ord for some specific traits when merging the contexts we should be fine.

While we can't do it with the existing infrastructure and a new wrapper type, we could extend AddContext (though we'd not want to break compatibility ... yet) or we could add a new trait (MergeContext?).

Yeah, I was thinking about renaming the type in #410 to MergedContextError or LongestContextError and see if a trait would make sense. I'm just not sure if this can easily be done without introducing a breaking change. This is why I opted to introduce an alternative context error type.

For example in ContextError<C> and in my own type the context has type Vec<C>. To me it sounds like we want to change ContextError<C> to become:

struct ContextError<C = Vec<StrContext>> {
  context: C
}

But that would be a breaking change to ContextError itself, right (at least for users with custom context values)? We still would have to introduce another Context error type.

epage commented 9 months ago

I created #413 to demonstrate my idea. Up for finishing the rest?

I feel like the cmp logic probably deserves some end-to-end tests (ie render errors from using alt, Parser::context, etc)

urso commented 9 months ago

Thank you, I gave it a shot: https://github.com/winnow-rs/winnow/pull/415

I like the LongestMatch type + MergeContext much better then my original implementation in #410.

Unfortunately I think Ord is not the right trait. I also added some tests to see if it works as expected. The problem becomes obvious when having partial input. In that case we have a parser that has consumed all input, but did produce an error and a second parser that did fail early. In that case we are comparing "" with "rest". Apparently "rest" > "", which makes us report the wrong context.

In case of &'a str as input SliceOrd will be used. Stepping through with the debugger it looks like this implementation is used.

I don't think we can use Ord as is. Either we introduce another trait to require some ordering based on the parse position (which might require us to use pointer comparison in case of slices) or we require the input to implement Location and fallback to using usize.

epage commented 9 months ago

Unfortunately I think Ord is not the right trait. I also added some tests to see if it works as expected.

This was me being quick and forgetting what we put inside of Checkpoint.

What we need to do is call as_ptr on the content and do Ord on that. This will likely involve adding another trait into the mix (which we can consider cleaning up in the next breaking change). Maybe an AsOrd?