Open urso opened 9 months ago
I feel there is something missing in this idea because of the focus on other location types.
Location
is intended to give an absolute position for the current location and isn't sufficient on its own to make a span as you need two locations.
I'm also concerned this couples the ideas of Location tracking in errors and backtracking behavior.
As one alternative, we could have an Error decorator that tracks a Ord
Checkpoint
and handles the backtracking case.
I feel there is something missing in this idea because of the focus on other location types.
Location is intended to give an absolute position for the current location and isn't sufficient on its own to make a span as you need two locations.
Sounds fair. Indeed it was not my intention to add location tracking (although that would might a nice addition as well). Still in presence of backtracking I find the current ContextError
lacking.
For example in a parser like this (SQL dialect):
alt((
parse_create_table,
parse_create_view,
parse_drop_table
))
With context error (without splicing in cut_err
or using dispatch
) the ContextError
will focus on parse_drop_table
, which leads to confusing error messages.
As one alternative, we could have an Error decorator that tracks a Ord Checkpoint and handles the backtracking case.
I wonder how this will look like.
Related to #410:
The Checkpoint
already has a constraint on Offset
and offset_from
gives us an usize
that we could use to track the absolute position. I wonder if in this case I really need Stream
to be Location
. Using checkpoints might gives us enough details to reason about the position.
Still, the goal is not position tracking. The goal is to choose and eventually merge 2 contextual errors.
Back to the SQL example. I'm assuming that we add a context
for each keyword to be parsed. Something like this:
fn kw(kw: &'static str) -> impl Parser<....> {
|input|
tag(kw)
.context(StrContext::Expected(StrContextValue::StringLiteral(kw)))
.parse_next(input)
}
Let's assume an input string like CREATE SOMETHING
. In that case parse_create_table
and parse_create_view
, both will fail at the same position. The error context of parse_create_table
will contain StrContext::Expected(StrContextValue::StringLiteral("TABLE"))
and the error context of parse_create_view
will contain StrContext::Expected(StrContextValue::StringLiteral("VIEW"))
.
In that case I want both error contexts to be merged, telling users that we did expect TABLE
or VIEW
as next symbol after CREATE
.
In this sense I'm "abusing" backtracking + error context handling to collect a list of all allowed literals right from the parsers.
I'm new to winnow, so I might be missing something here. But I don't see how an Error decorator will solve the context-merge problem.
Sounds fair. Indeed it was not my intention to add location tracking (although that would might a nice addition as well). Still in presence of backtracking I find the current ContextError lacking.
For context (har har), combine
does something similar to this and I had sub-par results and it slowed down performance. I found it was much better to have a sane case as the last (example) or to hand-craft things with a fail
(example).
This doesn't mean I'm against giving the option for it but its why the design favors the current approach.
The Checkpoint already has a constraint on Offset and offset_from gives us an usize that we could use to track the absolute position. I wonder if in this case I really need Stream to be Location. Using checkpoints might gives us enough details to reason about the position.
Checkpoints require having an initial point to refer back to and atm you can't get offsets between two checkpoints without knowing which one is earlier.
However, if we make them Ord
, offset_from
s behavior doesn't matter.
In that case I want both error contexts to be merged, telling users that we did expect TABLE or VIEW as next symbol after CREATE.
While we can't do it with the existing infrastructure and a new wrapper type, we could extend AddContext
(though we'd not want to break compatibility ... yet) or we could add a new trait (MergeContext
?).
Checkpoints require having an initial point to refer back to and atm you can't get offsets between two checkpoints without knowing which one is earlier.
However, if we make them Ord, offset_froms behavior doesn't matter.
Yeah. I'm not sure if requiring Checkpoints to implement Ord
in general would mean a breaking change. I guess if we just require Ord
for some specific traits when merging the contexts we should be fine.
While we can't do it with the existing infrastructure and a new wrapper type, we could extend AddContext (though we'd not want to break compatibility ... yet) or we could add a new trait (MergeContext?).
Yeah, I was thinking about renaming the type in #410 to MergedContextError
or LongestContextError
and see if a trait would make sense. I'm just not sure if this can easily be done without introducing a breaking change. This is why I opted to introduce an alternative context error type.
For example in ContextError<C>
and in my own type the context has type Vec<C>
. To me it sounds like we want to change ContextError<C>
to become:
struct ContextError<C = Vec<StrContext>> {
context: C
}
But that would be a breaking change to ContextError
itself, right (at least for users with custom context values)? We still would have to introduce another Context error type.
I created #413 to demonstrate my idea. Up for finishing the rest?
I feel like the cmp
logic probably deserves some end-to-end tests (ie render errors from using alt
, Parser::context
, etc)
Thank you, I gave it a shot: https://github.com/winnow-rs/winnow/pull/415
I like the LongestMatch
type + MergeContext
much better then my original implementation in #410.
Unfortunately I think Ord
is not the right trait. I also added some tests to see if it works as expected.
The problem becomes obvious when having partial input. In that case we have a parser that has consumed all input, but did produce an error and a second parser that did fail early. In that case we are comparing ""
with "rest"
. Apparently "rest" > ""
, which makes us report the wrong context.
In case of &'a str
as input SliceOrd
will be used. Stepping through with the debugger it looks like this implementation is used.
I don't think we can use Ord
as is. Either we introduce another trait to require some ordering based on the parse position (which might require us to use pointer comparison in case of slices) or we require the input to implement Location and fallback to using usize
.
Unfortunately I think Ord is not the right trait. I also added some tests to see if it works as expected.
This was me being quick and forgetting what we put inside of Checkpoint
.
What we need to do is call as_ptr
on the content and do Ord
on that. This will likely involve adding another trait into the mix (which we can consider cleaning up in the next breaking change). Maybe an AsOrd
?
Please complete the following tasks
winnow version
0.5.31
Describe your use case
Using ContextError in combination with backtracking via
alt
can lead to confusing error reports. This is becauseContextError
basically discards the current context onor
.In presence of backtracking ideally I would wish for a context from the longest matching parser only. If 2 parsers did error at the same position the contexts should be merged.
Describe the solution you'd like
Provide a
LocationContextError
that I can use in case ofContextError
. TheLocationContextError
would track the longest parsed location. This requires that the input implementsLocation
. When merging contexts viaAddContext
or when adding some context to the error, theLocationContextError
should discard contexts at lower positions.Alternatives, if applicable
Encourage users to implement their own location aware context error. I have 2 implementations by now, but it would be nice to share the code in order to make it more generally available.
Additional Context
No response