Closed baoyachi closed 8 months ago
I've added parse_next
s to help demonstrate what is happening
#!/usr/bin/env nargo
---
[dependencies]
winnow.path = "../winnow"
---
use winnow::ascii::{alpha1, digit1, multispace0};
use winnow::combinator::{delimited, opt, repeat};
use winnow::{PResult, Parser};
fn main() {
let input1 = r#"(123,456,789)
(000,111,22~~~2)
"#;
let offset1 = parse_digits.parse(input1).err().unwrap().offset();
dbg!(offset1); // offset position error
let mut input1 = r#"(123,456,789)
(000,111,22~~~2)
"#;
parse_digits.parse_next(&mut input1);
dbg!(input1);
let input2 = r#"(123,456,789)
(000,1~~~11,222)
"#;
let offset2 = parse_digits.parse(input2).err().unwrap().offset();
dbg!(offset2); // offset position error
let mut input2 = r#"(123,456,789)
(000,1~~~11,222)
"#;
parse_digits.parse_next(&mut input2);
dbg!(input2);
assert_eq!(offset1, offset2);
}
fn parse_digits<'a>(input: &mut &'a str) -> PResult<Vec<Vec<&'a str>>> {
delimited(multispace0, repeat(1.., digit_group), multispace0).parse_next(input)
}
fn digit_group<'a>(input: &mut &'a str) -> PResult<Vec<&'a str>> {
delimited(
(multispace0, '(', multispace0),
repeat(1.., (digit1, opt(",")).map(|x| x.0)),
(multispace0, ')', multispace0),
)
.parse_next(input)
}
$ ./winnow-426.rs
[/home/epage/.cargo/target/05/6398075af02aec/winnow-426.rs:16] offset1 = 18
[/home/epage/.cargo/target/05/6398075af02aec/winnow-426.rs:22] input1 = "(000,111,22~~~2)\n "
[/home/epage/.cargo/target/05/6398075af02aec/winnow-426.rs:28] offset2 = 18
[/home/epage/.cargo/target/05/6398075af02aec/winnow-426.rs:34] input2 = "(000,1~~~11,222)\n "
When encountering the second group, digit_group
fails. The repeat
inside of parse_digits
then backtracks, finds the repeat
is satisfied, and then finds parse_digits
is satisfied. parse_next
then succeeds, pointing at the start of the second group. The error then comes from the implicit eof
that is a part of parse
.
You can verify this by running with the debug feature
$ nargo run --manifest-path winnow-426.rs -F winnow/debug
> delimited | "(123,456,789)\n (000,111,22~~~2)\n
> multispace0 | "(123,456,789)\n (000,111,22~~~2)\n
> take_while | "(123,456,789)\n (000,111,22~~~2)\n
< take_while | +0
< multispace0 | +0
> repeat | "(123,456,789)\n (000,111,22~~~2)\n
> delimited | "(123,456,789)\n (000,111,22~~~2)\n
> multispace0 | "(123,456,789)\n (000,111,22~~~2)\n
> take_while | "(123,456,789)\n (000,111,22~~~2)\n
< take_while | +0
< multispace0 | +0
> one_of | "(123,456,789)\n (000,111,22~~~2)\n
> any | "(123,456,789)\n (000,111,22~~~2)\n
< any | +1
| verify |
< one_of | +1
> multispace0 | "123,456,789)\n (000,111,22~~~2)\n
> take_while | "123,456,789)\n (000,111,22~~~2)\n
< take_while | +0
< multispace0 | +0
> repeat | "123,456,789)\n (000,111,22~~~2)\n
> digit1 | "123,456,789)\n (000,111,22~~~2)\n
> take_while | "123,456,789)\n (000,111,22~~~2)\n
< take_while | +3
< digit1 | +3
> opt | ",456,789)\n (000,111,22~~~2)\n
> tag | ",456,789)\n (000,111,22~~~2)\n
< tag | +1
< opt | +1
> digit1 | "456,789)\n (000,111,22~~~2)\n "
> take_while | "456,789)\n (000,111,22~~~2)\n "
< take_while | +3
< digit1 | +3
> opt:1 | ",789)\n (000,111,22~~~2)\n "∅
> tag | ",789)\n (000,111,22~~~2)\n "∅
< tag | +1
< opt:1 | +1
> digit1 | "789)\n (000,111,22~~~2)\n "∅
> take_while | "789)\n (000,111,22~~~2)\n "∅
< take_while | +3
< digit1 | +3
> opt:2 | ")\n (000,111,22~~~2)\n "∅
> tag | ")\n (000,111,22~~~2)\n "∅
< tag | backtrack
< opt:2 | +0
> digit1 | ")\n (000,111,22~~~2)\n "∅
> take_while | ")\n (000,111,22~~~2)\n "∅
< take_while | backtrack
< digit1 | backtrack
< repeat | +11
> multispace0 | ")\n (000,111,22~~~2)\n "∅
> take_while | ")\n (000,111,22~~~2)\n "∅
< take_while | +0
< multispace0 | +0
> one_of | ")\n (000,111,22~~~2)\n "∅
> any | ")\n (000,111,22~~~2)\n "∅
< any | +1
| verify |
< one_of | +1
> multispace0 | "\n (000,111,22~~~2)\n "∅
> take_while | "\n (000,111,22~~~2)\n "∅
< take_while | +5
< multispace0 | +5
< delimited | +18
> delimited | "(000,111,22~~~2)\n "∅
> multispace0 | "(000,111,22~~~2)\n "∅
> take_while | "(000,111,22~~~2)\n "∅
< take_while | +0
< multispace0 | +0
> one_of | "(000,111,22~~~2)\n "∅
> any | "(000,111,22~~~2)\n "∅
< any | +1
| verify |
< one_of | +1
> multispace0 | "000,111,22~~~2)\n "∅
> take_while | "000,111,22~~~2)\n "∅
< take_while | +0
< multispace0 | +0
> repeat | "000,111,22~~~2)\n "∅
> digit1 | "000,111,22~~~2)\n "∅
> take_while | "000,111,22~~~2)\n "∅
< take_while | +3
< digit1 | +3
> opt | ",111,22~~~2)\n "∅
> tag | ",111,22~~~2)\n "∅
< tag | +1
< opt | +1
> digit1 | "111,22~~~2)\n "∅
> take_while | "111,22~~~2)\n "∅
< take_while | +3
< digit1 | +3
> opt:1 | ",22~~~2)\n "∅
> tag | ",22~~~2)\n "∅
< tag | +1
< opt:1 | +1
> digit1 | "22~~~2)\n "∅
> take_while | "22~~~2)\n "∅
< take_while | +2
< digit1 | +2
> opt:2 | "~~~2)\n "∅
> tag | "~~~2)\n "∅
< tag | backtrack
< opt:2 | +0
> digit1 | "~~~2)\n "∅
> take_while | "~~~2)\n "∅
< take_while | backtrack
< digit1 | backtrack
< repeat | +10
> multispace0 | "~~~2)\n "∅
> take_while | "~~~2)\n "∅
< take_while | +0
< multispace0 | +0
> one_of | "~~~2)\n "∅
> any | "~~~2)\n "∅
< any | +1
| verify | backtrack
< one_of | backtrack
< delimited | backtrack
< repeat | +18
> multispace0 | "(000,111,22~~~2)\n "∅
> take_while | "(000,111,22~~~2)\n "∅
< take_while | +0
< multispace0 | +0
< delimited | +18
> eof | "(000,111,22~~~2)\n "∅
< eof | backtrack
What you need to do is to add cut_err
calls for where it should disable backtracking. Likely this would be on the repeat
inside of digit_group
and the closing delimiter.
Thx @epage . "cut_err" is solvable, but during the process of analyzing the grammar, it's crucial to accurately identify which parts are disable backtracking. If "cut_err" is employed, manual marking of this portion of work is required, which does make the usage somewhat cumbersome.
If you have ideas for another strategy, I would be open to it.
Some examples of what rely on backtracking
repeat
alt
opt
A rough approximation of combine
is that it took the approach of making Cut
the default and requires opting in to Backtrack
. I said roughly which makes the above combinators work in some circumstances and then fail in others. Backtrack
by default is much easier to work with.
@epage There is currently no better idea. I will close it.
Please complete the following tasks
rust version
rustc 1.74.0 (79e9716c9 2023-11-13)
winnow version
0.5.34
Minimal reproducible code
Steps to reproduce the bug with the above code
cargo run
Actual Behaviour
Offset always returns 18
input1
andinput2
unexpectedly returned to the same position, which should be a bug.Expected Behaviour
Should return true offset
Additional Context
No response