Open faassen opened 1 year ago
I've implemented a custom error class, with a
Custom
error type on it. I want to return this custom error usingtry_map
and then see it at the end of the failed parse. It works, but not always. When it doesn't work,ExpectedFound
is produced instead (i.e. a non-custom error).
combined_year_custom_error
fails, the rest doesn't. Thedbg!
inyear_parser
behaves intriguingly: while the custom error is indeed returned fromtry_map
, it's replaced by anExpectedFound
by the time themap_err
immediately followingyear_parser
is invoked. If I useyear_parser
by myself theCustom
error is emitted by bothdbg!
.I tried to replicate the behavior in a simplified with using
combined_true_false_parser
but this does exhibit the expected behavior.use chumsky::prelude::*; use chumsky::util::MaybeRef; #[derive(Debug, Clone, PartialEq)] enum MyError { ExpectedFound { span: SimpleSpan<usize>, expected: Vec<Option<char>>, found: Option<char>, }, Custom, } impl<'a> chumsky::error::Error<'a, &'a str> for MyError { fn expected_found<E: IntoIterator<Item = Option<MaybeRef<'a, char>>>>( expected: E, found: Option<MaybeRef<'a, char>>, span: SimpleSpan<usize>, ) -> Self { Self::ExpectedFound { span, expected: expected .into_iter() .map(|e| e.as_deref().copied()) .collect(), found: found.as_deref().copied(), } } fn merge(self, other: Self) -> Self { println!("merging {:?} and {:?}", self, other); match (self, other) { (MyError::ExpectedFound { .. }, a) => a, (a, MyError::ExpectedFound { .. }) => a, (a, _) => a, } } } type MyExtra = extra::Err<MyError>; fn year_parser<'a>() -> impl Parser<'a, &'a str, i32, MyExtra> { let digit = any::<&str, MyExtra>().filter(|c: &char| c.is_ascii_digit()); let digits = digit.repeated().at_least(1).collect::<String>(); digits .try_map(|digits, _| { dbg!(&digits); dbg!(if digits.len() <= 4 { Ok(2000) } else { Err(MyError::Custom) }) }) .map_err(|e| dbg!(e)) } fn combined_year_parser<'a>() -> impl Parser<'a, &'a str, (i32, i32), MyExtra> { let year = year_parser().boxed(); year.clone().then_ignore(just("-")).then(year) } fn true_false_parser<'a>() -> impl Parser<'a, &'a str, bool, MyExtra> { just("true") .or(just("false")) .try_map(|token, _| { dbg!(if token == "true" { Ok(true) } else { Err(MyError::Custom) }) }) .map_err(|e| dbg!(e)) } fn combined_true_false_parser<'a>() -> impl Parser<'a, &'a str, (bool, bool), MyExtra> { let true_false = true_false_parser().boxed(); true_false.clone().then_ignore(just("-")).then(true_false) } #[test] fn year_custom_error() { let parser = year_parser(); let s = "123456789"; assert_eq!( parser.parse(s).errors().collect::<Vec<_>>(), vec![&MyError::Custom] ); } #[test] fn combined_year_custom_error() { let parser = combined_year_parser(); let s = "123456789-1000"; assert_eq!( parser.parse(s).errors().collect::<Vec<_>>(), vec![&MyError::Custom] ); } #[test] fn true_false_custom_error() { let parser = true_false_parser(); let s = "false"; assert_eq!( parser.parse(s).errors().collect::<Vec<_>>(), vec![&MyError::Custom] ); } #[test] fn combined_true_false_custom_error() { let parser = combined_true_false_parser(); let s = "false-true"; assert_eq!( parser.parse(s).errors().collect::<Vec<_>>(), vec![&MyError::Custom] ); } fn main() { let parser = combined_year_parser(); let s = "123456789-1000"; match parser.parse(s).into_result() { Ok(value) => println!("value: {:?}", value), Err(e) => println!("error: {:?}", e), } }
As a usability note: it was surprising to me that I needed to implement
merge
. Without it,year_custom_error
also fails.merge
doesn't get invoked forcombined_year_custom_error
, however.This is with the latest git version of chumsky.
It may be related to #483 in some way, as this talks about the use of
repeated
as well as triggering some kind of error erasure.
I believe in this case the solution is to use Parser::validate
and not Parser::try_map
for parsing the year. By swapping the two methods in year_parser:
fn year_parser<'a>() -> impl Parser<'a, &'a str, i32, MyExtra> {
let digit = any::<&str, MyExtra>().filter(|c: &char| c.is_ascii_digit());
let digits = digit.repeated().at_least(1).collect::<String>();
digits.validate(|digits, _, emitter| {
dbg!(&digits);
dbg!(if digits.len() <= 4 {
2000
} else {
emitter.emit(MyError::Custom);
// Some sentinel value because you know it was a date, just not a valid one.
// The custom error will be produced and will be found in the `ParseResult`
-1
})
})
}
You get the expected output (and all tests now pass):
error: [Custom]
Parser::try_map
is used when you want to validate an input, but have an Err
treated as if that item wasn't correctly parsed. Parser::validate
is used when you know you have successfully parsed a token, but have to validate that it meets some requirements
In the docs for Parser::validate
on newest branch:
Validate an output, producing non-terminal errors if it does not fulfil certain criteria. The errors will not immediately halt parsing on this path, but instead it will continue, potentially emitting one or more other errors, only failing after the pattern has otherwise successfully, or emitted another terminal error.
This function also permits mapping the output to a value of another type, similar to Parser::map.
If you wish parsing of this pattern to halt when an error is generated instead of continuing, consider using Parser::try_map instead.
Thank you! I will give that a shot.
I had encountered validate
in the documentation but I drew entirely the wrong conclusion from its documentation. In my mind I did want parsing to immediately halt, but I think the key phrase is "on this path". It's somewhat counterintuitive that in order for parsing to fail entirely, you actually need it to continue on a particular path!
Your explanation:
Parser::try_map is used when you want to have a possible input , but continue to try others. Parser::validate is used when you know you have successfully parsed a token, but have to validate that it meets some requirements
is helpful. I think some improvements in the docs might be helpful, but I need to digest this a bit more first.
@faassen I realize now that my explanation wasn't totally correct. Parser::try_map
treats an err
as a parsing error, and abandons that branch of parsing, where Parser::validate
is, as I previously wrote, when you parsed a thing, you know it's a thing, but that thing can be invalid (invalid dates, number literals that are too large and the like).
I personally haven't used try_map
but use validate
for parsing number literals in a toy programming language
I'm still confused why try_map
causes the branch to be abandoned with a custom error, there appears to be no other branch, but it doesn't return the custom error.
I must be wrong about there being no other branch - there is a valid branch with just 4 digits, but the rest of the pattern can't be parsed. It's just that it doesn't even seem to enter that branch, as I only see one output of digits
. Perhaps that's due to optimizations?
I am quite confused about the original issue. Maybe I've missed something (I'm currently viewing this on mobile while on a train) but to me the original code should be producing Error::Custom
. Unfortunately I'm not going to get a chance to look into this for a few days, but I want to investigate this.
To confirm, as @Zij-IT suggested, using validate
instead of try_map
does make these problems go away. I've already fixed several error handling bugs using this strategy. I don't understand why exactly, though.
Could there be a shortcut API for emit that looks more like try_map
?
try_validate(sentinel, |s, _| {
// this returns a result
s.parse()
})
and this gets turned into:
validate(|s, _, emitter| {
match s.parse() {
Ok(parsed) => parsed,
Err(err) => {
emitter.emit(err);
sentinel
}
}
})
This way try_validate
is almost like try_map
, except that an explicit sentinel is required, which is a bit icky but is what you end up doing when you use validate
manually as well.
I've implemented a custom error class, with a
Custom
error type on it. I want to return this custom error usingtry_map
and then see it at the end of the failed parse. It works, but not always. When it doesn't work,ExpectedFound
is produced instead (i.e. a non-custom error).combined_year_custom_error
fails, the rest doesn't. Thedbg!
inyear_parser
behaves intriguingly: while the custom error is indeed returned fromtry_map
, it's replaced by anExpectedFound
by the time themap_err
immediately followingyear_parser
is invoked. If I useyear_parser
by myself theCustom
error is emitted by bothdbg!
.I tried to replicate the behavior in a simplified with using
combined_true_false_parser
but this does exhibit the expected behavior.As a usability note: it was surprising to me that I needed to implement
merge
. Without it,year_custom_error
also fails.merge
doesn't get invoked forcombined_year_custom_error
, however.This is with the latest git version of chumsky.
It may be related to #483 in some way, as this talks about the use of
repeated
as well as triggering some kind of error erasure.