zesterer / chumsky

Write expressive, high-performance parsers with ease.
https://crates.io/crates/chumsky
MIT License
3.63k stars 155 forks source link

Custom error isn't respected in some cases #530

Open faassen opened 1 year ago

faassen commented 1 year ago

I've implemented a custom error class, with a Custom error type on it. I want to return this custom error using try_map and then see it at the end of the failed parse. It works, but not always. When it doesn't work, ExpectedFound is produced instead (i.e. a non-custom error).

combined_year_custom_error fails, the rest doesn't. The dbg! in year_parser behaves intriguingly: while the custom error is indeed returned from try_map, it's replaced by an ExpectedFound by the time the map_err immediately following year_parser is invoked. If I use year_parser by myself the Custom error is emitted by both dbg!.

I tried to replicate the behavior in a simplified with using combined_true_false_parser but this does exhibit the expected behavior.

use chumsky::prelude::*;
use chumsky::util::MaybeRef;

#[derive(Debug, Clone, PartialEq)]
enum MyError {
    ExpectedFound {
        span: SimpleSpan<usize>,
        expected: Vec<Option<char>>,
        found: Option<char>,
    },
    Custom,
}

impl<'a> chumsky::error::Error<'a, &'a str> for MyError {
    fn expected_found<E: IntoIterator<Item = Option<MaybeRef<'a, char>>>>(
        expected: E,
        found: Option<MaybeRef<'a, char>>,
        span: SimpleSpan<usize>,
    ) -> Self {
        Self::ExpectedFound {
            span,
            expected: expected
                .into_iter()
                .map(|e| e.as_deref().copied())
                .collect(),
            found: found.as_deref().copied(),
        }
    }

    fn merge(self, other: Self) -> Self {
        println!("merging {:?} and {:?}", self, other);
        match (self, other) {
            (MyError::ExpectedFound { .. }, a) => a,
            (a, MyError::ExpectedFound { .. }) => a,
            (a, _) => a,
        }
    }
}

type MyExtra = extra::Err<MyError>;

fn year_parser<'a>() -> impl Parser<'a, &'a str, i32, MyExtra> {
    let digit = any::<&str, MyExtra>().filter(|c: &char| c.is_ascii_digit());

    let digits = digit.repeated().at_least(1).collect::<String>();

    digits
        .try_map(|digits, _| {
            dbg!(&digits);
            dbg!(if digits.len() <= 4 {
                Ok(2000)
            } else {
                Err(MyError::Custom)
            })
        })
        .map_err(|e| dbg!(e))
}

fn combined_year_parser<'a>() -> impl Parser<'a, &'a str, (i32, i32), MyExtra> {
    let year = year_parser().boxed();
    year.clone().then_ignore(just("-")).then(year)
}

fn true_false_parser<'a>() -> impl Parser<'a, &'a str, bool, MyExtra> {
    just("true")
        .or(just("false"))
        .try_map(|token, _| {
            dbg!(if token == "true" {
                Ok(true)
            } else {
                Err(MyError::Custom)
            })
        })
        .map_err(|e| dbg!(e))
}

fn combined_true_false_parser<'a>() -> impl Parser<'a, &'a str, (bool, bool), MyExtra> {
    let true_false = true_false_parser().boxed();
    true_false.clone().then_ignore(just("-")).then(true_false)
}

#[test]
fn year_custom_error() {
    let parser = year_parser();
    let s = "123456789";
    assert_eq!(
        parser.parse(s).errors().collect::<Vec<_>>(),
        vec![&MyError::Custom]
    );
}

#[test]
fn combined_year_custom_error() {
    let parser = combined_year_parser();
    let s = "123456789-1000";
    assert_eq!(
        parser.parse(s).errors().collect::<Vec<_>>(),
        vec![&MyError::Custom]
    );
}

#[test]
fn true_false_custom_error() {
    let parser = true_false_parser();
    let s = "false";
    assert_eq!(
        parser.parse(s).errors().collect::<Vec<_>>(),
        vec![&MyError::Custom]
    );
}

#[test]
fn combined_true_false_custom_error() {
    let parser = combined_true_false_parser();
    let s = "false-true";
    assert_eq!(
        parser.parse(s).errors().collect::<Vec<_>>(),
        vec![&MyError::Custom]
    );
}

fn main() {
    let parser = combined_year_parser();
    let s = "123456789-1000";
    match parser.parse(s).into_result() {
        Ok(value) => println!("value: {:?}", value),
        Err(e) => println!("error: {:?}", e),
    }
}

As a usability note: it was surprising to me that I needed to implement merge. Without it, year_custom_error also fails. merge doesn't get invoked for combined_year_custom_error, however.

This is with the latest git version of chumsky.

It may be related to #483 in some way, as this talks about the use of repeated as well as triggering some kind of error erasure.

Zij-IT commented 1 year ago

I've implemented a custom error class, with a Custom error type on it. I want to return this custom error using try_map and then see it at the end of the failed parse. It works, but not always. When it doesn't work, ExpectedFound is produced instead (i.e. a non-custom error).

combined_year_custom_error fails, the rest doesn't. The dbg! in year_parser behaves intriguingly: while the custom error is indeed returned from try_map, it's replaced by an ExpectedFound by the time the map_err immediately following year_parser is invoked. If I use year_parser by myself the Custom error is emitted by both dbg!.

I tried to replicate the behavior in a simplified with using combined_true_false_parser but this does exhibit the expected behavior.

use chumsky::prelude::*;
use chumsky::util::MaybeRef;

#[derive(Debug, Clone, PartialEq)]
enum MyError {
    ExpectedFound {
        span: SimpleSpan<usize>,
        expected: Vec<Option<char>>,
        found: Option<char>,
    },
    Custom,
}

impl<'a> chumsky::error::Error<'a, &'a str> for MyError {
    fn expected_found<E: IntoIterator<Item = Option<MaybeRef<'a, char>>>>(
        expected: E,
        found: Option<MaybeRef<'a, char>>,
        span: SimpleSpan<usize>,
    ) -> Self {
        Self::ExpectedFound {
            span,
            expected: expected
                .into_iter()
                .map(|e| e.as_deref().copied())
                .collect(),
            found: found.as_deref().copied(),
        }
    }

    fn merge(self, other: Self) -> Self {
        println!("merging {:?} and {:?}", self, other);
        match (self, other) {
            (MyError::ExpectedFound { .. }, a) => a,
            (a, MyError::ExpectedFound { .. }) => a,
            (a, _) => a,
        }
    }
}

type MyExtra = extra::Err<MyError>;

fn year_parser<'a>() -> impl Parser<'a, &'a str, i32, MyExtra> {
    let digit = any::<&str, MyExtra>().filter(|c: &char| c.is_ascii_digit());

    let digits = digit.repeated().at_least(1).collect::<String>();

    digits
        .try_map(|digits, _| {
            dbg!(&digits);
            dbg!(if digits.len() <= 4 {
                Ok(2000)
            } else {
                Err(MyError::Custom)
            })
        })
        .map_err(|e| dbg!(e))
}

fn combined_year_parser<'a>() -> impl Parser<'a, &'a str, (i32, i32), MyExtra> {
    let year = year_parser().boxed();
    year.clone().then_ignore(just("-")).then(year)
}

fn true_false_parser<'a>() -> impl Parser<'a, &'a str, bool, MyExtra> {
    just("true")
        .or(just("false"))
        .try_map(|token, _| {
            dbg!(if token == "true" {
                Ok(true)
            } else {
                Err(MyError::Custom)
            })
        })
        .map_err(|e| dbg!(e))
}

fn combined_true_false_parser<'a>() -> impl Parser<'a, &'a str, (bool, bool), MyExtra> {
    let true_false = true_false_parser().boxed();
    true_false.clone().then_ignore(just("-")).then(true_false)
}

#[test]
fn year_custom_error() {
    let parser = year_parser();
    let s = "123456789";
    assert_eq!(
        parser.parse(s).errors().collect::<Vec<_>>(),
        vec![&MyError::Custom]
    );
}

#[test]
fn combined_year_custom_error() {
    let parser = combined_year_parser();
    let s = "123456789-1000";
    assert_eq!(
        parser.parse(s).errors().collect::<Vec<_>>(),
        vec![&MyError::Custom]
    );
}

#[test]
fn true_false_custom_error() {
    let parser = true_false_parser();
    let s = "false";
    assert_eq!(
        parser.parse(s).errors().collect::<Vec<_>>(),
        vec![&MyError::Custom]
    );
}

#[test]
fn combined_true_false_custom_error() {
    let parser = combined_true_false_parser();
    let s = "false-true";
    assert_eq!(
        parser.parse(s).errors().collect::<Vec<_>>(),
        vec![&MyError::Custom]
    );
}

fn main() {
    let parser = combined_year_parser();
    let s = "123456789-1000";
    match parser.parse(s).into_result() {
        Ok(value) => println!("value: {:?}", value),
        Err(e) => println!("error: {:?}", e),
    }
}

As a usability note: it was surprising to me that I needed to implement merge. Without it, year_custom_error also fails. merge doesn't get invoked for combined_year_custom_error, however.

This is with the latest git version of chumsky.

It may be related to #483 in some way, as this talks about the use of repeated as well as triggering some kind of error erasure.

I believe in this case the solution is to use Parser::validate and not Parser::try_map for parsing the year. By swapping the two methods in year_parser:

fn year_parser<'a>() -> impl Parser<'a, &'a str, i32, MyExtra> {
    let digit = any::<&str, MyExtra>().filter(|c: &char| c.is_ascii_digit());

    let digits = digit.repeated().at_least(1).collect::<String>();

    digits.validate(|digits, _, emitter| {
        dbg!(&digits);
        dbg!(if digits.len() <= 4 {
            2000
        } else {
            emitter.emit(MyError::Custom);
            // Some sentinel value because you know it was a date, just not a valid one.
           // The custom error will be produced and will be found in the `ParseResult`
            -1
        })
    })
}

You get the expected output (and all tests now pass):

error: [Custom]

Parser::try_map is used when you want to validate an input, but have an Err treated as if that item wasn't correctly parsed. Parser::validate is used when you know you have successfully parsed a token, but have to validate that it meets some requirements

In the docs for Parser::validate on newest branch:

Validate an output, producing non-terminal errors if it does not fulfil certain criteria. The errors will not immediately halt parsing on this path, but instead it will continue, potentially emitting one or more other errors, only failing after the pattern has otherwise successfully, or emitted another terminal error.

This function also permits mapping the output to a value of another type, similar to Parser::map.

If you wish parsing of this pattern to halt when an error is generated instead of continuing, consider using Parser::try_map instead.

faassen commented 1 year ago

Thank you! I will give that a shot.

I had encountered validate in the documentation but I drew entirely the wrong conclusion from its documentation. In my mind I did want parsing to immediately halt, but I think the key phrase is "on this path". It's somewhat counterintuitive that in order for parsing to fail entirely, you actually need it to continue on a particular path!

Your explanation:

Parser::try_map is used when you want to have a possible input , but continue to try others. Parser::validate is used when you know you have successfully parsed a token, but have to validate that it meets some requirements

is helpful. I think some improvements in the docs might be helpful, but I need to digest this a bit more first.

Zij-IT commented 1 year ago

@faassen I realize now that my explanation wasn't totally correct. Parser::try_map treats an err as a parsing error, and abandons that branch of parsing, where Parser::validate is, as I previously wrote, when you parsed a thing, you know it's a thing, but that thing can be invalid (invalid dates, number literals that are too large and the like).

I personally haven't used try_map but use validate for parsing number literals in a toy programming language

faassen commented 1 year ago

I'm still confused why try_map causes the branch to be abandoned with a custom error, there appears to be no other branch, but it doesn't return the custom error.

I must be wrong about there being no other branch - there is a valid branch with just 4 digits, but the rest of the pattern can't be parsed. It's just that it doesn't even seem to enter that branch, as I only see one output of digits. Perhaps that's due to optimizations?

zesterer commented 1 year ago

I am quite confused about the original issue. Maybe I've missed something (I'm currently viewing this on mobile while on a train) but to me the original code should be producing Error::Custom. Unfortunately I'm not going to get a chance to look into this for a few days, but I want to investigate this.

faassen commented 1 year ago

To confirm, as @Zij-IT suggested, using validate instead of try_map does make these problems go away. I've already fixed several error handling bugs using this strategy. I don't understand why exactly, though.

Could there be a shortcut API for emit that looks more like try_map?

try_validate(sentinel, |s, _| {
     // this returns a result
     s.parse()
})

and this gets turned into:

validate(|s, _, emitter| {
   match s.parse() {
       Ok(parsed) => parsed,
       Err(err) => { 
            emitter.emit(err);
           sentinel
       } 
   }
})

This way try_validate is almost like try_map, except that an explicit sentinel is required, which is a bit icky but is what you end up doing when you use validate manually as well.