rust-bakery / nom

Rust parser combinator framework
MIT License
9.38k stars 805 forks source link

Feature request: Errors should not have the same type as Failure #1669

Closed peckpeck closed 1 year ago

peckpeck commented 1 year ago

Hello and thank you for writing nom!

First, I understand this is a breaking change.

When writing a parser, we can write for ourselves (parse a file and use it) or write it for someone else (provide a parser in a library). In this second case, we need a main parser that translates all remaining errors and failure in a type that is publicly available to the library. If we want meaningful error, we have two options :

The first case needs a lot of matches to make sure we cover all cases, and we cannot guarantee we cover every case.

The second case suffers from error type propagation, every parser must then use it and so must have translation code, even if we don't need it.

I suggest that there could be a third option: nom::Err should be defined this way

pub enum Err<E,F=E> {
    Incomplete(Needed),
    Error(E),
    Failure(F),
}

Since the failure stops the parsing, if is designated to go directly to the main parser, whereas simple error just back-propagate and will most of the time be discarded. So we keep Error(E) that can use nom::Error and we don't need to handle all those cases with used defined error, and we have Failure(F) where it is possible to defined our own error type without handling non failure cases.

To work around this, my own errorkinds contain a Nom alternative pointing to a VerboseError that is only valid if we are in Err::Error but not in Err::Failure. The drawback is that all of this is checked at runtime and not at compile time.

If you think this is a good idea, I could work on a patch.

epage commented 1 year ago

FYI #1631 changes Err to match what you mention though I don't think that PR fully supports the use case yet.

I've dealt with some similar changes

I feel like there might be other side effects within the design of nom but I can't remember for sure what the details are.

peckpeck commented 1 year ago

If I understand correctly, the PR focuses on stripping parsers output when not needed, whereas I would like to strip the error side of the result.

While thinking about it, I "may" even be happy with using Err<(),MyErrorType> in the above definition, which should not make the error type too big.

epage commented 1 year ago

Part of stripping parser outputs when not needed is skipping expensive error generation that is not needed, like all but the last branch of an alt (granted, unsure if that PR will handle that case as an error type can integrate information from all of the branches).

While thinking about it, I "may" even be happy with using Err<(),MyErrorType> in the above definition, which should not make the error type too big.

That depends on the size of MyErrorType as the size of Err is usize + max(0 + size_of::<MyErrorType>())

peckpeck commented 1 year ago

A quick tests show that:

For cut, I suggest we keep cut<I, E: ParseError<I>, F>(mut parser: F) and add a cut_with<I, E1: ParseError<I>, E2: ParseError<I>, F>(mut parser: F, converter: Fn(E1)->E2)

Geal commented 1 year ago

so, a bit of context here. Failure is used to prevent parsers from backtracking and testing other branches (typically in alt), but that does not mean it has to go up to the root parser. It is used mainly for flow control, but it should still carry the same error returned by the underlying parser. Now the unfortunate thing here is that I have tried for a long time to use the parser result to carry and accumulate errors, but in the end it does not fit well for people who want more complex error types with a lot of context. In particular, it does not fit the case where people want to recover from errors but still accumulate them. So for that use case I'll add external state management where complex errors can be stored (@epage this should be similar to the State structure in your and other libraries), and keep Error/Failure for light error management inside the parsers, with #1631 reducing the overhead of errors that are created and discarded right away. You can see in https://github.com/rust-bakery/nom/pull/1631/files#diff-38fd38c32263eb6b2533191b9ae1a66dd7f9a9e2593a2ce9c5fcb91f10aa65c2R98 that I am actually making the same change to the err structure, but it is for a different reason: mainly, preventing error management code from being generated independently for error and failure

peckpeck commented 1 year ago

After trying a few things and reading your comment, understand that separating Err from Failure doesn't bring me what I would like.