Open Geal opened 4 years ago
combinators will now require parser arguments to implement a Parser
trait (instead of Fn
) and return a FnMut
, as of 84d1ac025ba2.
How about making nom errors std compatible ?
Error in nom are still the most annoying thing IMO
Working with the Parser
's trait is a bit cumbersome, because each invocation of .and()
creates an additional tuple:
tag("in ")
.and(number())
.and(tag(" days"))
.map(|((_, days), _)| RelativeDate::Days(days))
// ^^^^^^^^^^^^^^ nasty nesting
warp does some trait magic that automatically "flattens" enums [1], allowing - in our case - for:
tag("in ")
.and(number())
.and(tag(" days"))
.map(|(_, days, _)| RelativeDate::Days(days))
Maybe something similar could be implemented for nom 6.0? It'd be a breaking change though, so perfect time to decide now :-)
[1] https://docs.rs/warp/0.1.16/warp/trait.Filter.html#extracting-tuples - in their case it's about parameters passed to route handlers
@Waelwindows it's there with bf83d5d9789893, but are you expecting specific behaviours from std::error::Error
implementation? Like, implementing properly source()
and backtrace()
would require a new VerboseError
implementation.
@Stargateur do you have any specific issues in mind? Would things like the finish()
method ( 9a555fe646488 ) help?
Well for example, make custom error is very hard, to do and to understand, I'm not sure I can do it even now. Then, if you want to use VerboseError, and want to return the error, often you can't keep the borrow, so I have this:
let (_, sip_message) = sip_message(sip::Input::new(input))
.map_err(|e| {
e.map(|e| nom::error::VerboseError {
errors: e
.errors
.into_iter()
.map(|(i, e)| (String::from_utf8_lossy(i.fragment()).into_owned(), e))
.collect(),
})
})
It's work, but it's far from being user friendly. finish()
look great.
The fact that ErrorKind doesn't have a Custom field or something, even, without user data. Make your parser hard to debug cause you use ErrorKind no custom variant but for your parser so it's hard to know if it's your custom error or the nom parser one.
Currently, I don't know if I can easy switch from VerboseError to ErrorKind to avoid unnecessary allocation in production but my parser is currently not the bottleneck of my project so it's ok for me for now.
(The enum Err not in module error make me confuse a lot of time)
@Stargateur in previous versions, ErrorKind
had a Custom
variant, but it actually made errors much harer to manipulate. Could you tell me more about the difficulties in making a custom error type? Maybe that's something I could solve with better documentation and some examples
@Geal On the subject of custom errors, I tried to implement a custom error type and while I got it to work using https://github.com/Geal/nom/pull/1151 , I got stuck on using ?
for error handling which wants a From
impl I can't seem to provide (see match
below for verbose example usage). I can continue with the more verbose form for now but would definitely appreciate either documentation on the below or an indication that I did something silly 😅
| let (input, val) = alpha(input)?;
| ^ the trait `std::convert::From<nom::Err<(&[u8], nom::error::ErrorKind)>>` is not implemented for `nom::Err<MyCustomError<&[u8]>>`
named!(hashtags, is_a!("#"));
named!(alpha, take_while!(is_alphanumeric));
fn atx_heading(input: &[u8]) -> IResult<&[u8], ATXHeading, MyCustomError<&[u8]>> {
// long-form match is fine
let result_tags = hashtags(input);
let (input, hashes) = match result_tags {
Ok(v) => v,
Err(Error((i, e))) => return Err(Error(MyCustomError::Nom(i, e))),
};
// custom logic returning an error, also fine
if hashes.len() > 6 {
return Err(Error(MyCustomError::MyError));
}
// ? wants a From impl I can't provide
let (input, val) = alpha(input)?;
Ok((
input,
ATXHeading {
level: hashes.len(),
value: val,
},
))
}
(edit: I couldn't get this to work tonight. Maybe another time)
@ChristopherBiscardi this is linked to #1010: when you use ?
, the From
implementation is applied on the error type ( https://doc.rust-lang.org/std/ops/trait.Try.html?search=#tymethod.into_result )
Unfortunately, I cannot fix it without trait specialization. in the meantime, if you already implemented From<(&[u8], ErrorKind)> for MyCustomError<&[u8]>
, you could write this: alpha(input).map_err(Err::convert)?
I think I could add a helper to IResult
, to write alpha(input)?.convert()
.
Another way to fix your issue would be to make a alpha
parser directly with the error type you need:
fn alpha(i:&[u8]) -> IResult<&[u8], &[u8], MyCustomError<&[u8]>> {
take_while(is_alphanumeric)(i)
}
or even shorter, inside your atx_heading
function:
let alpha = take_while(is_alphanumeric);
I was able to get my example working with your suggestions, thanks!
With nom the tendency is to write recursive parsers. This fits with the mental model, and is encouraged by the examples.
The issue that arises is that when parsing input that contains deeply nested data structures the parser will fail with a stack overflow. This is a problem for security, as an attacker can intentionally craft input which if parsed will crash a process on demand - as noted in this issue.
The request in the issue is to have a recursion limit which causes the parser to fail without aborting the process. This comes with issues of it's own, and it would be better if it were possible to parse the input successfully regardless of the stack size.
I recently took a stab at converting a small part of a parser that I'm writing which was recursive to be stack based instead. It's not super pleasant, requiring more code and thought to make it correct. I managed, but I think to convert something more involved like json which has several different places where recursion can occur would be a mind bending exercise.
For Nom 6 my wish is that the most natural way to use the APIs ends up using some stack based backend instead of a recursion based one without making everything ugly and hard to follow.
I have two suggestions for nom 6:
Needed::Unknown
variant and just use Size(1)
for that case;For the first: The only interpretation of an Incomplete
result that makes sense to me is if the number of bytes returned means that reading any fewer bytes can't possibly generate an Ok
result. (An Error
or Failure
result might be possible with less input, though.) If the length needed is not known, then the only reasonable behavior is to retry the parser once you have at least one more byte, right? I see that the Unknown
variant dates back to 232f6766c6c00ecb8cc1e4425240f6e07b5cbaef in 2015 but it isn't clear to me why it was introduced.
The second proposal has more background.
I think completeness is better treated as a property of the input. Aside from special cases like length_value
, the remaining input after a prefix has been parsed is complete if and only if the original input was. In current nom, if you want to switch between streaming and complete, you have to rewrite the entire tree of parsers. I think you should be able to attempt incremental parses and then try one more run of the same parser in "complete" mode if you read EOF.
I was trying to write a parser that would stop as soon as it found the information I need in a TLS ClientHello
message. That requires parsing several layers of nested length-prefixed structures, so length_value
seemed like the right combinator. But then I discovered that length_value
doesn't invoke its second parser until it has at least as much data available as the first parser indicated it needs, so I can't use it for streaming parsing.
It was easy to change the implementation to do what I wanted, but then some of the tests failed because there's no way for length_value
to indicate to its nested parser whether it's handing off a complete buffer or not. I considered adding a separate variant that only works with streaming sub-parsers, but that made my head hurt.
I spent a while thinking about how it could work instead, like allowing parsers to return an "incomplete but if there's no more data coming then here's the result I would produce" error, which complete
would transform into an Ok
. But that seemed awfully complicated too.
The best I can come up with is making input types be something like struct Input<T>(T, bool)
or enum Input<T> { Complete(T), Incomplete(T) }
or something along those lines. As a nice bonus, I think that means you can completely remove the streaming/complete split from the API, and have just one version of every parser. Those parsers that need to know whether the input is complete could just check, and the rest would blindly forward that flag to the next parser.
@jameysharp
Remove the Needed::Unknown variant and just use Size(1) for that case;
For the first: The only interpretation of an Incomplete result that makes sense to me is if the number of bytes returned means that reading any fewer bytes can't possibly generate an Ok result. (An Error or Failure result might be possible with less input, though.) If the length needed is not known, then the only reasonable behavior is to retry the parser once you have at least one more byte, right? I see that the Unknown variant dates back to 232f676 in 2015 but it isn't clear to me why it was introduced.
It really depends on the format and the IO pattern that is used. Using Size(1)
signals that we know exactly how much is needed, but if we actually don't, reading just 1 more byte, parsing, returning Size(1)
then reading 1 more byte, reparsing, might be too costly.
I think completeness is better treated as a property of the input. Aside from special cases like length_value, the remaining input after a prefix has been parsed is complete if and only if the original input was. In current nom, if you want to switch between streaming and complete, you have to rewrite the entire tree of parsers. I think you should be able to attempt incremental parses and then try one more run of the same parser in "complete" mode if you read EOF.
streaming VS complete can actually change inside a parser. As an example, a MP4 file can be an arbitrarily long list of atoms of different sizes, but each element in it is length prefixed.
For your use case, a custom input type would be a good way to experiment with it: https://github.com/Geal/nom/blob/master/doc/custom_input_types.md By implementing a serie of traits ( https://github.com/Geal/nom/blob/master/src/traits.rs ) you can have the input type you need, that would be used in a generic way by most parsers, but for which the additional bool could be recognized by parsers who need it
With regard to MSRV, what features from 1.44 (and 1.37) are used? Just curious on the feasibility of it being lower.
@Geal
@Waelwindows it's there with bf83d5d
VerboseError
does implement Debug
and Display
, but Error
has also to be implemented (there's no blanket implementation that I'm aware of). Otherwise many error handling utilities will just not work with it.
In this case it would be an empty implementation.
Edit: ah, there is an impl for nom::error::Error
. Looks like I missed that one: https://github.com/Geal/nom/blob/c5090ef1e2a1704851b1773169653728b0761d9c/src/error.rs#L99
Edit2: still, it doesn't show up in the documentation, and I still fail to use this trait impl for some reason.
Edit3: aaand there is indeed no impl for VerboseError
.
Turns out what I wrote above is for naught, since the actual error type you usually encounter is nom::Err
, not nom::error::Error
or nom::error::VerboseError
. Sorry for the noise.
However, in my defense I'll have to point out that this is somewhat confusing. In most cases, a type called Error
will usually be used in a Result
directly, rather than wrapped in another error type.
Hi @Geal can you release a new version with this fix in particular? https://github.com/Geal/nom/commit/59fb86a87ddc65d67dc649b8aff2ad5eacdd1f6c
There's 7.0.0-alpha1 available with the fix
On Fri, Jul 9, 2021, 08:01 Hanif Ariffin @.***> wrote:
Hi @Geal https://github.com/Geal can you release a new version with this fix in particular? 59fb86a https://github.com/Geal/nom/commit/59fb86a87ddc65d67dc649b8aff2ad5eacdd1f6c
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Geal/nom/issues/1132#issuecomment-876936696, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAA5EAEY7TLJCYSCHHLFALTTW2GFLANCNFSM4MCHIEDA .
placeholder for now, there is no plan
some ideas for now:
Fn(Input) -> IResult<Input,Output,Error>
, some users needed aFnMut
instead, it could also be a more abstract trait