Closed nicopap closed 9 months ago
This also recently came up in nom's issue tracker (rust-bakery/nom#1691) which was also discussed in the chat (https://matrix.to/#/!siAWDgffrOjdrhkczN:gitter.im/$dwtC3OMAmaJi7R5gspxJrPdujXnnb60J9gDk3v1bF6k?via=gitter.im&via=matrix.org&via=xiretza.xyz)
For easier reference for others, the relevant code is here: https://github.com/winnow-rs/winnow/blob/main/src/combinator/multi.rs#L303
If I'm understanding your proposal, it is taking it from "sep must not be empty" to "(sep or parser) must not be empty".
On the surface that seems reasonable. There is a less direct, human aspect that I'm concerned about which is by having the assert fire less often, it'll make it less obvious and be a bug lurking in programs.
This was also discussed in #365.
I'm going to close this. Higher-level built-in parsers are provided for convenience and don't have to be all things to all people. Rather than having logic that might be more difficult to predict, I think we should be consistent.
If there is a reason we should re-evaluate this, let us know!
Please complete the following tasks
winnow version
0.5.15
Describe your use case
I have:
parser
that is guarenteed to consume 1 to N token.sep
that may consume 0 to N token.I want to use
separated{0,1}
such thatsep
is the separator andparser
the element.Currently, such a setup may panic in winnow if
sep
matches 0 tokens.Click for minimal reproducible example
```rust use winnow::ascii::{alphanumeric1, multispace0}; use winnow::combinator::{alt, delimited, separated0}; use winnow::{PResult, Parser}; fn token_tree(input: &mut &str) -> PResult<()> { let token_trees = separated0::<_, _, (), _, _, _, _>(token_tree, multispace0); let mut parser = alt(( alphanumeric1.void(), delimited('(', delimited(multispace0, token_trees, multispace0), ')').void(), )); parser.parse_next(input) } fn main() { let separated = separated0::<_, _, (), _, _, _, _>; let text = std::hint::black_box(" hello (world(hello) world) ()(hello) world "); let parser = separated(token_tree, multispace0).recognize(); let parsed = delimited(multispace0, parser, multispace0).parse(text); assert_eq!(parsed, Ok("hello (world(hello) world) ()(hello) world")); } ```Describe the solution you'd like
The implementation is trivial.
In the implementation of
separated1
(andseparated0
) functions, move theif i.eof_offset() == len
block to within theparser.parse_next(i)
match, just before theres.accumulate(o)
, so that the infinite loop check accounts for the option of theparser
being consuming, rather than just thesep
.This successfully prevent infinite loops, as we still guarantee the parser advances and never reaches a fixed point.
A side effect is that it makes
rustfmt
happy and reduces 1 level of indentation.Alternatives, if applicable
For my specific use-case, the
sep
in question are whitespaces, and all I need to do is trim the parser output. I also am considering implementing a streaming tokenizer through theStream
trait to not have to care about whitespaces (and comments) in my parser.Additional Context
When seeing
separated
, people often think of comma-separated list of elements as you'd see them in ALGOL-like languages. But I think whitespace delimited languages (say LISP) it makes sense to have a whitespace separator.This use-case came in as I was designing a simple version of the rust TokenTree syntax element.