Open yejunjin opened 1 year ago
While its not in the docs, in the code is this comment
infinite loop check: the parser must always consume
Meaning, the separator parser. I assume this is protecting against writing code where both the separator and the core type are allowed to be empty, you'd get an infinite loop, so nom seems to say the core type may be empty but the separator must not be.
Maybe related with #1573
The sep
parser in seperated_list
must not be optional because if it is not present then consecutive items become just one item.
Perhaps the required sep
parser is a alternate separators,
alt((sep1, sep2, ...))
I ran into the same thing. Is there a more accepted pattern currently for a sequence of items that are optionally separated by whitespace? I've switched something like a many1(preceded(multispace0, my_parser))
, but that doesn't work exactly the same because it allows leading separators (which isn't actually a problem in my context, at least), and separated_list1
would read more clearly.
If the separator is optional then what is to separate two consecutive elements when the separator is absent?
without testing, this seems to be what you might want...
many1(tuple((element, many0(sep))))
Rust saves us from ourselves ;)
The return type on that isn't great, though. You'd have to map
the tuple
combinator to get the elements on their own. preceded
or terminated
are probably better options.
that check for non consuming input in repeating parser dates from the very beginning of nom. People kept running into issues with parsers going into infinite loop, in particular due to optional whitespace. Even now in some user tests without the check people trip on it :sweat_smile:
Sure, that makes sense, but I think it would be more robustly handled by returning an error in the case that both sep
and f
both parse empty in a row, rather than one or the other. As long as one of the two isn't parsing empty, there's no risk of an infinite loop.
An optional sep
in a separated list is conceptually invalid because if the separator does not exist the the elements are not separated and you then do not have the presence of two elements but one. Perhaps what is regarded as an optional is really an alternate combinator such as comma or space.
Another situation is when the elements themselves are bracketed and the separator is purely cosmetic such as a sequence of quoted bracketed strings. However, this later case isn't a separated list but a sequence of elements with optional suffix (or prefix).
I wonder if we cand define traits and auto implement them against parser results so as to use bounds to better document this functionality.
...pseudo
trait Consuming {}
trait NonConsuming {}
impl<..> Consuming for fn(I)->IResult<..>
impl<..> NonConsuming for fn(I)->IResult<I,Option<O>,E>
impl<..> NonConsuming for fn(I)->IResult<I,Result<O,E2>,E>
An optional sep in a separated list is conceptually invalid
You present this as fact when it's really just your opinion. The difference between separated_list
and a prefix/suffix per element is that the former only looks for the separator between pairs of elements, while the latter also looks for a separator at the very beginning (prefix) or end (suffix). They are not equivalent, regardless of whether the separator is optional.
if the separator does not exist the the elements are not separated and you then do not have the presence of two elements but one
That depends on what's being parsed. In my case, I have a list of {...}
blocks, which I want to be separated by any amount of whitespace (including zero whitespace). I don't want to allow leading or trailing whitespace, and I always want at least one block. separated_list1(multispace0, block)
would have been perfect for my case, but it gave an unhelpful error instead, which is how I found this issue.
Perhaps what is regarded as an optional is really an alternate combinator such as comma or space.
That's not a problem, because if it's not optional, it's not optional. If somebody uses alt((tag(','), tag(' ')))
or something, it can't be skipped anyway.
separated_list1(foo, bar)
is a parser that has a mandatory list of bar
items separated by mandatory foo
items.
separated_list1(opt(foo), bar)
is a parser with bar
items possibly separated by foo
items.
In no situation should it be unclear whether the separator is optional or not, because you can always use a mandatory combinator if you want it to be mandatory. This is one of the major strengths of combinators, that you can build exactly the behavior you want from the building blocks. The suggestion is not making the separator optional, but allowing it to be a potentially non-consuming parser.
Actually, the reason I am on this thread is because I had a similar presumption about this functionality for the optionality of sep
. Specifically, that sep
should be capable of being optional due to the fact that it, sep
, and the element
parser are self delimiting maximally consuming independent parsers. Perhaps a new version of separated_list
, separatedb_list
, can be added to the library that implements this wildly expected behaviour.
Incidentally, my apologizes if I sound emphatic or definitive. I am an infinite novice in this and many things. :)
// test cases:
(sep, element) => [element]
(sep, opt_element) => [opt_element]
(opt_sep, element) => [element]
(opt_sep, opt_element) => [opt_element]
(element, element) => [element;1]
(opt_element, opt_element) => Err
I have moved the loop check in separated_list0
and separated_list1
to cover the application of both parsers: https://github.com/rust-bakery/nom/pull/1756
As @Taywee said
Sure, that makes sense, but I think it would be more robustly handled by returning an error in the case that both sep and f both parse empty in a row, rather than one or the other. As long as one of the two isn't parsing empty, there's no risk of an infinite loop.
This will offer enough protection against infinite loops
We should probably remove the comment that sep
must be consuming...
/// # Arguments
/// * `sep` Parses the separator between list elements. Must be consuming.
/// * `f` Parses the elements of the list.
And maybe the test cases should also be updated. Expecting a new release version ;)
Hello, and thank you for submitting an issue to nom!
First, please note that, for family reasons, I have limited time to work on nom, so following the advice here will make sure I will quickly understand your problem and answer as soon as possible. Second, if I don't get to work on your issue quickly, that does not mean I don't consider it important or useful. Major version releases happen once a year, and a lot of fixes are done for the occasion, once I have had time to think of the right solution. So I will get back to you :)
Prerequisites
Here are a few things you should provide to help me understand the issue:
Test case
Please provide a short, complete (with crate import, etc) test case for the issue, showing clearly the expected and obtained results.
Example test case: