Open JanLek opened 4 years ago
Similar question.
same problem, it seems that parser succeeds/fails depending on order of input data of the permutation()
children nodes...
e.g. inpt text:
organization "ACME comp";
contact "joe joe";
description "blah blah";
reference "that RFC over there";
is parsed correctly, but the "same" input with reordered items:
organization "ACME comp";
contact "joe joe";
reference "that RFC over there";
description "blah blah";
fails with error for parser:
permutation((
opt(organization_stmt),
opt(contact_stmt),
opt(description_stmt),
opt(reference_stmt),
))(input)
with error:
x = Err(
Error(
Error {
input: "description \"blah blah\";\n}\n",
code: Tag,
},
),
)
permutation doesn't make sense for optional parser, it's only make sense for mandatory present and exclusive but not ordered thing. (so quite limited use case...)
I advice you to use many0
a alt
a map
and an enum
for example:
many0(alt((
map(organization_stmt, MyEnum::Organization),
etc...,
));
One can also use a while
loop instead or many_fold0
and use an hashmap to build the data structure that represent one possible item.
Ok thanks for confirmation.
Random order of optional single instance statements seems to be rather frequent pattern in telco/communications protocols and/or data format grammars, thus "making sense" in user point of view is not that irrational imho.
I realize though why it's not currently supported and what this means on the implementaion level -> it would need extra backtracking and/or some sort of heuristics/dynamic programming patterns to match various possible cases.
When opt statements are 0 to 1 instances, not "manys", workaround requires additonal count check or to assure that at most one instance per each enum variant exists.
I will need to cope with it and come up with some wrapper parser then to address the issue.
In case it's helpful, my workaround for this ended up being explicitly enumerating the possible permutations with help of a macro, since I only had to support four optional and in-any-order items:
In case it's helpful, my workaround for this ended up being explicitly enumerating the possible permutations with help of a macro, since I only had to support four optional and in-any-order items:
very inefficient but it's work
I realize though why it's not currently supported and what this means on the implementaion level -> it would need extra backtracking and/or some sort of heuristics/dynamic programming patterns to match various possible cases.
That simply impossible, how permutation would know when it's should retry, maybe a specialize permutation that "count" option and try to have the much "some" as possible, but again very not efficient
When opt statements are 0 to 1 instances, not "manys", workaround requires additonal count check or to assure that at most one instance per each enum variant exists. I will need to cope with it and come up with some wrapper parser then to address the issue.
That will be hard to make it simple, if I recall, I end up using some unsafe doing a enum manually using hashmap with enum as a key (some sort of discriminant), and a union as value to avoid any overhead, but I advice simply use enum and unwrap if you are not confident for unsafe. Specially for a first try.
Using a hashmap and check if any duplicate exist is *~O(1)
so much better than permutation + alt solution that is O(n²)
(or O(!n)
not sure). But it's allocate, maybe if you know in advance the number of match expected you could use an array, but find the duplicate would be a little longer O(n), not important for small array, problem if you starting to make a giant array.
I guess solution like https://docs.rs/tinymap/0.2.4/tinymap/struct.TinyMap.html + enum should be simple enough.
In case it's helpful, my workaround for this ended up being explicitly enumerating the possible permutations with help of a macro, since I only had to support four optional and in-any-order items:
thanks! yes, i also considered this approach for low count variations - it's nice and easy to implement (of course inneffective due to exhaustive attempts). Unfortunately my ABNF has 43 different "opt-perms" that have between 2 to about 12 mandatory/optional sub-statements in any order... 😱
That simply impossible, how permutation would know when it's should retry, maybe a specialize permutation that "count" option and try to have the much "some" as possible, but again very not efficient
due to nature of problem, i may need to live with lower efficiency and extra allocations etc. will see how the cumbersome peek/map approach goes. It looks like all/most of the cases of my ABNF do have unique leading keyword to allow peeking/parsing/map storing in loop rather directly.
Thanks to all for references and insights!
I'm having a similar issue. I'm trying to parse apache's avro avdl namespace + alias, it can appear likes this:
@namespace("org.apache.avro.someOtherNamespace")
@aliases(["org.old.OldRecord", "org.ancient.AncientRecord"])
or
@aliases(["org.old.OldRecord", "org.ancient.AncientRecord"])
@namespace("org.apache.avro.someOtherNamespace")
but only 1 can appear as well, so the order doesn't matter, but only once, and they cannot be repeated.
I think if I were to use many0
it would imply that they can appear more than once.
I'm gonna try some of the proposed solutions here and update.
I started using https://crates.io/crates/nom_permutation from @Stargateur which seems to work fine for my use case.
Thanks
nom::branch::permutation
does not support optional sub parsersPrerequisites
14.3.1
1.51
Description
I am writing a parser that uses several sub parsers, but the order in which they appear can change, so I thought I should use
nom::branch::permutation
. However, some of the sub-parsers are optional, and as far as I can tell,nom::branch::permutation
does not have a way of marking some of the sub-parsers as optional.The
permutation
macro does support this, but I am not sure if I can use it without also re-writing the sub-parsers with macros. Is there a way this functionality can be made available fornom::branch::permutation
? Or maybe this is already possible and I've just missed something?Test case
Apologies, the following test case is rather contrived. The
permutation
macro supports optional sub parsers, so this is valid:As far as I can tell, optional parsers are not supported with the
permutation
function. I did try it out, just in case, but indeed it does not work: