Closed kernow closed 9 years ago
Your example is a bit confusing as it says "Fails to match" for both [xxx]
cases. So the second set of cases all match?
Sorry, I cut and paste too much and didn't check it properly. I've edited the original comment now. In the first example xxx and aaa fail to match, in the second example everything matches.
What was your expectation?
Oh, note that you can use #cause
on the ParseFailed
to get more info:
begin
parser.open.parse('[xxx]') # fails to match
rescue Parslet::ParseFailed => pf
require 'pp'
pp pf.cause
end
I would expect all of the examples to match no matter what order the strings are defined within the :block_tag
rule. I've used the cause but it's not making any sense to me as it's saying it's found " ," which isn't in any of the strings.
#<Parslet::Cause:0x007fcea3a21970
@children=
[#<Parslet::Cause:0x007fcea3a21a88
@children=nil,
@message=["Expected \"]\", but got ", "x"@3],
@pos=#<Parslet::Position:0x007fcea3a21b78 @bytepos=3, @string="[xxx]">,
@source=
#<Parslet::Source:0x007fcea3a22d70
@line_cache=#<Parslet::Source::LineCache:0x007fcea3a22ca8 @line_ends=[]>,
@re_cache={1=>/(.|$){1}/m, 2=>/(.|$){2}/m},
@str=#<StringScanner 0/5 @ "[xxx]">>>],
@message="Failed to match sequence ('[' BLOCK_TAG ']')",
@pos=#<Parslet::Position:0x007fcea3a21998 @bytepos=3, @string="[xxx]">,
@source=
#<Parslet::Source:0x007fcea3a22d70
@line_cache=#<Parslet::Source::LineCache:0x007fcea3a22ca8 @line_ends=[]>,
@re_cache={1=>/(.|$){1}/m, 2=>/(.|$){2}/m},
@str=#<StringScanner 0/5 @ "[xxx]">>>
Note that the issue only occurs when one sting is a substring of the other, e.g. 'xx' is a substring of 'xxx' and when the smaller string is defined in the rule before the larger, e.g. 'xx' before 'xxx'. When distinct strings are used e.g. aa, bb, cc, dd, ee etc. the issue is not present.
In a PEG, order matters. The choice operator |
is a prioritized choice operator, which means that it will consume the first successful match and move on. That is why there is an x
remaining where it expects a ]
.
This gives you more power over which parse tree is selected, but you need to be more careful about the prioritisation. In your case, the more specific xxx
should be evaluated before the xx
since the xx
is contained in the xxx
.
Here (http://www.brynosaurus.com/pub/lang/peg.pdf) is a nice paper on PEGs.
I hope that helps :)
Thanks for the explanations @floere, that makes sense. What would be the best way to support the list of block_tags defined in any order? These rules are defined by users in my use case and so could be listed in any order. I guess ordering the list by length with the longest first before applying them to the rule might work?
If those rules are always of the class str('…')
, with the users' strings inserted in place of the …
, then ordering these str
rules by decreasing length would work.
@kernow My pleasure 😊
I'm not sure if this is a bug or that the rules are incorrect.
If xx comes before xxx in the
block_tag
rule then[xxx]
fails to match in theopen
rules, where as if it comes after then[xxx]
is matched.