kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.
kschiess.github.com/parslet
MIT License

repeat.as outputs [] for empty string input #126

Open hagabaka opened 9 years ago

hagabaka commented 9 years ago

Normally when you use as on repeat, the matched string will be in the result hash:

str('a').repeat.as(:b).parse('aaaaa')
# => {:b=>"aaaaa"@0}

However when the input string is empty, which repeat accepts by default, the result will have an empty array instead:

str('a').repeat.as(:b).parse('')
# => {:b=>[]}

This inconsistency makes it hard to write transform rules, because "simple" only matches if the input string is non-empty, and "sequence" only matches if the input is empty:

transform = Transform.new {rule(:b => simple(:b)) {b}}
transform.apply str('a').repeat.as(:b).parse('aaaaa')
# => "aaaaa"@0
transform.apply str('a').repeat.as(:b).parse('')
# => {:b=>[]}

transform = Transform.new {rule(:b => sequence(:b)) {b.join}}
transform.apply str('a').repeat.as(:b).parse('aaaaa')
# => {:b=>"aaaaa"@0}
transform.apply str('a').repeat.as(:b).parse('')
# => ""

Of course if the subtree is as simple as {:b => '...'} or {:b => []}, another transform rule can normalize them. But if there are multiple keys in the subtree, it would be tedious to write that rule. Is there a reason why the parser shouldn't just output empty string for repeat.as when the input is empty?
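A minimal sketch of such a normalization step, written as a plain Ruby pass over the parse tree instead of a transform rule. The helper name `normalize_empty_repeats` and its `keys` argument are hypothetical and not part of Parslet; the caller has to list which keys should read as strings, because the tree alone does not say whether an empty array came from a string-like repeat.

```ruby
# Hypothetical helper, not part of Parslet: returns a copy of the parse tree
# in which an empty array stored under one of the given keys is replaced by
# an empty string, so a single simple(:x) rule can cover both cases.
def normalize_empty_repeats(tree, keys)
  case tree
  when Hash
    tree.each_with_object({}) do |(key, value), out|
      out[key] =
        if keys.include?(key) && value.is_a?(Array) && value.empty?
          ''
        else
          normalize_empty_repeats(value, keys)
        end
    end
  when Array
    tree.map { |item| normalize_empty_repeats(item, keys) }
  else
    tree # leaf values (strings, slices, nil) are returned unchanged
  end
end

# Plain hashes stand in for parser output here.
tree = { b: [], c: 'x', d: [{ b: [] }, { b: 'aa' }] }
normalized = normalize_empty_repeats(tree, [:b])
```

Running this over the parser output before `transform.apply` would let one `simple(:b)` rule handle both inputs, at the cost of maintaining the key list by hand.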

kschiess commented 9 years ago

Yes there is. It becomes apparent when you do something like this:

str('a').as(:a).repeat.as(:b).parse('aaaaa')
# => {:b=>[{:a=>"a"@0}, {:a=>"a"@1}, {:a=>"a"@2}, {:a=>"a"@3}, {:a=>"a"@4}]}

However, I would consider a second (third/last) argument to repeat to specify whether an empty match should result in a nil or in a [] - parslet can't know really without explicit indication. How would you like that?

# Fantasy code ahead:
str('a').repeat(no_match: nil).as(:b).parse('aaaaa')

rubydesign commented 9 years ago

Just want to say that I also have some rules that I would like to clean up to remove the duplication. +1, as it were, and the suggestion sounds good.

hagabaka commented 9 years ago

str('a') is a Parslet::Atoms::Str, while str('a').as(:a) is a Parslet::Atoms::Named. Could repeat automatically determine its as output for empty input based on this difference?

kschiess commented 9 years ago

If you add multiple layers of Entity, Sequence, ... on top, you won't be able to tell.

kschiess commented 7 years ago

I've thought about this and see the opportunity for improvement now. I'll execute your last idea as soon as I get to it.

smackesey commented 4 years ago

Is there any plan to implement this? This issue is old but looks like there has been some recent activity on the repo. I agree with everything @hagabaka said above.