kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.
kschiess.github.com/parslet
MIT License

Parser array output #114

Closed: michaelmior closed this issue 10 years ago

michaelmior commented 10 years ago

It would be nice if it were possible to tell parslet that I always want something to be an array, even if there's only one element. The two rules below are extracted from the getting started example.

rule(:arglist)    { expression >> (comma >> expression).repeat }
rule(:funcall)    { identifier.as(:funcall) >> lparen >> arglist.as(:arglist) >> rparen }

If arglist contains multiple expressions, then arglist parsed in the context of funcall will be an array. However, if it contains only one expression, it will be a hash. This means additional logic is needed to check whether one or many things were parsed.
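The "additional logic" mentioned above can be sketched in plain Ruby. The helper name `ensure_array` is hypothetical (it is not part of parslet), and the hashes below are hand-written stand-ins for the two result shapes rather than real parser output.

```ruby
# Hypothetical helper: wrap a lone parse result in an array.
# Kernel#Array is avoided on purpose: Array({ a: 1 }) yields [[:a, 1]],
# which would split a single hash result into key/value pairs.
def ensure_array(value)
  value.is_a?(Array) ? value : [value]
end

# Plain hashes standing in for the two shapes a caller may receive.
single   = { arglist: { argument: "abc" } }
multiple = { arglist: [{ argument: "abc" }, { argument: "def" }] }

single_args   = ensure_array(single[:arglist])
multiple_args = ensure_array(multiple[:arglist])

p single_args
p multiple_args
```

Every consumer of the tree would have to call something like this on each key that may hold one or many results, which is the repetition the request is trying to avoid.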

It would be nice if this could be handled a bit more automatically. One possibility is something like

arglist.as_array(:arglist)

This would check if parsing arglist results in only a single element and then wrap it in an array. It also keeps backwards compatibility. I think I'll probably start working on a patch, but let me know if this is something you would be interested in accepting or if there's a better way to do the same thing.

floere commented 10 years ago

One way to solve this would be to use a transformer to transform single arguments into an argument list, like so:

require 'parslet'

class Mini < Parslet::Parser
  rule(:argument) { match('[a-z]').repeat.as(:argument) }
  rule(:arglist) { argument >> (str(',') >> argument).repeat }
  rule(:funcall) { arglist.as(:arglist) }
  root(:funcall)
end

multiple = Mini.new.parse("abc,def")
single   = Mini.new.parse("abc")

transformer = Parslet::Transform.new do
  rule(:arglist => { :argument => simple(:arg) }) { { :arglist => [{ :argument => arg }] } }
end

p transformer.apply multiple # => {:arglist=>[{:argument=>"abc"@0}, {:argument=>"def"@4}]}
p transformer.apply single # => {:arglist=>[{:argument=>"abc"@0}]}

michaelmior commented 10 years ago

That's true. Although it also means that any code using that parser needs to apply a particular transformer. For me it feels like a common enough use case to push down to the parser. The exact problem has resulted in very verbose code when I've used other parsing frameworks for Ruby. It would be nice to have a simple solution.

michaelmior commented 10 years ago

I whipped up something quickly at michaelmior/parslet@fb595e611e824a6db4fc4f54d4f4a62ddd85b29c. It will break if each atom in the value is also an array. I'm sure there's a way to work around this with a better understanding of the unflattened atom structure.

kschiess commented 10 years ago

I know this is possible in code, but parslet already offers transformers. I try to keep things down to a bare minimum (or my definition of it), so thanks but no. For everyone, approaches I recommend here are:

a) A transformer, because it ships with parslet
b) Including code like pasted above into your codebase and just enhancing parslet with it

And - the array/singleton hiccup was a design choice. That is the main reason why it stays in.

michaelmior commented 10 years ago

@kschiess Fair enough. Although I wasn't suggesting changing existing behaviour, just a simple addition of a few lines to handle this use case which seems fairly common to me. But I respect the minimalism and transformers are also a decent way to handle this :)

JESii commented 8 years ago

Well... I tried @floere's suggestion above but have not been able to get it to work.

I have a parsed element: {:mc=>{:spid=>"17560"@0}, :bc=>{:spid=>"16699"@7}} and want to transform each of these single elements into an array.

I have the transform: rule(:mc => { :spid => simple(:arg) } ) { { :mc => [ { :spid => arg } ] } } (mc only for testing right now), but all I get back is {:mc=>{:spid=>"17560"@0}, :bc=>{:spid=>"16699"@7}}.

michaelmior commented 8 years ago

@JESii Transforms have to match the entire subtree. Your rule only matches the key :mc but the parse tree also has the key :bc. You would need to have your rule match a parse tree with both of those keys. You could then turn both into an array at the same time.

JESii commented 8 years ago

Thanks, but I'm still having problems. The parse tree looks like this (same as before):

{:mc=>{:spid=>"17560"@0}, :bc=>{:spid=>"16699"@7}}

So I copied that parse tree, transmogrified it into a rule and just tried to get the transform rule to match:

rule({:mc=>{:spid=>simple(:argm)}, :bc=>{:spid=>simple(:argb)}}) { 'abc' }

Unfortunately, that still doesn't match and just passes through un-transformed.

And even more unfortunately, I will now have to have more transform rules, as I can get parse trees where :mc is already an array or :bc is already an array, like this:

{:mc=>{:spid=>"12345"@0}, :bc=>[{:pid=>"17560"@8}, {:pid=>"17899"@15}]}

It's the nature of the input I'm dealing with -- I want to get an array back out for both :mc and :bc, even if they have only one element.

JESii commented 8 years ago

OK; figured it out...

I had already identified 'single' entries versus 'multiple' entries with the ":spid" versus ":pid" name. So, all that was required were the two rules:

  rule(:pid => simple(:x)) { Integer(x) }
  rule(:spid => simple(:x)) { [ Integer(x) ] }