kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.
kschiess.github.com/parslet
MIT License

How to make some matches not appear in the tree? #86

Closed dongli closed 11 years ago

dongli commented 11 years ago

Hi all,

I would like to keep some matches out of the tree, since they are trivial and may cause duplicate keys. For example, Fortran has derived types:

type foo_t
end type foo_t

The second foo_t should be dropped.

The present parsed tree is:

{
    :derived_type_definition => {
        :type_name => {:id=>"foo_t"@5},
        :new_line=>"\n"@10,
        :declarations=>[],
        :id=>"foo_t"@20
    }
}

The last :id=>"foo_t"@20 is redundant. How can I make it disappear without getting a duplicate key warning?

roryokane commented 11 years ago

It would be easier to answer if you included the code for your parser, with the rules that match the input text and turn them into that tree. That is, the code in your class FortranParser < Parslet::Parser.

The solution might be as simple as omitting .as(:id) from your closing match of foo_t.

dongli commented 11 years ago

Hi @roryokane ! You can clone the gist (https://gist.github.com/dongli/5791976) and run:

    rspec --fail-fast rspec_fortran_parser.rb

See the last test. The rule is derived_type_definition.

floere commented 11 years ago

First of all, kudos for writing this parser :) As @roryokane says, it might be as simple as omitting .as(:id). However, in your parser that is buried deeply, on line 203:

    spaced(( match('[a-zA-Z_]') >> match('[a-zA-Z0-9]').repeat >> match('[a-zA-Z0-9_]').repeat ).as(:id))

Usually it is best not to embed as calls that deeply, and to have them describe more complex semantic structures instead. I'm not sure whether that is possible in your case, though. Maybe you can push the as(:id) away from the id node/rule?

Just to get you onto the idea of transforms, it is completely ok to transform the parse tree into another parse tree using transformers. For example, to get what you want, you could match only the case you describe and eliminate the id:

class Trans < Parslet::Transform
  rule(:type_name => { :id => simple(:a) }, :new_line => simple(:b), :declarations => sequence(:c), :id => simple(:x)) do
    { :type_name => { :id => a }, :new_line => b, :declarations => c }
  end
end

Then run it on your tree:

tree = Trans.new.apply tree

However, I'm not saying you should use it in this case. It is just a pointer that transforms can also be applied to a parse tree to get a slightly pruned parse tree.

But usually (always?), solutions can be found in restructuring the parser.

dongli commented 11 years ago

Hi @floere ,

I will definitely use Transform, but I think it would be convenient to have a hide method or something like that for this purpose (it would probably save a lot of typing).

I put as(:id) that deep in order to distinguish it from as(:template_instance) from the very beginning, since I would like to add a handy template mechanism to Fortran.

BTW, I have almost achieved my goal with Treetop, but would like to give Parslet a try. : )

roryokane commented 11 years ago

You can create your own hide method. First, write a Parslet atom that “forgets” the name assigned by as:

class Anonymized < Parslet::Atoms::Base
  attr_reader :parslet
  def initialize(parslet)
    super()

    @parslet = parslet
  end

  def apply(source, context, consume_all)
    success, value = result = parslet.apply(source, context, consume_all)

    return result unless success
    succ(
      produce_return_value(
        value))
  end

  def to_s_inner(prec)
    "hidden(" + parslet.to_s(prec) + ")"
  end
private
  def produce_return_value(val)
    flatten(val, true).first[1]
  end
end

Anonymized was modeled after Parslet::Atoms::Named, the Parslet atom behind as().

To get a hide helper method, you can either define a plain Ruby method:

def hide(parslet)
  Anonymized.new(parslet)
end

or a Parslet DSL method (like the ones in dsl.rb):

module Parslet::Atoms::DSL
  def hide
    Anonymized.new(self)
  end
end

And here’s how you can use your new hide method in IRB:

>> require 'parslet'
=> true
>> # copy and paste `Anonymized` and `hide` here
>> include Parslet
=> Object
>> a = str('a')
=> 'a'
>> named_a = a.as(:a)
=> a:'a'
>> hidden_a = named_a.hide # using the DSL version of hide
=> hidden(a:'a')
>> a.parse('a')
=> "a"@0
>> named_a.parse('a')
=> {:a=>"a"@0}
>> hidden_a.parse('a')
=> "a"@0

Here’s how you can use Anonymized, through the hide method, to fix derived_type_declaration in your FortranParser so it works with your example:

    rule(:derived_type_declaration) {
        ( keyword('type') >> derived_type_attributes.maybe >> template_or_id.as(:type_name) >> new_line >>
              declarations >>
              tbp_declarations.maybe >>
          keyword('end').as(:end) >> keyword('type') >> template_or_id.hide ).as(:derived_type_declaration)
    }

That changes your parse tree for the last RSpec test from

{:derived_type_declaration=>{:type_name=>{:id=>"foo_t"@5}, :new_line=>"\n"@10, :declarations=>[], :end=>"end"@11, :id=>"foo_t"@20}}

to

{:derived_type_declaration=>{:type_name=>{:id=>"foo_t"@5}, :new_line=>"\n"@10, :declarations=>[], :end=>"end"@11}}

hide will hide any name, not just :id; it will also hide :template, which may not be what you want. If you want to hide only :id, you can try extending Anonymized so that it takes a name as a parameter, just like Parslet::Atoms::Named does, and only hides that name. Or perhaps at that point it would be better to use a Parslet::Transform than a custom atom.

There might be better names than Anonymized and hide; I chose the first names that came to mind.

kschiess commented 11 years ago

The duplicate key warning is easy to remove - often you need just one more .as() - introducing structure that disambiguates two subhashes.

rule(:id) { ... }  # .as(:id) in here
rule(:foo) { id.as(:id1) >> id.as(:id2) }

But of course hiding works just as well. If you want to contribute this to parslet (on the odd chance), please rename it to 'ignore' and Ignore.

Closing this issue, since this is also something that is better discussed on the mailing list.