kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.
kschiess.github.com/parslet
MIT License

How to get the offset of matches in input string? #84

Closed dongli closed 11 years ago

dongli commented 11 years ago

Hi,

I would like to get the offset of each match in the input string, so that I can operate on the input string based on what has been parsed. In Treetop, there is the interval method. What is the counterpart in Parslet? I see that the Slice and Source classes contain an offset, but there is no example or documentation for it.

floere commented 11 years ago

Not sure what the counterpart is (@kschiess knows better), but here are two helpful methods:

require "parslet"

class TestParser < Parslet::Parser
    rule(:test) {
        str('a') >> str('b') | str('c')
    }
end

p TestParser.new.test.parse('ab').offset
p TestParser.new.test.parse('c').line_and_column

However, Parslet provides transformers to operate on the matched inputs. Read more here: http://kschiess.github.io/parslet/transform.html

Cheers!
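
For readers who want to see what such an offset and line/column pair amount to, here is a minimal plain-Ruby sketch. The helper name `line_and_column` and its 1-based numbering are assumptions made for illustration; this is not Parslet's implementation, only the general idea of deriving a position from a character offset.

```ruby
# Hypothetical helper, not part of Parslet: convert a 0-based character
# offset into a 1-based [line, column] pair (numbering is an assumption).
def line_and_column(input, offset)
  before = input[0, offset]              # everything preceding the offset
  line = before.count("\n") + 1          # one line per newline seen so far
  last_newline = before.rindex("\n")     # nil when still on the first line
  column = last_newline ? offset - last_newline : offset + 1
  [line, column]
end

input = "ab\ncd\nef"
p line_and_column(input, 0)
p line_and_column(input, 4)
```

With an offset in hand, operating on the original string is then just ordinary slicing, for example `input[offset, length]`.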

dongli commented 11 years ago

@floere Thanks, I will check out Transform.

kschiess commented 11 years ago

The documentation is here: http://rubydoc.info/gems/parslet/Parslet/Slice
Parslet's code documentation is also maintained.

dongli commented 11 years ago

@kschiess Thanks, I will check it out after finishing basic parsing.

dongli commented 11 years ago

I have another question related to offsets. I would like to write rules that match the start and the end of a line, so that I can determine the line range of a match, like this:

rule(:procedure) {
    line_start.as(:start) >> ... >> line_end.as(:end)
}

When transforming, the line range is

tree[:start].line_and_column[0]..tree[:end].line_and_column[0]

My imaginary line_start and line_end rules might be

rule(:line_start) {
    ( new_line.maybe >> ( space | character ) ).mark
    #                                       ^^^^
}

rule(:line_end) {
    ( ( space | character ) >> new_line.maybe ).mark
    #                                       ^^^^
}

So the QUESTION is how to match without consuming input (the imaginary mark).
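
The idea of matching without consuming can be illustrated outside Parslet with Ruby's standard `StringScanner`. This is only a sketch of the concept, not a Parslet atom: `check` reports a match at the current position and leaves the position untouched, while `scan` reports the same match and advances.

```ruby
require "strscan"

scanner = StringScanner.new("ab")

# check matches at the current position but does not move the scan pointer.
peeked = scanner.check(/a/)
pos_after_check = scanner.pos

# scan matches the same text and advances past it.
consumed = scanner.scan(/a/)
pos_after_scan = scanner.pos

p [peeked, pos_after_check]
p [consumed, pos_after_scan]
```

A non-consuming rule in a parser follows the same pattern: remember the position, try the match, then restore the position before returning the result.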

dongli commented 11 years ago

Hi all,

I have written a new atom, Mark, which accomplishes what I asked for above:

# Makes a mark: returns the match as usual, but does not consume its input.
#
# Example:
#
#   str('foo').mark # matches when the input contains 'foo', but does not
#   consume the input, so other atoms have a chance to parse it.
#
#
class Parslet::Atoms::Mark < Parslet::Atoms::Base
  attr_reader :mark_parslet
  attr_reader :offset

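  # offset shifts the position at which the wrapped parslet is tried
  # (0 by default, e.g. -1 to look at the previous character).
  def initialize(mark_parslet, offset = 0)
    super()

    @mark_parslet = mark_parslet
    @offset       = offset
  end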

  def try(source, context, consume_all)
    # Shift to the marked position and remember it
    source.pos += offset
    old_pos = source.pos
    result = mark_parslet.apply(source, context, consume_all)
    # Restore the original position so that no input is consumed
    source.pos = old_pos - offset
    result
  end
end

Please check if it is OK. Thanks!

My test is:

require "parslet"

class TestMark < Parslet::Parser
    rule(:test_rule_1) {
        str('a').mark.as(:mark) >> str('a').as(:real)
    }

    rule(:test_rule_2) {
        str('a').as(:real) >> str('a').mark(-1).as(:mark)
    }
end

p TestMark.new.test_rule_1.parse('a')
p TestMark.new.test_rule_2.parse('a')

and the result is:

{:mark=>"a"@0, :real=>"a"@0}
{:real=>"a"@0, :mark=>"a"@0}

kschiess commented 11 years ago

This should really go to the mailing list.

dongli commented 11 years ago

Sorry, @kschiess . I just miss the syntax highlighting~