Retrieving entire match for a rule

kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.

kschiess.github.com/parslet

MIT License

Retrieving entire match for a rule #125

Closed by michaelmior 9 years ago

michaelmior commented 9 years ago

I have a reasonably complex parser where I want to construct a parse tree for some portion of the input string, but also be able to retrieve the entire input string for that subtree.

This example should illustrate my point:

require 'parslet'

class MyParser < Parslet::Parser
  rule(:foo) { ... }
  # Parslet sequences atoms with >>, not <<
  rule(:bar) { str('baz') >> foo.as(:foo) }
  root :bar
end

After parsing using the rule :bar, is there any way to get the portion of the input which matched rule :foo? My output looks like {foo: {...}} where the value at the :foo key is a simple hash. Is it possible to modify the parser to do this? Any help appreciated :)

kschiess commented 9 years ago

You'd have to look at all slice positions in parts of :foo and take their minimum and their maximum. That would give the extent of the useful content of :foo.
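A minimal sketch of that min/max idea, assuming the leaves of the parse tree respond to `offset` and `size` the way `Parslet::Slice` does, and that `offset` is a character index into the original input. The helper names `collect_slices` and `matched_extent` are made up for illustration, and `FakeSlice` is only a stand-in so the snippet runs without the gem:

```ruby
# Hypothetical stand-in for Parslet::Slice, for illustration only.
# It mimics the two methods this sketch relies on: #offset and #size.
FakeSlice = Struct.new(:str, :offset) do
  def size
    str.size
  end
end

# Walk a nested Hash/Array parse tree and collect every slice-like leaf
# (anything that responds to both #offset and #size).
def collect_slices(node)
  case node
  when Hash  then node.values.flat_map { |v| collect_slices(v) }
  when Array then node.flat_map { |v| collect_slices(v) }
  else
    node.respond_to?(:offset) && node.respond_to?(:size) ? [node] : []
  end
end

# Return the part of `input` spanned by the captured slices in `subtree`,
# or nil if the subtree holds no slices at all.
def matched_extent(input, subtree)
  slices = collect_slices(subtree)
  return nil if slices.empty?

  first = slices.map(&:offset).min                  # earliest captured position
  last  = slices.map { |s| s.offset + s.size }.max  # end of the latest capture
  input[first...last]
end

# Example: pretend :foo captured a name and a value inside 'baz a = 1'.
input  = 'baz a = 1'
tree   = { foo: { name: FakeSlice.new('a', 4), value: FakeSlice.new('1', 8) } }
extent = matched_extent(input, tree[:foo])
puts extent # prints "a = 1"
```

Note that this only recovers the span between the first and last captured slice. Anything :foo matched before its first capture or after its last one (leading whitespace, closing delimiters) is not included, which is why this gives the extent of the "useful content" rather than the full match.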

To really do what you want, I guess we'd have to modify parslet to not return Hash/Array, but to return subclasses that also store position, as does Slice. There's a tradeoff between this and speed however...

In practice, I often tweak my rules to capture relevant strings and then work with the input positions of those strings.

kschiess commented 9 years ago

Did my answer help you? Can you close this ticket or turn it into a feature request where you say specifically how we should change parslet?

michaelmior commented 9 years ago

Sorry for the lack of a reply. This hasn't really solved my problem. The particular part of the string being parsed was just easier to match using a regex. It would be interesting to consider subclassing the returned data though as you mentioned. It could always be optional if performance is an issue. Closing this for now though.