kschiess / parslet

A small PEG-based parser library. See the Hacking page in the Wiki as well.
kschiess.github.com/parslet
MIT License

Add the ability to map over parse results #159

Closed chrismwendt closed 4 years ago

chrismwendt commented 8 years ago

This adds the ability to map over parse results. This is especially useful for parsing integers into Ruby Fixnums, or composing integers together to form DateTimes.
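Here is a minimal sketch of what such a `map` does. It is written against a hypothetical toy combinator class `P` rather than parslet; the names `P`, `run`, `digits`, `str` and `seq` are made up for illustration. `map` runs the wrapped parser and, on success, passes the parsed value through a block. That is how digit strings become Ruby `Integer`s, and how those integers are then composed into a `Date`.

```ruby
require 'date'

# Hypothetical toy combinator, NOT parslet's API. A parser wraps a lambda
# (input, pos) -> [value, next_pos] on success, or nil on failure.
class P
  def initialize(&fn)
    @fn = fn
  end

  def run(input, pos = 0)
    @fn.call(input, pos)
  end

  # Exactly n ASCII digits, returned as a String.
  def self.digits(n)
    new do |input, pos|
      s = input[pos, n]
      s && s.length == n && s.match?(/\A[0-9]+\z/) ? [s, pos + n] : nil
    end
  end

  # The literal string lit.
  def self.str(lit)
    new { |input, pos| input[pos, lit.length] == lit ? [lit, pos + lit.length] : nil }
  end

  # Run parsers one after another; succeed with the Array of their values.
  def self.seq(*parsers)
    new do |input, pos|
      values = []
      ok = parsers.all? do |parser|
        result = parser.run(input, pos)
        values << result[0] if result
        pos = result[1] if result
        result
      end
      ok ? [values, pos] : nil
    end
  end

  # On success, pass the parsed value through the block (eagerly).
  def map(&f)
    P.new do |input, pos|
      result = run(input, pos)
      result && [f.call(result[0]), result[1]]
    end
  end
end

year  = P.digits(4).map { |s| Integer(s, 10) }  # "2016" becomes 2016
month = P.digits(2).map { |s| Integer(s, 10) }  # "03" becomes 3
day   = P.digits(2).map { |s| Integer(s, 10) }
date  = P.seq(year, P.str('-'), month, P.str('-'), day).map { |y, _, m, _, d| Date.new(y, m, d) }

parsed, consumed = date.run("2016-03-05")
# parsed is Date.new(2016, 3, 5); consumed is 10
```

`Integer(s, 10)` is used instead of `Integer(s)` because the one-argument form treats a leading zero as an octal prefix and rejects strings such as `"08"`.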

Unfortunately, the mapping function currently needs to be aware of the internal structure of Parslet's parse tree. That could be fixed, but I imagine it's not easy to do. I'm mostly leaving this here as a proof of concept - I don't expect this to be merged as-is.

kschiess commented 8 years ago

I would like to keep parslet minimal - why don't you create a parslet-extensions gem for this?

The 'one way (tm)' that parslet already offers for this is transforming the output tree, not mapping during the parse phase. That way you only map what ends up in the final result, instead of every partial tree that succeeded along the way.
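A small experiment makes the partial-tree point visible. The sketch below uses a hypothetical toy combinator class `P` (not parslet's API; `digits`, `str`, `seq`, `|`, `map` and `as` are made up for illustration) with a PEG-style ordered choice that backtracks. An eager `map` fires on a partial result that is later discarded. A post-parse transform, here a plain recursive lambda standing in for `Parslet::Transform`, only touches the final tree.

```ruby
# Hypothetical toy combinator, NOT parslet's API. A parser wraps a lambda
# (input, pos) -> [value, next_pos] on success, or nil on failure.
class P
  def initialize(&fn)
    @fn = fn
  end

  def run(input, pos = 0)
    @fn.call(input, pos)
  end

  # One or more ASCII digits, returned as a String.
  def self.digits
    new do |input, pos|
      m = /\A[0-9]+/.match(input[pos..-1])
      m ? [m[0], pos + m[0].length] : nil
    end
  end

  def self.str(lit)
    new { |input, pos| input[pos, lit.length] == lit ? [lit, pos + lit.length] : nil }
  end

  def self.seq(*parsers)
    new do |input, pos|
      values = []
      ok = parsers.all? do |parser|
        result = parser.run(input, pos)
        values << result[0] if result
        pos = result[1] if result
        result
      end
      ok ? [values, pos] : nil
    end
  end

  # PEG ordered choice: try self; on failure, backtrack and try other.
  def |(other)
    P.new { |input, pos| run(input, pos) || other.run(input, pos) }
  end

  # Eager map: runs the block as soon as this parser succeeds.
  def map(&f)
    P.new do |input, pos|
      result = run(input, pos)
      result && [f.call(result[0]), result[1]]
    end
  end

  # Only tags the raw value, similar in spirit to parslet's .as(:name).
  def as(name)
    map { |v| { name => v } }
  end
end

# Eager: convert during parsing.
calls = 0
num = P.digits.map { |s| calls += 1; Integer(s, 10) }
expr = P.seq(num, P.str('+'), num) | num
value, = expr.run("42")
# value is 42, but calls is 2: the first alternative converted "42",
# then failed on '+', and that work was thrown away.

# Transform afterwards: parse into a tagged tree, then convert the final tree.
raw_num = P.digits.as(:int)
raw_expr = P.seq(raw_num, P.str('+'), raw_num) | raw_num
tree, = raw_expr.run("42")   # tree is { int: "42" }

transform_calls = 0
transform = lambda do |node|
  case node
  when Hash
    if node.key?(:int)
      transform_calls += 1
      Integer(node[:int], 10)
    else
      node.transform_values { |v| transform.call(v) }
    end
  when Array then node.map { |v| transform.call(v) }
  else node
  end
end
result = transform.call(tree)   # result is 42; transform_calls is 1
```

The eager version converted `"42"` twice to produce a single number; the transform ran once, on the tree that actually survived parsing.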

chrismwendt commented 8 years ago

I gave this some thought, and in short, I think map should replace Transform because map is composable and Transform is not.

Imagine you have 2 parsers, date and time, along with their corresponding date_transform and time_transform Transforms for turning them into Date and Time objects, respectively:

date = digit.repeat(4).as(:year) >> str('-') >> ...
time = digit.repeat(2).as(:hour) >> str(':') >> ...

date_transform = Parslet::Transform.new do
  rule({ :year => simple(:year), :month => ... }) { { :date => Date.new(year, month, day) } }
end

time_transform = Parslet::Transform.new do
  rule({ :hour => simple(:hour), :minute => ... }) { { :time => Time.new(hour, minute, second) } }
end

Consider how you would construct a new date_time parser and the corresponding Transform that turns its output into DateTime objects. I can think of 2 options:

Option 1: write a new Transform with 3 rules, duplicating the implementations of date_transform and time_transform in the process:

date_time = (date >> str(' ') >> time).as(:date_time)
date_time_transform = Parslet::Transform.new do
  rule({ :year => ... }) { ... } # same as before
  rule({ :hour => ... }) { ... } # same as before
  rule({ :date_time => { :date => subtree(:date), :time => subtree(:time) } }) do
    DateTime.new(date.year, date.month, date.day, time.hour, time.minute, time.second)
  end
end

Option 2: take a hybrid approach and apply date_transform and time_transform in sequence, followed by a third transform that constructs the DateTime objects:

date_time = (date >> str(' ') >> time).as(:date_time)
date_time_transform = lambda do |ast|
  temp = date_transform.apply(time_transform.apply(ast))
  Parslet::Transform.new do
    rule({ :date_time => { :date => subtree(:date), :time => subtree(:time) } }) do
      DateTime.new(date.year, date.month, date.day, time.hour, time.minute, time.second)
    end
  end.apply(temp)
end

Both of these approaches are quite ugly and hacky. Now consider how you would do this with map:

date = (digit.repeat(4).as(:year) >> str('-') >> ...).map(lambda { |ast| Date.new(ast[:year], ...) })
time = (digit.repeat(2).as(:hour) >> str(':') >> ...).map(lambda { |ast| Time.new(ast[:hour], ...) })
date_time = (date.as(:date) >> str(' ') >> time.as(:time)).map(lambda { |ast| DateTime.new(ast[:date].year, ...) })

Done! 🎉
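For comparison, here is a runnable sketch of the composed version. It again uses a hypothetical toy combinator class `P` rather than parslet. Because `Time.new` expects a full date, it also swaps in a hypothetical `TimeOfDay` struct for the time-only value. Each parser already returns a finished Ruby object, so `date_time` reuses `date` and `time` unchanged.

```ruby
require 'date'

# Hypothetical toy combinator, NOT parslet's API. A parser wraps a lambda
# (input, pos) -> [value, next_pos] on success, or nil on failure.
class P
  def initialize(&fn)
    @fn = fn
  end

  def run(input, pos = 0)
    @fn.call(input, pos)
  end

  # Exactly n ASCII digits, returned as a String.
  def self.digits(n)
    new do |input, pos|
      s = input[pos, n]
      s && s.length == n && s.match?(/\A[0-9]+\z/) ? [s, pos + n] : nil
    end
  end

  def self.str(lit)
    new { |input, pos| input[pos, lit.length] == lit ? [lit, pos + lit.length] : nil }
  end

  # Run parsers one after another; succeed with the Array of their values.
  def self.seq(*parsers)
    new do |input, pos|
      values = []
      ok = parsers.all? do |parser|
        result = parser.run(input, pos)
        values << result[0] if result
        pos = result[1] if result
        result
      end
      ok ? [values, pos] : nil
    end
  end

  # On success, pass the parsed value through the block (eagerly).
  def map(&f)
    P.new do |input, pos|
      result = run(input, pos)
      result && [f.call(result[0]), result[1]]
    end
  end
end

# Hypothetical stand-in for a time of day.
TimeOfDay = Struct.new(:hour, :minute, :second)

int  = ->(n) { P.digits(n).map { |s| Integer(s, 10) } }
date = P.seq(int[4], P.str('-'), int[2], P.str('-'), int[2])
        .map { |y, _, m, _, d| Date.new(y, m, d) }
time = P.seq(int[2], P.str(':'), int[2], P.str(':'), int[2])
        .map { |h, _, mi, _, s| TimeOfDay.new(h, mi, s) }

# date_time reuses date and time as-is; their values are already Ruby objects.
date_time = P.seq(date, P.str(' '), time)
             .map { |d, _, t| DateTime.new(d.year, d.month, d.day, t.hour, t.minute, t.second) }

parsed, consumed = date_time.run("2016-03-05 12:30:00")
```

Composing the parsers needs no new rules and no knowledge of the inner parsers' tree shapes. The trade-off is that every block runs on every successful partial match, including matches later discarded by backtracking.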

Admittedly, switching from Transform to map would be a foundational change to Parslet. I'm not actually in dire need of this; I just wanted to show you what it might look like in case it sparks some interest 😄

kschiess commented 8 years ago

Point taken - transforms do not currently compose well. They could (based on internal constructions) though, but that's another PR ;)

Here's what's bothering me with mapping over internal tree values directly:

So I think this approach is doomed for 'core' ;)

How about attaching map blocks to result values, to be executed once we know which values end up in the 'real' result? A kind of delayed map? I would be in favor of merging an alternative to transforms into parslet, just to give people options, provided safety doesn't suffer.
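Here is a minimal sketch of the "delayed map" idea, again on a hypothetical toy combinator class `P` (not parslet's API). In this sketch, `map` only records its block in a `Deferred` struct. A separate `finalize` pass then applies the recorded blocks bottom-up once the whole parse has succeeded. `Deferred` and `finalize` are hypothetical names. Values from alternatives that were backtracked out of never reach `finalize`, so their blocks never run.

```ruby
# Hypothetical delayed-map node: the raw value plus the block to apply later.
Deferred = Struct.new(:value, :fn)

# Hypothetical toy combinator, NOT parslet's API. A parser wraps a lambda
# (input, pos) -> [value, next_pos] on success, or nil on failure.
class P
  def initialize(&fn)
    @fn = fn
  end

  def run(input, pos = 0)
    @fn.call(input, pos)
  end

  # One or more ASCII digits, returned as a String.
  def self.digits
    new do |input, pos|
      m = /\A[0-9]+/.match(input[pos..-1])
      m ? [m[0], pos + m[0].length] : nil
    end
  end

  def self.str(lit)
    new { |input, pos| input[pos, lit.length] == lit ? [lit, pos + lit.length] : nil }
  end

  def self.seq(*parsers)
    new do |input, pos|
      values = []
      ok = parsers.all? do |parser|
        result = parser.run(input, pos)
        values << result[0] if result
        pos = result[1] if result
        result
      end
      ok ? [values, pos] : nil
    end
  end

  # PEG ordered choice: try self; on failure, backtrack and try other.
  def |(other)
    P.new { |input, pos| run(input, pos) || other.run(input, pos) }
  end

  # Delayed map: store the block next to the value instead of calling it.
  def map(&f)
    P.new do |input, pos|
      result = run(input, pos)
      result && [Deferred.new(result[0], f), result[1]]
    end
  end
end

# Apply the stored blocks bottom-up, once the final result is known.
def finalize(node)
  case node
  when Deferred then node.fn.call(finalize(node.value))
  when Array then node.map { |v| finalize(v) }
  else node
  end
end

calls = 0
num = P.digits.map { |s| calls += 1; Integer(s, 10) }
expr = P.seq(num, P.str('+'), num).map { |l, _, r| l + r } | num

raw, = expr.run("42")
calls_after_parse = calls           # 0: nothing was converted during parsing
answer = finalize(raw)              # 42; calls is now 1
sum = finalize(expr.run("1+2")[0])  # 3
```

The price of this design is the extra `finalize` step that callers must run after parsing to get converted values.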

kschiess commented 8 years ago

Oh and: Thanks for taking an interest and sticking with the discussion. I value your contribution a lot.

chrismwendt commented 8 years ago

Good points about speed and safety, and the "delayed map" idea sounds promising. It reminds me of lazy evaluation in Haskell (from which I'm drawing the inspiration for this mapping ability 😺).

It would be slightly inconvenient to make users call some kind of .finalize() method in addition to .parse() in order to get the final results, but I can't think of how else to do it.

kschiess commented 8 years ago

I guess you would need to experiment with the idea to advance this.

kschiess commented 4 years ago

Closing this; original author did not pursue further.