kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.
kschiess.github.com/parslet
MIT License
805 stars, 95 forks

Add a Mark atom #87

Closed (dongli closed 11 years ago)

dongli commented 11 years ago

The 'Mark' atom is used to match input but not consume it.

kschiess commented 11 years ago

I fail to see the difference from #absent? and #present?. Did you see Florian's remarks? Mostly, this solves a problem I don't recommend having. Also, your use case of knowing where lines start and end can be solved without this.

I am very hesitant to include something that looks like a hack to me in parslet. Please explain better why we would need this. And the emphasis is on we - I can see why you would need it in your code. That is where it belongs for now, unless you convince me.

kschiess commented 11 years ago

It might be better to discuss things like these on the mailing list first.

dongli commented 11 years ago

Hi @kschiess,

absent? and present? only check whether or not a pattern matches, and return nil! So we get nothing from them, but I would like to get the location information without disturbing the other atoms, like this:

( any.mark.as(:start) >> str('a').as(:a) >> any.mark(-1).as(:end) ).parse('a')

then we will get:

{:start=>"a"@0, :a=>"a"@0, :end=>"a"@0}

It is easy to get the bounding positions of a by using start and end. These positions will be very useful when processing the code after parsing. Does this make sense?

Cheers,

Li

kschiess commented 11 years ago

Then why don't you just take the location information on :a?

This still looks like a code organisation issue and not a real feasibility issue. One of my goals in maintaining parslet is to keep its API surface as small as possible. This seems to violate that principle.

dongli commented 11 years ago

Well, the above might be too simple to explain my purpose. Let's look at this Fortran code:

type, extends(list_elem_t<foo_t>) :: foo_t
...
end type foo_t

When processing, I might need to reconstruct the definition of foo_t (some kind of template operation), so I need its first and last line numbers. You may say we can just take the locations of the first type and the last foo_t. For type, using as(:start) will obscure the name type would otherwise have. For foo_t, the programmer may simply omit it (that is OK in Fortran), so we would actually need the location of the last type. But foo_t may not be omitted, and may be typed on the next line, like:

type, extends(list_elem_t<foo_t>) :: foo_t
...
end type &
foo_t

or even:

type, extends(list_elem_t<foo_t>) :: foo_t
...
end

We can't depend on the location of the last type, nor on that of end. Frankly, the free style of Fortran really drives me crazy, but we need it in the numerical computing field. So I came up with the idea of Mark.

kschiess commented 11 years ago

And recursing over the intermediary tree would not work?

# For example, to get the starting offset of a whole subtree, use
#   int_tree_map(exp, &:offset).flatten.min
#
# Here's the same thing for the ending offset (strictly, the offset at
# which the last slice starts):
#   int_tree_map(exp, &:offset).flatten.max
#
# The #flatten is needed because the array structure returned will reflect
# tree nesting.
#
def int_tree_map tree, &element_operation
  case tree
  when Hash
    int_tree_map tree.values, &element_operation
  when Array
    tree.map { |e| int_tree_map(e, &element_operation) }
  else
    element_operation.call(tree)
  end
end
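
A self-contained sketch of that traversal, for anyone who wants to try it without a parser at hand. FakeSlice and subtree_span are hypothetical names made up for this illustration. FakeSlice only imitates the two things used here (a starting offset and a size), whereas a real parslet tree would hold Parslet::Slice leaves. The sketch also assumes that nil leaves can occur and should be skipped, and it computes the end as offset plus size rather than the largest starting offset.

```ruby
# Hypothetical stand-in for a parse-tree leaf: a string and its start offset.
FakeSlice = Struct.new(:str, :offset) do
  def size
    str.size
  end
end

# Same traversal as above: apply the block to every leaf, keep the nesting.
def int_tree_map(tree, &element_operation)
  case tree
  when Hash
    int_tree_map(tree.values, &element_operation)
  when Array
    tree.map { |e| int_tree_map(e, &element_operation) }
  else
    element_operation.call(tree)
  end
end

# Hypothetical helper: returns [first_offset, end_offset] for a subtree.
# end_offset is exclusive (offset + size of the furthest-reaching leaf).
# nil leaves are ignored.
def subtree_span(tree)
  starts = [int_tree_map(tree) { |s| s && s.offset }].flatten.compact
  ends   = [int_tree_map(tree) { |s| s && s.offset + s.size }].flatten.compact
  [starts.min, ends.max]
end

# A made-up tree shaped like a type definition whose trailing name is omitted.
tree = {
  head: { keyword: FakeSlice.new("type", 0), name: FakeSlice.new("foo_t", 6) },
  body: [FakeSlice.new("integer :: x", 12)],
  tail: { keyword: FakeSlice.new("end", 25), name: nil }
}

first, last = subtree_span(tree)
puts "#{first}...#{last}"
```

Because the span is taken over all leaves, it does not matter which of the optional trailing tokens is actually present in the input.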

Still seems to me like something that your parser needs specifically, not like a common thing to do. Here's my full list of criteria for inclusion in parslet. Something gets included if: a) it makes parsers easier to write in general, b) it's not trivial to implement and c) it can't be done using existing features.

Your proposition currently fails all of the above as I see it.

Also, error reporting will break with this: errors will get reported on the 'MARK' atom, which is rather useless. Inspection could do more than just returning 'MARK' - we're talking about a whole subtree of atoms here. Also, a new atom needs to have visitor methods for it.

I am not merging this. Also, please discuss proposed features on the mailing list first, before using the issues system.