kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.
kschiess.github.com/parslet
MIT License

Hanging Parser #65

Closed mgsnova closed 12 years ago

mgsnova commented 12 years ago

Hi,

I stumbled onto a problem with the parser hanging endlessly. I can reduce it to the following code example:

require 'parslet'
require 'pp'

class Parser < Parslet::Parser
  root :statements
  rule(:statements) { statement.repeat }
  rule(:statement) { text.as(:t1) >> space >> text.as(:t2) >> separator }
  rule(:text) { match('[a-z]').repeat }
  rule(:space) { match('\s').repeat } 
  rule(:separator) { eof | line_break }
  rule(:line_break) { match('[\r\n]').repeat }
  rule(:eof) { any.absent? }
end

parser = Parser.new

pp parser.statement.parse("first second")
# => {:t1=>"first"@0, :t2=>"second"@6}

pp parser.parse("first second")
# => this one causes the parser to hang

pp parser.parse("first second\ntest test")
# => this one causes the parser to hang

Am I doing something wrong, or is this a bug in the parser? This behaviour occurs with parslet version 1.3.0 on Ruby 1.9.3p125.

Thanks

kschiess commented 12 years ago

As far as I can see, the language you're describing has a possibly infinite prefix of invisible zero-length characters. And a postfix as well. Your parser is busy parsing that postfix and will get to the real text as soon as it's done.

Or put more concretely: Nothing in your :statement rule says that it cannot be of zero length, since :text, :space and :line_break all use .repeat, which also accepts zero occurrences. And then detecting :statements in the input stream means consuming possibly many zero-length items. This is your loop, right there.
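
To make the loop visible without hanging anything, here is a toy model of a greedy repetition in plain Ruby. It is only a sketch and not how parslet is implemented: the lambdas, naive_repeat and the max_rounds cap are all made up for illustration.

```ruby
# Toy model of a greedy repetition. NOT parslet's implementation.
# A matcher is a lambda taking (input, pos) and returning the new
# position, or nil when it fails.

# Zero or more lowercase letters, in the spirit of match('[a-z]').repeat.
text_zero_or_more = lambda do |input, pos|
  pos += 1 while pos < input.size && input[pos] =~ /[a-z]/
  pos
end

# One or more lowercase letters, in the spirit of match('[a-z]').repeat(1).
text_one_or_more = lambda do |input, pos|
  stop = text_zero_or_more.call(input, pos)
  stop > pos ? stop : nil
end

# Naive repetition: keep applying the matcher until it fails.
# max_rounds is a hypothetical safety cap so the demonstration terminates.
def naive_repeat(matcher, input, pos, max_rounds: 1000)
  rounds = 0
  while rounds < max_rounds
    next_pos = matcher.call(input, pos)
    break if next_pos.nil?
    pos = next_pos
    rounds += 1
  end
  rounds
end

# The zero-or-more matcher keeps succeeding at the end of the input
# without consuming anything, so only the cap stops the loop.
capped_rounds = naive_repeat(text_zero_or_more, "abc", 0)

# The one-or-more matcher fails as soon as nothing is left to consume.
fixed_rounds = naive_repeat(text_one_or_more, "abc", 0)

puts capped_rounds
puts fixed_rounds
```

With the cap removed, the first call would never return, which is the same kind of hang as in the report.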

Here's what I'd change:

...
rule(:text) { match('[a-z]').repeat(1) }
...

This does nothing more than ask for text to be a non-empty sequence of characters.

As a question of strategy, posting such things to our mailing list will get you further, faster: more people answer there than here.