kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.
kschiess.github.com/parslet
MIT License

Possible bug with combination of as(...) and repeat(...) #111

Closed: tallakt closed this issue 10 years ago

tallakt commented 10 years ago
# encoding: utf-8
require 'parslet'
require 'parslet/convenience'
require 'awesome_print'

class Parser < Parslet::Parser
  rule :newline do
    str "\n"
  end

  rule :comments do
    # I suspect a problem occurs here as there might be several :comment
    # generated. 
    comment >> (newline >> comment).repeat
  end

  rule :comment do
    # The code will return three :main sections if this line is used
    # str("//") >> (newline.absent? >> any).repeat
    str("//") >> (newline.absent? >> any).repeat.as(:comment)
  end

  rule :main do
    (comment.absent? >> any).repeat.as(:main)
  end

  rule :root do
    (main >> comments >> newline).repeat >> newline.repeat
  end
end

document = <<EOF
MAINA //COMMENTA
MAINB //COMMENTB
//COMMENTC
MAIND //COMMENT
EOF

tree = Parser.new.parse_with_debug document
# MAINB is missing from the tree
ap tree
kschiess commented 10 years ago

During the result merge phase, parslet encounters this:

[{:main=>"MAINA "@0, :comment=>"COMMENTA"@8}, [{:main=>"MAINB "@17}, {:comment=>"COMMENTB"@25}, {:comment=>"COMMENTC"@36}]]

But the result of a repetition can contain either all hashes or all arrays, and hashes are given preference. The array holding the result for your second line is therefore dropped.
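
The preference for hashes can be sketched in plain Ruby. This is a simplified model written for illustration, not parslet's actual source: `flatten_repetition_sketch` is a hypothetical name, and plain strings stand in for parslet's position-tagged slices. It assumes the behaviour described above: if any element is a hash, only the hashes are kept; otherwise nested arrays are flattened by one level.

```ruby
# Simplified model of how a repetition's results are flattened.
# Hypothetical helper for illustration only; not parslet's real code.
def flatten_repetition_sketch(list)
  # Hashes win: if any element is a Hash, everything else is discarded.
  if list.any? { |e| e.is_a?(Hash) }
    return list.select { |e| e.is_a?(Hash) }
  end

  # Otherwise, if there are nested arrays, keep those and flatten one level.
  if list.any? { |e| e.is_a?(Array) }
    return list.select { |e| e.is_a?(Array) }.flatten(1)
  end

  # Only plain strings are left: join them.
  list.join
end

# The intermediate result quoted above, with plain strings standing in
# for the slices.
intermediate = [
  { main: "MAINA ", comment: "COMMENTA" },
  [{ main: "MAINB " }, { comment: "COMMENTB" }, { comment: "COMMENTC" }]
]

flattened = flatten_repetition_sketch(intermediate)
p flattened
```

In this model only the first hash survives; the nested array that carries MAINB, COMMENTB and COMMENTC is discarded because a hash is present in the same list.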

Rephrasing your :comments rule to

  rule :comments do
    # I suspect a problem occurs here as there might be several :comment
    # generated. 
    comment >> (newline >> comment).repeat.as(:others)
  end

preserves the result in every case. The thing with .as(...) is: You need to name all the things, everywhere. Don't try to get a minimal output, try to get all your results.
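
Why the extra name helps can be sketched with a simplified model of how a sequence (`>>`) combines the results of its parts. The helper `merge_sequence_sketch` is hypothetical and covers only the hash and array cases relevant here; it is an assumption consistent with the intermediate result shown above, not parslet's actual implementation.

```ruby
# Simplified, hypothetical model of how `a >> b` combines two results.
def merge_sequence_sketch(left, right)
  return left.merge(right) if left.is_a?(Hash)  && right.is_a?(Hash)
  return left + right      if left.is_a?(Array) && right.is_a?(Array)
  return [left] + right    if left.is_a?(Hash)  && right.is_a?(Array)
  return left + [right]    if left.is_a?(Array) && right.is_a?(Hash)
  raise ArgumentError, "case not covered by this sketch"
end

main_part = { main: "MAINB " }
first     = { comment: "COMMENTB" }
rest      = [{ comment: "COMMENTC" }] # result of (newline >> comment).repeat

# Original rule: the unnamed repetition is an Array, so the whole line
# ends up as an Array.
unnamed = merge_sequence_sketch(main_part, merge_sequence_sketch(first, rest))

# Rephrased rule: .as(:others) wraps the repetition in a Hash, so the
# whole line stays a single Hash.
named = merge_sequence_sketch(main_part, merge_sequence_sketch(first, { others: rest }))

p unnamed
p named
```

In this model the unnamed variant yields a three-element array for the MAINB line, which is the shape that gets dropped when the outer repetition prefers hashes. The named variant yields one hash with `:main`, `:comment` and `:others` keys, so every line of the outer repetition is a hash and nothing is lost.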

This is not a bug, but a really devious edge case.

tallakt commented 10 years ago

It would be nice if there were a warning of some sort; that is what I was thinking...

kschiess commented 10 years ago

A warning about normal operation? What would be the wording?

tallakt commented 10 years ago

Warning: Duplicated tag name in rule comments, capture groups may be lost

;-)

The way I see it, doing what I did is a bug... It would be a nice feature if parslet informed you that you have done something wrong.

On 2014-08-19 at 9:02 GMT+02:00, Kaspar Schiess wrote:

> A warning about normal operation? What would be the wording?


kschiess commented 10 years ago

> Warning: Duplicated tag name in rule comments, capture groups may be lost

The problem is about giving preference to Hashes over Arrays when flattening results. This is normal operation. Warning about it is probably silly - maybe we should explain flattening better?

Anyhow - the general advice here is: you almost never have enough .as(...) ;)

k