ollef / Earley

Parsing all context-free grammars using Earley's algorithm in Haskell.
BSD 3-Clause "New" or "Revised" License
363 stars 24 forks source link

Adding another choice with <|> causes parse to fail #14

Closed massysett closed 8 years ago

massysett commented 8 years ago

I'm writing a parser to parse either 0) a sequence of digits with no grouping characters, or 1) a sequence of digits with grouping characters like commas.

I'm getting some behavior I wouldn't expect, but I'm too ignorant to know if this is a bug. I have tried to boil this down to a simple example, as follows:

{-# LANGUAGE RecursiveDo, RankNTypes #-}
module Main where

import Control.Applicative
import Text.Earley
import System.Environment

data Zero = Zero deriving (Eq, Show)
data Comma = Comma deriving (Eq, Show)

list :: Prod r e t a -> Grammar r (Prod r e t ([a]))
list p = mdo
  r <- rule $ pure [] <|> (:) <$> p <*> r
  return r

comma :: Prod r String Char Comma
comma = Comma <$ symbol ',' <?> ","

zero :: Prod r String Char Zero
zero = Zero <$ symbol '0' <?> "0"

data Grouped = Grouped Zero [Zero] Comma Zero [(Comma, Zero, [Zero])]
  deriving (Eq, Show)

data Ungrouped = Ungrouped Zero [Zero] deriving (Eq, Show)

groupedOrUngrouped :: Grammar r (Prod r String Char (Either Grouped Ungrouped))
groupedOrUngrouped = mdo
  seqZero <- list zero
  seqGroups <- list $ (,,) <$> comma <*> zero <*> seqZero
  let l = Grouped <$> zero <*> seqZero <*> comma <*> zero <*> seqGroups
      r = Ungrouped <$> zero <*> seqZero
  rule $ Left <$> l <|> Right <$> r

grouped :: Grammar r (Prod r String Char Grouped)
grouped = mdo
  seqZero <- list zero
  seqGroups <- list $ (,,) <$> comma <*> zero <*> seqZero
  rule $ Grouped <$> zero <*> seqZero <*> comma <*> zero <*> seqGroups

main :: IO ()
main = do
  arg:[] <- getArgs
  print (allParses (parser groupedOrUngrouped) arg)
  putStrLn ""
  print (allParses (parser grouped) arg)

Here is some test output:

$ ./earleytest.hs 0,0
([(Right (Ungrouped Zero []),1)],Report {position = 1, expected = ["0"], unconsumed = ",0"})

([(Grouped Zero [] Comma Zero [],3)],Report {position = 3, expected = [","], unconsumed = ""})

I would expect that a Grouped value would appear when using the groupedOrUngrouped parser. I don't get a Grouped value there, but I do get one when using the grouped parser alone. Why? Any help is much appreciated. stack enabled source is also at

https://gist.github.com/massysett/b132c660c34955533795

massysett commented 8 years ago

oh - and this library is probably the best context-free parser available for Haskell right now, so thank you!

sboosali commented 8 years ago

your example looks like it should work, but I'm new to chart parsing myself.

first thought, does making l and r into rules help?

On Thursday, November 5, 2015, Omari Norman notifications@github.com wrote:

oh - and this library is probably the best context-free parser available for Haskell right now, so thank you!

— Reply to this email directly or view it on GitHub https://github.com/ollef/Earley/issues/14#issuecomment-154222921.

(this message was composed with dictation: charitably interpret typos)Sam Boosalis

massysett commented 8 years ago

does making l and r into rules help?

No, I tried this and got the same result:

groupedOrUngrouped :: Grammar r (Prod r String Char (Either Grouped Ungrouped))
groupedOrUngrouped = mdo
  seqZero <- list zero
  seqGroups <- list $ (,,) <$> comma <*> zero <*> seqZero
  l <- rule $ Grouped <$> zero <*> seqZero <*> comma <*> zero <*> seqGroups
  r <- rule $ Ungrouped <$> zero <*> seqZero
  rule $ Left <$> l <|> Right <$> r
ollef commented 8 years ago

This looks like a bug.

Something seems to be going wrong with the rule expansion when the rule is nullable and used multiple times at the same position. Here's a minimal example:

g :: Grammar r (Prod r () Char ())
g = mdo
  emptyRule <- rule $ pure ()
  return $ emptyRule <|> emptyRule

I haven't come up with a fix yet, but am looking into it. Many thanks for the report!

ollef commented 8 years ago

This should be fixed now. Thanks again for the report!

massysett commented 8 years ago

I appreciate your work very much, thank you.

On Monday, November 9, 2015, Olle Fredriksson notifications@github.com wrote:

This should be fixed now. Thanks again for the report!

— Reply to this email directly or view it on GitHub https://github.com/ollef/Earley/issues/14#issuecomment-155171833.