ollef / Earley

Parsing all context-free grammars using Earley's algorithm in Haskell.
BSD 3-Clause "New" or "Revised" License
361 stars 24 forks source link

Parsing hangs even for simple ambiguous grammars #31

Closed rmanne closed 7 years ago

rmanne commented 7 years ago

Consider the following toy example:

import Control.Applicative
import Text.Earley

data S = S [Char] Char [T] deriving Show
data T = T Char S deriving Show

a, b, c :: Prod r String Char Char
a = satisfy (=='a')
b = satisfy (=='b')
c = satisfy (=='c')

-- "abcb" -> S [a] b [T c (S [] b [])]
-- "abc" -> last 'c' should be unused
-- "abcc" -> parsing all of it without leaving unconsumed pieces should not be possible
-- "abcbcab" -> S [a] b [T c (S [] b [T c (S [a] b)])] OR S [a] b [T c (S [] b []), T c (S [a] b [])]
--s = S <$> many a <*> b <*> many t -- TODO
s = S <$> many a <*> b <*> pure []
t = T <$> c <*> s

main = print (report p "abc") >> print (fullParses p "abc")
    where p = parser $ rule s

As far as I can tell, this grammar should be parsable. Every string only has a finite number of possible parse trees. Yet parsing fails if I enable the commented line (marked with a TODO). Am I doing something wrong? Or is this a bug/limitation in Earley?

ollef commented 7 years ago

Hey, and thanks for getting in touch.

I would encourage you to read this section of the readme. Basically, you have to use rule whenever there's a recursive rule. This is what allows the Earley library to do something slightly more clever than most parser combinator libraries for recursive rules.

Your grammar should work fine if you use something along the lines of:

grammar = mdo
  s <- rule $ S <$> many a <*> b <*> many t
  t <- rule $ T <$> c <*> s
  return s

I hope that helps!

rmanne commented 7 years ago

Thanks! That's exactly what I needed.