ollef / Earley

Parsing all context-free grammars using Earley's algorithm in Haskell.
BSD 3-Clause "New" or "Revised" License
363 stars 24 forks source link

Odd mixfix behavior #11

Closed mitchellwrosen closed 9 years ago

mitchellwrosen commented 9 years ago

Just curious what you think about this behavior:

When defining an operator if _ _ with right associativity, then

if if x y z 

will be successfully parsed as

if (if x y) z

but

if x if y z

will fail to parse, requiring explicit parentheses.

Doesn't this seem backwards? I would expect the above behavior if the operator had left associativity.

So, does associativity even make sense for operators that are not of the form _a_b_c_d_? Are two Nothings in a row when defining a mixfix operator supposed to be supported?

mitchellwrosen commented 9 years ago

Ah, I think I've found a bug here, one moment while I type up a minimal example.

mitchellwrosen commented 9 years ago
{-# LANGUAGE RecursiveDo #-}

import Control.Applicative
import System.Environment
import Text.Earley
import Text.Earley.Mixfix

data Expr
    = Var String
    | Plus Expr Expr
    deriving (Eq, Ord, Show)

expr :: Associativity -> Grammar r String (Prod r String String Expr)
expr a = mdo
    atomicExpr <- rule $ Var <$> satisfy (/= "+")

    expr <- mixfixExpression
               [[([Just (symbol "+"), Nothing, Nothing], a)]]
               atomicExpr
               (\x y -> case (x,y) of
                  ([Just "+", Nothing, Nothing], [e1,e2]) -> Plus e1 e2)

    return expr

main :: IO ()
main = do
    (x:xs) <- getArgs
    print $ fullParses $ parser (expr (toAssoc x)) $ concatMap words xs
  where
    toAssoc "left"  = LeftAssoc
    toAssoc "right" = RightAssoc
    toAssoc _       = NonAssoc
-- This looks right
$ ./main left + + 5 6 7
([Plus (Plus (Var "5") (Var "6")) (Var "7")],Report {position = 5, expected = [], unconsumed = []})

-- This should not succeed
$ ./main left + 5 + 6 7
([Plus (Var "5") (Plus (Var "6") (Var "7"))],Report {position = 5, expected = [], unconsumed = []})

-- This should not succeed
$ ./main right + + 5 6 7
([Plus (Plus (Var "5") (Var "6")) (Var "7")],Report {position = 5, expected = [], unconsumed = []})

-- This should have succeeded
$ ./main right + 5 + 6 7
([],Report {position = 2, expected = [], unconsumed = ["+","6","7"]})

-- This looks right
$ ./main non + + 5 6 7
([Plus (Plus (Var "5") (Var "6")) (Var "7")],Report {position = 5, expected = [], unconsumed = []})

-- This looks right
$ ./main non + 5 + 6 7
([Plus (Var "5") (Plus (Var "6") (Var "7"))],Report {position = 5, expected = [], unconsumed = []})
mitchellwrosen commented 9 years ago

Heh, after thinking about this for a little while longer, I'm not sure that associativity makes any sense at all for anything but an operator with a hole on the far left and the far right; i.e. "if_thenelse" should have no associativity. Does that sound correct? If so - it should probably be prevented with the type system.

EDIT: And to add to this, I don't believe it makes sense for an operator to have two consecutive holes or two consecutive identifiers, although this isn't clear in the Holey type

ollef commented 9 years ago

Yeah, the mixfix parser is limited to Holey identifiers with alternating holes and may otherwise give unexpected results. I did have a version where Holey prevented anything else and where associativity could only be given to identifiers with holes on both sides, but I deemed it too annoying to work with.

We should at least make this clear in the documentation of the mixfixExpression function. It should be possible to make it work with multiple consecutive holes though. Feel free to have a go at this. Otherwise I'll have a go at it soon (but this week I'm terribly busy).

mitchellwrosen commented 9 years ago

Sure, I can have a go at it. I was thinking of something like:

data Operator r e t ident expr where
    -- ++_, if_then_else_, etc.
    Prefix  :: Vector (Prod r e t ident) n 
               -> (Vector expr n -> expr) 
               -> Operator r e t ident expr
    -- _++, _[_], etc.
    Postfix :: Vector (Prod r e t ident) n 
               -> (Vector expr n -> expr) 
               -> Operator r e t ident expr
    -- _+_, _if_else_, etc
    Infix   :: Vector (Prod r e t ident) (S n) 
               -> (Vector expr (S (S n)) -> expr) 
               -> Associativity 
               -> Operator r e t ident expr
    -- [[_]], (_), etc
    Closed  :: Vector (Prod r e t ident) (S n) 
               -> (Vector expr n -> expr) 
               -> Operator r e t ident expr

-- replaces mixfixExpression
buildExpressionGrammar
    :: [[Operator r e t ident expr]]
    -> Prod r e t ident
    -> Grammar r e (Prod r e t expr)

Was this the same as your first version, which was too annoying to work with? For me, it doesn't seem that bad. Example:

import Data.Sized.Vector (Vector(..))
import qualified Data.Sized.Vector as V

data Expr
    = Var String -- "5"
    | Plus Expr Expr -- "5+5"
    | Negate Expr -- "-5"
    | If Expr Expr Expr -- "_if_else_"

buildExpressionParser
    [ [ Infix (V.singleton (symbol "+") (\(e1 :- e2 :- Nil) -> Plus e1 e2) LeftAssoc ]
    , [ Prefix (V.singleton (symbol "-") (\(e :- Nil) -> Negate e) ]
    , [ Infix (symbol "if" :- symbol "else" :- Nil) (\(e1 :- e2 :- e3 :- Nil) -> If e1 e2 e2) RightAssoc ]
    ]
    varExpression

As usual apologies if any of this doesn't typecheck as I just wrote it from scratch :P

ollef commented 9 years ago

The way I did it was just using normal ADTs, but similar in spirit. If we can get it working for consecutive holes much of that machinery would however be unnecessary. I would prefer that. :)

mitchellwrosen commented 9 years ago

Ok, so there are sort of two issues I'm seeing. One, consecutive holes seem to not work properly with the existing mixfixExpression. And two, the function is not very type safe (the user has no assurance that the Holey ident -> [expr] -> expr corresponds to any of the operators defined).

Just to clarify, are you saying that you want to rectify the first problem, but would rather not pursue some more radical rethinking of the types, in favor of simplicity and ease of use?

ollef commented 9 years ago

Yeah, that's right. The first problem is more important IMO.

I think what's desirable for the second one depends on the usecase. If you statically know the identifiers like in your example, then the extra assurance and flexibility in choosing semantic actions separately for each identifier is nice. If you're generating a parser from a user-given table, then the current solution is probably easier to use since you often want the same semantic action for everything anyway, and dependent types would only get in the way. I'm not sure what the best course of action is.

mitchellwrosen commented 9 years ago

Ok, sounds good. So, here's another issue which might be related to implementing consecutive holes:

What does it mean for an operator without holes on both sides to have an associativity? For example, even just simplifying a consecutive-hole operator like if _ _ to just ~ _, wouldn't you always want to parse ~ ~ x as ~ (~ x), and therefore prefix/postfix operators always have "right"/"left" associativity, respectively?

ollef commented 9 years ago

Yeah, that's probably the right idea. You could permit e.g. ~_ to be left-associative but that would mean that ~ ~ x doesn't parse. I'm not sure if that's useful behaviour.

mitchellwrosen commented 9 years ago

Ok, and then does that idea extend to mixfix operators that only have a hole on one side? i.e. if_then_else_ is essentially a prefix operator that should be right associative, and if it was given left associativity, if a then b else if c then d else e shouldn't parse? (I haven't double checked what the current behavior of mixfixExpression is)

ollef commented 9 years ago

I think that is the current behaviour, and it also extends to e.g. if_then_else_. What's tricky to think of is associativity for operators that are non-alternating: What does it mean for if__ to be left- or right-associative? Here you could give two associativities meaning.

mitchellwrosen commented 9 years ago

That's right, and I think it might make some sense to treat inner holes the same as if they were surrounded by tokens: they're just top-level expressions. So, if _ _ should always be right-associative, like other prefix expressions, and the first _ is any expression whatsoever.

ollef commented 9 years ago

I believe this is fixed now. Please reopen the issue if it doesn't work. :)