Closed mitchellwrosen closed 9 years ago
Ah, I think I've found a bug here, one moment while I type up a minimal example.
{-# LANGUAGE RecursiveDo #-}
import Control.Applicative
import System.Environment
import Text.Earley
import Text.Earley.Mixfix
data Expr
= Var String
| Plus Expr Expr
deriving (Eq, Ord, Show)
expr :: Associativity -> Grammar r String (Prod r String String Expr)
expr a = mdo
atomicExpr <- rule $ Var <$> satisfy (/= "+")
expr <- mixfixExpression
[[([Just (symbol "+"), Nothing, Nothing], a)]]
atomicExpr
(\x y -> case (x,y) of
([Just "+", Nothing, Nothing], [e1,e2]) -> Plus e1 e2)
return expr
main :: IO ()
main = do
(x:xs) <- getArgs
print $ fullParses $ parser (expr (toAssoc x)) $ concatMap words xs
where
toAssoc "left" = LeftAssoc
toAssoc "right" = RightAssoc
toAssoc _ = NonAssoc
-- This looks right
$ ./main left + + 5 6 7
([Plus (Plus (Var "5") (Var "6")) (Var "7")],Report {position = 5, expected = [], unconsumed = []})
-- This should not succeed
$ ./main left + 5 + 6 7
([Plus (Var "5") (Plus (Var "6") (Var "7"))],Report {position = 5, expected = [], unconsumed = []})
-- This should not succeed
$ ./main right + + 5 6 7
([Plus (Plus (Var "5") (Var "6")) (Var "7")],Report {position = 5, expected = [], unconsumed = []})
-- This should have succeeded
$ ./main right + 5 + 6 7
([],Report {position = 2, expected = [], unconsumed = ["+","6","7"]})
-- This looks right
$ ./main non + + 5 6 7
([Plus (Plus (Var "5") (Var "6")) (Var "7")],Report {position = 5, expected = [], unconsumed = []})
-- This looks right
$ ./main non + 5 + 6 7
([Plus (Var "5") (Plus (Var "6") (Var "7"))],Report {position = 5, expected = [], unconsumed = []})
Heh, after thinking about this for a little while longer, I'm not sure that associativity makes any sense at all for anything but an operator with a hole on the far left and the far right; i.e. "if_thenelse" should have no associativity. Does that sound correct? If so - it should probably be prevented with the type system.
EDIT: And to add to this, I don't believe it makes sense for an operator to have two consecutive holes or two consecutive identifiers, although this isn't clear in the Holey
type
Yeah, the mixfix parser is limited to Holey
identifiers with alternating holes and may otherwise give unexpected results. I did have a version where Holey
prevented anything else and where associativity could only be given to identifiers with holes on both sides, but I deemed it too annoying to work with.
We should at least make this clear in the documentation of the mixfixExpression
function. It should be possible to make it work with multiple consecutive holes though. Feel free to have a go at this. Otherwise I'll have a go at it soon (but this week I'm terribly busy).
Sure, I can have a go at it. I was thinking of something like:
data Operator r e t ident expr where
-- ++_, if_then_else_, etc.
Prefix :: Vector (Prod r e t ident) n
-> (Vector expr n -> expr)
-> Operator r e t ident expr
-- _++, _[_], etc.
Postfix :: Vector (Prod r e t ident) n
-> (Vector expr n -> expr)
-> Operator r e t ident expr
-- _+_, _if_else_, etc
Infix :: Vector (Prod r e t ident) (S n)
-> (Vector expr (S (S n)) -> expr)
-> Associativity
-> Operator r e t ident expr
-- [[_]], (_), etc
Closed :: Vector (Prod r e t ident) (S n)
-> (Vector expr n -> expr)
-> Operator r e t ident expr
-- replaces mixfixExpression
buildExpressionGrammar
:: [[Operator r e t ident expr]]
-> Prod r e t ident
-> Grammar r e (Prod r e t expr)
Was this the same as your first version, which was too annoying to work with? For me, it doesn't seem that bad. Example:
import Data.Sized.Vector (Vector(..))
import qualified Data.Sized.Vector as V
data Expr
= Var String -- "5"
| Plus Expr Expr -- "5+5"
| Negate Expr -- "-5"
| If Expr Expr Expr -- "_if_else_"
buildExpressionParser
[ [ Infix (V.singleton (symbol "+") (\(e1 :- e2 :- Nil) -> Plus e1 e2) LeftAssoc ]
, [ Prefix (V.singleton (symbol "-") (\(e :- Nil) -> Negate e) ]
, [ Infix (symbol "if" :- symbol "else" :- Nil) (\(e1 :- e2 :- e3 :- Nil) -> If e1 e2 e2) RightAssoc ]
]
varExpression
As usual apologies if any of this doesn't typecheck as I just wrote it from scratch :P
The way I did it was just using normal ADTs, but similar in spirit. If we can get it working for consecutive holes much of that machinery would however be unnecessary. I would prefer that. :)
Ok, so there are sort of two issues I'm seeing. One, consecutive holes seem to not work properly with the existing mixfixExpression
. And two, the function is not very type safe (the user has no assurance that the Holey ident -> [expr] -> expr
corresponds to any of the operators defined).
Just to clarify, are you saying that you want to rectify the first problem, but would rather not pursue some more radical rethinking of the types, in favor of simplicity and ease of use?
Yeah, that's right. The first problem is more important IMO.
I think what's desirable for the second one depends on the usecase. If you statically know the identifiers like in your example, then the extra assurance and flexibility in choosing semantic actions separately for each identifier is nice. If you're generating a parser from a user-given table, then the current solution is probably easier to use since you often want the same semantic action for everything anyway, and dependent types would only get in the way. I'm not sure what the best course of action is.
Ok, sounds good. So, here's another issue which might be related to implementing consecutive holes:
What does it mean for an operator without holes on both sides to have an associativity? For example, even just simplifying a consecutive-hole operator like if _ _
to just ~ _
, wouldn't you always want to parse ~ ~ x
as ~ (~ x)
, and therefore prefix/postfix operators always have "right"/"left" associativity, respectively?
Yeah, that's probably the right idea. You could permit e.g. ~_
to be left-associative but that would mean that ~ ~ x
doesn't parse. I'm not sure if that's useful behaviour.
Ok, and then does that idea extend to mixfix operators that only have a hole on one side? i.e. if_then_else_
is essentially a prefix operator that should be right associative, and if it was given left associativity, if a then b else if c then d else e
shouldn't parse? (I haven't double checked what the current behavior of mixfixExpression
is)
I think that is the current behaviour, and it also extends to e.g. if_then_else_
. What's tricky to think of is associativity for operators that are non-alternating: What does it mean for if__
to be left- or right-associative? Here you could give two associativities meaning.
That's right, and I think it might make some sense to treat inner holes the same as if they were surrounded by tokens: they're just top-level expressions. So, if _ _
should always be right-associative, like other prefix expressions, and the first _
is any expression whatsoever.
I believe this is fixed now. Please reopen the issue if it doesn't work. :)
Just curious what you think about this behavior:
When defining an operator
if _ _
with right associativity, thenwill be successfully parsed as
but
will fail to parse, requiring explicit parentheses.
Doesn't this seem backwards? I would expect the above behavior if the operator had left associativity.
So, does associativity even make sense for operators that are not of the form
_a_b_c_d_
? Are twoNothing
s in a row when defining a mixfix operator supposed to be supported?