Open expipiplus1 opened 2 years ago
I have to think about this. It would certainly be useful, but what are the semantics?
Consider the permutation parser:
(,,) <$> char 'a' <*> optional (char 'b') <*> somePerm (char 'c')
Would it match:
"abc"
→ ('a', Just 'b', "c")
"cac"
→ ('a', Nothing, "cc")
"cacb"
→ ('a', Just 'b', "cc")
"cabc"
→ ('a', Just 'b', "cc")
Yeah, that's definitely what I'd expect, in fact I'm finding it hard to think of any other semantics which wouldn't be better done some other way.
A nice use case for this is to do something like:
Module sigs decs dataDecs <- runPermutation $ Module <$> manyPerm sig <*> manyPerm dec <*> manyPerm dataDec
Here, sorting a module of mashed up signatures, declarations and data declarations into lists of appropriate type. Saving having to go via a [(Sig | Dec | DataDec)]
@expipiplus1 I want to let you know that I have not forgotten your feature request. Unfortunately, it has been stalled by a combination of my busy schedule and theoretical difficulties. I will report back within two months time with a suitable exposition of the parsing problem at hand.
Sure :) no problem at all
On Tue, 31 May 2022, 11:48 am recursion-ninja, @.***> wrote:
@expipiplus1 https://github.com/expipiplus1 I want to let you know that I have not forgotten you feature request. Unfortunately, it has been stalled by a combination of my busy schedule and theoretical difficulties. I will report back within two months time with a suitable exposition of the parsing problem at hand.
— Reply to this email directly, view it on GitHub https://github.com/mrkkrp/parser-combinators/issues/39#issuecomment-1141638494, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGRJXAQK2BHJLHR7PDRSWDVMWDYLANCNFSM5U3GPNOQ . You are receiving this because you were mentioned.Message ID: @.***>
@recursion-ninja, there's no time pressure on my part at all, but I stumbled across this and thought it was worth a ping :D
One annoyance with this is that it prevents sharing any prefix between the different elements. For example if one allows decorations to be attached to declarations in a parsed language, it would be nice to parse the decorations and then branch on the specific declaration type...
Adding support from that would definitely detract from the simplicity of this library though and I'm not advocating for it!
Here's a write up why this is generally not feasible with the current implementation:
Ah, @recursion-ninja, I think there has been some big mix-up! This suggestion was solely for the Monadic interface, Control.Monad.Permutations, where it does seem to work quite nicely.
p :: Permutation (ParsecT Void String Identity) (Char, Maybe Char, [Char])
p =
(,,)
<$> toPermutation (char @_ @String 'a')
<*> toPermutationWithDefault Nothing (Just <$> char @_ @String 'b')
<*> somePerm (char @_ @String 'c')
g
:: String
-> Either
(ParseErrorBundle String Void)
(Char, Maybe Char, [Char])
g = Text.Megaparsec.runParser (runPermutation p) ""
λ> g "abc"
Right ('a',Just 'b',"c")
λ> g "cac"
Right ('a',Nothing,"cc")
λ> g "cacb"
Right ('a',Just 'b',"cc")
λ> g "cabc"
Right ('a',Just 'b',"cc")
Recently I got nerd-sniped into writing my own version of Permutation
, and along the way I've accumulated some thoughts on this. I stumbled upon this issue and thought it would be a good time to write down my ideas.
Firstly, I've read @recursion-ninja's writeup, but I'm sceptical about the conclusion. The last picture of the post does not show an important detail: the parsing process can stop at any node in which the non-optional components have been found. So, in the example
from the post, as soon as we have an 'a'
and a 'c'
, there is no need to endlessly follow the branches (but if there is more matching input, we can follow it and stop once we consumed everything that we could).
@expipiplus1's implementation seems to work correctly. Here's my understanding of how manyPerm
works:
Permutation
has a default value of []
P
tries to get a single item. If it fails, runPermutation
will use the default value.Permutation
(wrapped in m
) that serves as a continuation of the process. This continuation is only resumed by runPermutation
. Permutation
repeats the process, with a new default value which has been extended by the previously found element.If manyPerm
occurs as a part of some larger expression, otherExpr <*> (manyPerm ...)
, the manyPerm
's continuation will be modified by the <*>
operator to interleave it with otherExpr
.
So the interleaving of various components should be handled well. Indeed, applying the post's example
parser to the string "ccbacbcb" gives me the correct result: ('a',"bbb","cccc")
and consumes the whole input.
I've also tried to implement manyPerm
for Control.Applicative.Permutations
:
manyPerm :: Alternative m => m a -> Permutation m [a]
manyPerm parser = P (Just []) [Branch (flip (:) <$> manyPerm parser) parser]
To get a better idea of how this works, we can also expand the definitions a few times:
manyPerm :: Alternative m => m a -> Permutation m [a]
manyPerm parser = p0
where
p0 = P (Just []) [Branch p1 parser]
p1 = P (Just (\x -> [x])) [Branch p2 parser]
p2 = P (Just (\x1 x2 -> [x2, x1])) [Branch p3 parser]
p3 = P (Just (\x1 x2 x3 -> [x3,x2,x1])) [Branch (((\ys x1 x2 x3 -> x3:x2:x1:ys) .) <$> p1) parser]
The principle is simple:
[]
Permutation
. The default value of this new permutation is the function \x1 -> [x1]
, which will be given the parsed item.\x1 x2 .. xn -> [xn .. x2 x1]
Graphically, it could be presented like this:
|
+-----> [ ]
|
|x1
|
+-----> [x1]
|
|x2
|
+-----> [x1,x2]
|
|
...
Again, the tree is infinite, but that's not a problem because we can stop at any node. And because the process is broken down into a sequence of Branch
es, it can be interleaved with other components.
Alternative
instance for Permutation
?If Permutation
had a nicely behaved Alternative
instance, we wouldn't even need the special manyPerm
function, and we could just use normal many
:
Module sigs decs dataDec <- runPermutation $
Module <$> many (toPermutation sig) <*> many (toPermutation dec) <*> many (toPermutation dataDec)
But it seems that it would be tricky, or even impossible, to make this behave well for combinators that use <|>
recursively (like many
). And I'm not even 100% sure if one could define sensible semantics for <|>
.
... and that concludes this rather long braindump. Hopefully, at least some of this makes sense, but I haven't tried to formally prove any properties of these functions.
Would be nice to have something like this. @recursion-ninja do you think this makes sense?