mrkkrp / parser-combinators

Lightweight package providing commonly useful parser combinators
Other
52 stars 15 forks source link

many and some for permutation parsers #39

Open expipiplus1 opened 2 years ago

expipiplus1 commented 2 years ago

Would be nice to have something like this. @recursion-ninja do you think this makes sense?

manyPerm :: Functor m => m a -> Permutation m [a]
manyPerm parser = go []
 where
  go acc = P (Just (reverse acc)) (go . (: acc) <$> parser)

somePerm :: Functor m => m a -> Permutation m [a]
somePerm parser = P Nothing (go . pure <$> parser)
 where
  go acc = P (Just (reverse acc)) (go . (: acc) <$> parser)
recursion-ninja commented 2 years ago

I have to think about this. It would certainly be useful, but what are the semantics?

Consider the permutation parser:

(,,) <$> char 'a' <*> optional (char 'b') <*> somePerm (char 'c')

Would it match:

expipiplus1 commented 2 years ago

Yeah, that's definitely what I'd expect, in fact I'm finding it hard to think of any other semantics which wouldn't be better done some other way.

A nice use case for this is to do something like:

Module sigs decs dataDecs <- runPermutation $ Module <$> manyPerm sig <*> manyPerm dec <*> manyPerm dataDec

Here, sorting a module of mashed up signatures, declarations and data declarations into lists of appropriate type. Saving having to go via a [(Sig | Dec | DataDec)]

recursion-ninja commented 2 years ago

@expipiplus1 I want to let you know that I have not forgotten your feature request. Unfortunately, it has been stalled by a combination of my busy schedule and theoretical difficulties. I will report back within two months time with a suitable exposition of the parsing problem at hand.

expipiplus1 commented 2 years ago

Sure :) no problem at all

On Tue, 31 May 2022, 11:48 am recursion-ninja, @.***> wrote:

@expipiplus1 https://github.com/expipiplus1 I want to let you know that I have not forgotten you feature request. Unfortunately, it has been stalled by a combination of my busy schedule and theoretical difficulties. I will report back within two months time with a suitable exposition of the parsing problem at hand.

— Reply to this email directly, view it on GitHub https://github.com/mrkkrp/parser-combinators/issues/39#issuecomment-1141638494, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGRJXAQK2BHJLHR7PDRSWDVMWDYLANCNFSM5U3GPNOQ . You are receiving this because you were mentioned.Message ID: @.***>

expipiplus1 commented 1 year ago

@recursion-ninja, there's no time pressure on my part at all, but I stumbled across this and thought it was worth a ping :D

expipiplus1 commented 1 year ago

One annoyance with this is that it prevents sharing any prefix between the different elements. For example if one allows decorations to be attached to declarations in a parsed language, it would be nice to parse the decorations and then branch on the specific declaration type...

Adding support from that would definitely detract from the simplicity of this library though and I'm not advocating for it!

recursion-ninja commented 1 year ago

Here's a write up why this is generally not feasible with the current implementation:

https://recursion.ninja/blog/perm-parser

expipiplus1 commented 1 year ago

Ah, @recursion-ninja, I think there has been some big mix-up! This suggestion was solely for the Monadic interface, Control.Monad.Permutations, where it does seem to work quite nicely.

p :: Permutation (ParsecT Void String Identity) (Char, Maybe Char, [Char])
p =
  (,,)
    <$> toPermutation (char @_ @String 'a')
    <*> toPermutationWithDefault Nothing (Just <$> char @_ @String 'b')
    <*> somePerm (char @_ @String 'c')

g
  :: String
  -> Either
      (ParseErrorBundle String Void)
      (Char, Maybe Char, [Char])
g = Text.Megaparsec.runParser (runPermutation p) ""
λ> g "abc"
Right ('a',Just 'b',"c")
λ> g "cac"
Right ('a',Nothing,"cc")
λ> g "cacb"
Right ('a',Just 'b',"cc")
λ> g "cabc"
Right ('a',Just 'b',"cc")
krzygorz commented 1 year ago

Recently I got nerd-sniped into writing my own version of Permutation, and along the way I've accumulated some thoughts on this. I stumbled upon this issue and thought it would be a good time to write down my ideas.

Firstly, I've read @recursion-ninja's writeup, but I'm sceptical about the conclusion. The last picture of the post does not show an important detail: the parsing process can stop at any node in which the non-optional components have been found. So, in the example from the post, as soon as we have an 'a' and a 'c', there is no need to endlessly follow the branches (but if there is more matching input, we can follow it and stop once we consumed everything that we could).

The monadic case

@expipiplus1's implementation seems to work correctly. Here's my understanding of how manyPerm works:

If manyPerm occurs as a part of some larger expression, otherExpr <*> (manyPerm ...), the manyPerm's continuation will be modified by the <*> operator to interleave it with otherExpr.

So the interleaving of various components should be handled well. Indeed, applying the post's example parser to the string "ccbacbcb" gives me the correct result: ('a',"bbb","cccc") and consumes the whole input.

The applicative case

I've also tried to implement manyPerm for Control.Applicative.Permutations:

manyPerm :: Alternative m => m a -> Permutation m [a]
manyPerm parser = P (Just []) [Branch (flip (:) <$> manyPerm parser) parser]

To get a better idea of how this works, we can also expand the definitions a few times:

manyPerm :: Alternative m => m a -> Permutation m [a]
manyPerm parser = p0
  where
    p0 = P (Just []) [Branch p1 parser]
    p1 = P (Just (\x -> [x])) [Branch p2 parser]
    p2 = P (Just (\x1 x2 -> [x2, x1])) [Branch p3 parser]
    p3 = P (Just (\x1 x2 x3 -> [x3,x2,x1])) [Branch (((\ys x1 x2 x3 -> x3:x2:x1:ys) .) <$> p1) parser]

The principle is simple:

Graphically, it could be presented like this:

 |
 +-----> [ ]  
 |
 |x1
 |
 +-----> [x1]
 |
 |x2
 |
 +-----> [x1,x2]
 |
 |
...

Again, the tree is infinite, but that's not a problem because we can stop at any node. And because the process is broken down into a sequence of Branches, it can be interleaved with other components.

An Alternative instance for Permutation?

If Permutation had a nicely behaved Alternative instance, we wouldn't even need the special manyPerm function, and we could just use normal many:

Module sigs decs dataDec <- runPermutation $
  Module <$> many (toPermutation sig) <*> many (toPermutation dec) <*> many (toPermutation dataDec)

But it seems that it would be tricky, or even impossible, to make this behave well for combinators that use <|> recursively (like many). And I'm not even 100% sure if one could define sensible semantics for <|>.

... and that concludes this rather long braindump. Hopefully, at least some of this makes sense, but I haven't tried to formally prove any properties of these functions.