serokell / universum

:milky_way: Prelude written in @Serokell
MIT License
176 stars 28 forks source link

`concatMapM` to `foldMapA` and `foldMapM` #171

Open effectfully opened 6 years ago

effectfully commented 6 years ago

concatMapM has multiple problems. For one, it has a rather confusing name for a function defined like this:

concatMapM
    :: ( Applicative f
       , Monoid m
       , Container (l m)
       , Element (l m) ~ m
       , Traversable l
       )
    => (a -> f m) -> l a -> f m
concatMapM f = fmap fold . traverse f

There is nothing related to M here. Moreover, the user might think concatMapM can be used with a strict monoid like Sum, but that would result in a space leak, because an entire container is traversed first by traverse and only then flattened by fold. And the type signature is more restrictive than it should be: there is no need for Traversable -- Foldable is enough as it's just applicative foldMap.

So it should be defined something like this (modulo that Container thing):

foldMapA :: (Applicative f, Monoid m, Foldable t) => (a -> f m) -> t a -> f m
foldMapA f = foldr (\a fm -> mappend <$> f a <*> fm) (pure mempty)

But there indeed exists a notion of monadic folding which is useful for folding into a strict monoid:

foldMapM :: (Foldable f, Monad m, Monoid b) => (a -> m b) -> f a -> m b
foldMapM f xs = foldr step return xs mempty where
  step x r z = f x >>= \y -> r $! z `mappend` y

You can define all those andM, orM, allM, anyM in terms of foldMapM just like and, or, all, any are defined in terms of foldMap.

chshersh commented 6 years ago

@effectfully This looks like and interesting idea. I would like to see benchmarks and profiling first just to be :100: sure that this implementation is faster or at least not slower (and doesn't contain space leaks). Type of concatMapM always was confusing to me as well, it's just a result of generalization. But I'm okay with making things simpler. Personally I just want to be able to call multiple actions on some collection, where which action returns some monoid and monoid concatenation is performed automatically. Having examples like this to work is also a good feature:

concatMapM readFile files >>= putTextLn
effectfully commented 6 years ago

I would like to see benchmarks and profiling first just to be 100 sure that this implementation is faster or at least not slower

That's a lot of things to do just to prove an obvious fact that fmap fold . traverse f is way too inefficient when run in a strict Monoid (compared to foldMapM) and is too restrictive (compared to foldMapA).

(and doesn't contain space leaks)

foldMapA eats stack of course. I checked foldMapM some time ago and it worked alright, but yes, a proper test is needed here. If I was to write such a test, I'd use Control.Monad.Writer.CPS and weigh, but right now I have many other things to do.

dcastro commented 2 years ago

I had a look at foldMapM to see how evaluation would look like.

If we expand the definition a bit, and add a type sig to step for clarity:

foldMapM 
  :: forall c a m b. (Container c, Element c ~ a, Monad m, Monoid b)
  => (a -> m b) -> c -> m b
foldMapM f xs = foldr step return xs mempty
  where
    step :: a -> (b -> m b) -> (b -> m b)
    step a b_mb = \b -> do
      b2 <- f a
      let !res = b <> b2
      b_mb res

Let's specialize to lists, where foldr is defined as:

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f acc []     = acc
foldr f acc (x:xs) = f x (foldr f acc xs)

And now try to evaluate this expression:

a :: IO (Sum Integer)
a = foldMapM (\x -> pure (Sum x)) [1, 2]
foldMapM (\x -> pure (Sum x)) [1, 2]

-- substitute `foldMapM`
foldr step return [1,2] mempty
  where
    step :: Int -> (Sum Int -> IO (Sum Int)) -> (Sum Int -> IO (Sum Int))
    step a b_mb = \b -> do
      b2 <- pure $ Sum a
      let !res = b <> b2
      b_mb res

-- substitute `foldr` with its 2nd equation
step 1 (foldr step return [2]) mempty

-- substitute `step`
do
  b2 <- pure $ Sum 1
  let !res = mempty <> b2
  foldr step return [2] res

-- evaluate the first 2 statements
foldr step return [2] (Sum 1)

-- substitute `foldr` with its 2nd equation
step 2 (foldr step return []) (Sum 1)

-- substitute `step`
do
  b2 <- pure $ Sum 2
  let !res = (Sum 1) <> b2
  foldr step return [] res

-- evaluate the first 2 statements
foldr step return [] (Sum 3)

-- substitute `foldr` with its 1st equation
return (Sum 3)

In conclusion: constant stack, doesn't create thunks, so no space leaks. It seems to have similar properties to foldl'.

s-and-witch commented 2 years ago

Let's see foldMapA behavior on strict and lasy monoids.

The first example will be:

foo :: (String, (Sum Int))
foo = foldMapA (\x -> pure (Sum x)) [1, 2, 3] 

-- expand our definition, replacing foldMapA with it's body
foo = foldr (\a fm -> mappend <$> ((\x -> pure (Sum x)) a) <*> fm) (pure mempty) [1, 2, 3] 

-- reduce (String,) applicative and Monoid functions
foo = foldr (\a fm -> (\(s1, x) (s2, y) -> (s1 <> s2, x + y)) ((\x -> ([], x)) a) fm) ([], 0) [1, 2, 3] 

-- simplify lambdas
foo = foldr (\a (s2, y) -> let (s1, x) = ((\x -> ([], x)) a) in (s1 <> s2, x + y)) ([], 0) [1, 2, 3] 

foo = foldr (\a (s2, y) -> ([] <> s2, a + y)) ([], 0) [1, 2, 3] 

-- move lamba into `step` function
foo = foldr step ([], 0) [1, 2] 
  where
    step a (s2, y) = ([] <> s2, a + y)

Now we can clearly see, that our step function strict on it's second argument and computation will be:

> step 1 (foldr step ([], 0) [2, 3])
> step 1 (step 2 (foldr step ([], 0) [3]))
> step 1 (step 2 (step 3 (foldr step ([], 0) [])))
> step 1 (step 2 (step 3 ([], 0)))
> step 1 (step 2 ([], 3))
> step 1 ([], 5)
> ([], 6)

What about lazy applicative, such as (-> e)?

foo :: Int -> (Sum Int)
foo = foldMapA (\x -> pure (Sum x)) [1, 2, 3] 

-- definition from the first attempt
foo = foldr (\a fm -> mappend <$> ((\x -> pure (Sum x)) a) <*> fm) (pure mempty) [1, 2, 3]

-- inline Applicative definitions
foo = foldr (\a fm -> \e -> ((\x -> const x) a e) + fm e) (const 0) [1, 2, 3]

-- simplify
foo = foldr (\a fm -> \e -> ((const a) e) + fm e) (const 0) [1, 2, 3]

-- introduce `step`
foo = foldr step (const 0) [1, 2, 3]
  where
    step a fm = \e -> ((const a) e) + fm e

Now step is'n strict on Applicative, but sum yet not tail recursive

step 1 (foldr step (const 0) [2, 3])
\e -> const 1 e + (foldr step (const 0) [2, 3]) e

-- now add argument
(\e -> const 1 e + (foldr step (const 0) [2, 3]) e) 4
const 1 4 + (foldr step (const 0) [2, 3]) 4
1 + (step 2 (foldr step (const 0) [3])) 4
1 + (\e -> const 2 e + (foldr step (const 0) [3]) e) 4
1 + (const 2 4 + (foldr step (const 0) [3]) 4) 
1 + (2 + (foldr step (const 0) [3]) 4) 
1 + (2 + (step 3 (foldr step (const 0) [])) 4) 
1 + (2 + (\e -> const 3 e + (foldr step (const 0) [] e)) 4) 
1 + (2 + (const 3 4 + (foldr step (const 0) [] 4))) 
1 + (2 + (3 + (foldr step (const 0) [] 4))) 
1 + (2 + (3 + (const 0 4))) 
1 + (2 + (3 + 0)) 
1 + (2 + 3) 
1 + 5
6 

There is no reason why strict Applicative can start work with lazy monoid, so let's move to both lazy Applicative and Monoid

foo :: Int -> [Int]
foo = foldMapA (\x -> pure (Sum x)) [1, 2, 3] 

foo = foldr (\a fm -> \e -> ((\x -> const [x]) a e) <> fm e) (const []) [1, 2, 3]

foo = foldr (\a fm -> \e -> ((const [a]) e) <> fm e) (const []) [1, 2, 3]

foo = foldr step (const []) [1, 2, 3]
  where
    step a fm = \e -> ((const [a]) e) <> fm e

Seems good to me:

step 1 (foldr step (const []) [2, 3])
\e -> const [1] e <> (foldr step (const []) [2, 3]) e

-- now add argument
(\e -> const [1] e <> (foldr step (const []) [2, 3]) e) 4
const [1] 4 <> (foldr step (const []) [2, 3]) 4
[1] <> (foldr step (const []) [2, 3]) 4

-- now guarded recursion works, next steps will not compute untill case matching/seq function
1 : (foldr step (const []) [2, 3]) 4

-- but let's immagine, that we `force` list
1 : step 2 (foldr step (const []) [3]) 4
1 : (\e -> const [2] e <> (foldr step (const []) [3]) e) 4
1 : const [2] 4 <> (foldr step (const []) [3]) 4
1 : [2] <> (foldr step (const []) [3]) 4
1 : 2 : (foldr step (const []) [3]) 4
1 : 2 : step 3 (foldr step (const []) []) 4
1 : 2 : (\e -> const [3] e <> (foldr step (const []) []) e) 4
1 : 2 : const [3] 4 <> (foldr step (const []) []) 4
1 : 2 : [3] <> (foldr step (const []) []) 4
1 : 2 : 3 : (foldr step (const []) []) 4
1 : 2 : 3 : const [] 4
1 : 2 : 3 : []

And some tests in GHCi:

-- ([Int], Sum Int)
ghci> foldr (\a fs -> mappend <$> (pure $ Sum a) <*> fs) ([], mempty) (replicate 100000000 1)
*** Exception: stack overflow

-- Int -> Sum Int
ghci> foldr (\a fs -> mappend <$> (const $ Sum a) <*> fs) (const mempty) (replicate 100000000 1) 4
Sum {getSum = *** Exception: stack overflow

-- ([Int], [Int])
ghci> foldr (\a fs -> mappend <$> (pure $ [a]) <*> fs) ([], mempty) (replicate 100000000 1)
*** Exception: stack overflow

-- Int -> [Int]
ghci> foldr (\a fs -> mappend <$> (const $ [a]) <*> fs) (const mempty) (replicate 100000000 1) 4
[1,1,1,1,1,1,1...