michaelt / streaming

An optimized general monad transformer for streaming applications, with a simple prelude of functions
BSD 3-Clause "New" or "Revised" License
104 stars 11 forks source link

How to fight undesirable sharing? #6

Closed UnkindPartition closed 8 years ago

UnkindPartition commented 8 years ago

This simple code leaks memory because the stream gets shared:

import qualified Streaming as S
import qualified Streaming.Prelude as S

n = 5000000

main = a >> a

a = S.mapM_ print $ S.take n $ S.enumFrom (0::Int)

The only workaround I have found is to introduce dummy monadic layers:

import qualified Streaming as S
import qualified Streaming.Prelude as S

n = 5000000

main = a >> a

a = S.mapM_ print $ S.take n $ myenumFrom (0::Int)

myenumFrom n = S.effect . return $ do
  S.yield n
  myenumFrom (n+1)

Are there any better approaches?

michaelt commented 8 years ago

It should be possible to reproduce any sharing conundrum that you could make with Data.List. Thus all of these exhaust memory except the one that uses streaming copy:

import qualified Streaming as S
import qualified Streaming.Prelude as S
import Pipes
import qualified Pipes.Prelude as P
import qualified Conduit as C
import qualified Data.Conduit.List as C
import Control.Monad
import Control.Concurrent (threadDelay)

x = 50000000 :: Int

main = stream'

rep :: Monad f => (Int -> f a) -> f ()
rep act = loop 1 where
  loop n = do
    when (n < x) $ do
      act n
      loop (n+1)

ns :: [Int]
ns = [1..x]
ss :: Monad m => S.Stream (S.Of Int) m ()
ss = rep S.yield
ps :: Monad m => Producer Int m ()
ps = rep yield
cs :: Monad m => C.ConduitM i Int m ()
cs = rep C.yield

list = do
  print $ sum ns
  threadDelay 1000000
  print $ product ns

conduit = do
  n <- cs C.$$ C.sumC
  print n 
  threadDelay 1000000
  n' <- cs C.$$ C.productC
  print n'

pipes = do
  n <- P.sum ps
  print n 
  threadDelay 1000000
  n' <- P.product ps 
  print n'    

stream = do
  n <- S.sum ss
  print n 
  threadDelay 1000000
  n' <- S.product ss
  print n'

stream' = do
  n <-  S.product $ S.sum $ S.copy ss
  threadDelay 1000000
  print n 
UnkindPartition commented 8 years ago

I don't follow.

You're showing a different example -- one where you use a stream twice, for two different computations.

In my example, I use the stream once to produce an IO action. That IO action is later executed twice.

I am not sure how your copy solution applies to my case.

michaelt commented 8 years ago

Sorry, the bit about copy was distracting, then; the comparison with Data.List and the streaming libraries was the point. In

 main = do
    let io_action = mapM_ print $ take n $ [0..]
    io_action 
    io_action 

ghc thinks it might need the list [0..] again. There are various ways of keeping it from getting this impression. My thought was that nothing will protect you from sharing curiosities as long as you are using ghc/haskell which must hunt for sharing opportunities. Do you think that something is wrong with conduit, because it produces the difficulty above? It is using a codensity representation, but ghc still thinks it has to preserve its calculation.

Because the representation of S.enumFrom 0, even when specialized to IO, is very close to that of [0..] (in that it doesn't itself involve any monadic steps) -- it differs only by throwing in two constructors for each number emitted, where the list has only : -- ghc will be especially prone to thinking it can re-use the results it is calculating. In the case in question you were able to break this by inserting monadic steps, but I don't think it is possible generally, as we see from the conduit/pipes/list cases which as you say "use the stream once to define an io action"

UnkindPartition commented 8 years ago

Yes, I understand why this is happening, and I didn't want to imply that the problem is somehow specific to this library.

I was just curious if you knew any better workarounds than the one I came up with.

I also have another way to fix this minimal example:

{-# OPTIONS_GHC -fno-full-laziness #-}
import qualified Streaming as S
import qualified Streaming.Prelude as S

a :: () -> IO ()
a () = S.mapM_ print $ S.take 500000 $ S.enumFrom 0

main :: IO ()
main = a () >> a ()

However, in my real code base, the point where S.mapM_ is called is deep down the call stack from where the action is duplicated, and transforming all actions in between to take a dummy argument is not very practical.

UnkindPartition commented 8 years ago

@nomeata I know you've done a ton of work in this area. Do you have any thoughts? Is there a chance to get your dup function into ghc?

michaelt commented 8 years ago

For a while, I implemented the three principal offenders, each yield and enumFrom more or less as you did myenumfrom, interpolating a monadic layer, but this of course slows things down a bit; then I thought of adding a yieldM eachM etc., defined like myenumfrom, but the whole device only gets rid of a range of cases. So in the end I decided that the most raw representation would make it easier to reason and anticipate them: something like as = S.each [0..] really is very close to as = [0..], just more corpulent. A device like delaying with a () argument works in some cases, not in others. Insertion of monadic layers is like this. Pipes is putting all but the first element of the stream behind successive () -> ... but still generates the pathology above. The conduit case is even more remarkable.

The slowdown (which is just a matter of piling on more constructors) was only visible in trivial benchmarky cases, so maybe it's worth it.

nomeata commented 8 years ago

Is there a chance to get your dup function into ghc?

Unlikely; it was very fiddly and low-level, and probably is not reliable either.

Unfortunately, I don’t know of any better solution. Even tricks like an () argument can possibly fail to work as expected if GHC is too smart. And all the other tricks (monadic actions) just work because they obscure the code to the compiler. If the compiler gets smarter, it will fail again.

We might need pragmas like inline this or fuse this really hard to guide the compiler, but again, it is not clear how that would look like precisely.

michaelt commented 8 years ago

Working through some of the recalcitrant cases, I did find that -fno-full-laziness seems to work wonders except where there is a clear binding to a 'pure' stream/pipe/conduit. For example, it much more reliably falls for the () argument trick, if I am understanding my evidence properly - but also if the sameness of stream is otherwise buried. So one prophylactic might be to recommend use of it when compiling main. Strangely, feuerbach's original program

import qualified Streaming as S
import qualified Streaming.Prelude as S

n = 5000000
main = a >> a
a = S.mapM_ print $ S.take n $ S.enumFrom (0::Int)

is not finding any unintended sharing. But I'm sure I observed it yesterday. This suggests to me that the compiler has a learning ability, which wouldn't too much surprise me.

tomjaguarpaw commented 8 years ago

Even tricks like an () argument can possibly fail to work as expected if GHC is too smart.

I would be very interested in any cases you find where no-full-laziness plus () argument fails. Please forward any such cases you find to me!

UnkindPartition commented 8 years ago

@tomjaguarpaw see my original snippet in the very first post

tomjaguarpaw commented 8 years ago

@feuerbach The original snippet doesn't have () argument ... ?

tomjaguarpaw commented 8 years ago

I think data OfNoSharing a b = OfNoSharing a (() -> b) could be useful for preventing sharing.

UnkindPartition commented 8 years ago

@tomjaguarpaw it doesn't, but simply adding a () argument to a didn't help last I checked.

Your OfNoSharing could indeed work (I haven't checked), but it's rather different from the standard mechanical transformation of adding a () argument to the thing you don't want to share, which I thought you were referring to.

tomjaguarpaw commented 8 years ago

simply adding a () argument to a didn't help last I checked

Right, which is why I am interested in

any cases you find where no-full-laziness plus () argument fails.

i.e. do you know of any space leaks where both

  1. no-full-laziness is enabled, and
  2. anything you don't want shared is hidden behind a lambda abstraction

If so, as I said, I would be very interested. It is my current hypothesis that if both conditions are satisfied then the thunk will not be shared.

tomjaguarpaw commented 8 years ago

it's rather different from the standard mechanical transformation of adding a () argument to the thing you don't want to share, which I thought you were referring to.

They are indeed different, and I'm suggesting that each would work.

UnkindPartition commented 8 years ago

Weird, I can't reproduce it anymore. I'm reasonably sure no-full-laziness and dummy arguments didn't help in my real problem at work. I'll try to make a minimal example when I have time.

tomjaguarpaw commented 8 years ago

That's what they all* say :)

michaelt commented 8 years ago

@tomjaguarpaw that's an interesting discussion on haskell-cafe. I wondered about something like data OfNoSharing a b = OfNoSharing a (() -> b) That is effectively what Pipes is using when specialized to Producer. I labored quite a bit convincing myself that a left-strict pair was better than a lazy one, but never tested a more-than-just-right-lazy pair. I think the existing scheme explains the cases where this library is faster than pipes, which is a lot of the easily comparable cases, so the results might be unpleasant. Another possibility would be to change the (hidden) constructors to

  data Stream f m r = Return r | Effect (m (Stream f m r)) | Step (() -> f (Stream f m r)) 

That might be somewhat more practical. Then it could just be another hidden feature the user wouldn't bump into much, since one could still have essential functions like wrap :: f (Stream f m r) -> Stream f m r and so on. I made the existing Step (!(f (Stream f m r))) constructor strict, because it was seeming to win in lots of cases, but I think there was never any considerable difference.

The no-sharing pair would also be kind of a drag since, at least the way I was thinking, it is important for comprehension that the folds in e.g. Streaming.Prelude should be of the form

 sum :: Stream (Of Int) m r -> m (Of Int r) 

and the like, so that they can be used for the characteristic rank-2 maps, but they also need to be practical taken by themselves. Construction and deconstruction would be less pleasant. But I don't know. All of this presupposes that generally counseling fno-full-laziness makes sense, though I suppose it would eliminate some mishaps anyway.

tomjaguarpaw commented 8 years ago

I'm on a bit of a crusade to eliminate full-laziness, or at least replace it with something more subtle: https://stackoverflow.com/questions/35115172/why-is-no-full-laziness-a-default-optimization/35115664#35115664

I would be interested to see how performance changes under OfNoSharing a (() -> b) if and when you get results.

I don't understand why the form of sum would change. Can you explain?

michaelt commented 8 years ago

The type signature would be the same, it's just that that that's a place where the user might actually pattern match, say

loop str = do 
  (n:> rest) <- lift $ S.sum $ S.splitAt 3 str
  when (n < (10::Int)) $ do
    S.yield n
    loop rest

which then becomes

  ...
  loop (rest ())

which isn't bad, just more to keep track of.

winterland1989 commented 8 years ago

Will the CONLIKE pragma help here?

michaelt commented 8 years ago

You mean marking something like enumFrom with {-# INLINE CONLIKE enumFrom #-} or {-# INLINE CONLIKE[1] enumFrom #-} or whatever, in its source, I take it? It doesn't affect the test we were contemplating, though maybe I don't know enough about how to use it. I wonder how it would affect optimization in the intuitively correct case where I don't unreasonably bind a name to the result; optimization is the main thing I am worried about.

Here's the source I'm using - toggling between stream, list, etc. - minus the distraction that caused trouble above.

module Main (main) where

import qualified Streaming.Prelude as S
import Streaming (Of(..))
import Pipes
import qualified Pipes.Prelude as P
import qualified Conduit as C
import qualified Data.Conduit.List as C
import Control.Monad
import Control.Concurrent (threadDelay)

x = 50000000 :: Int

main = stream

rep :: Monad f => (Int -> f a) -> f ()
rep act = loop 1 where
  loop n = do
    when (n < x) $ do
      act n
      loop (n+1)

ns :: [Int]
ns = [1..x]

ss :: S.Stream (Of Int) ()
ss = rep S.yield

ps :: Monad m => Producer Int m ()
ps = rep yield
cs :: Monad m => C.ConduitM i Int m ()
cs = rep C.yield

list = do
  print $ sum ns
  threadDelay 1000000
  print $ product ns

conduit = do
  n <- cs C.$$ C.sumC
  print n 
  threadDelay 1000000
  n' <- cs C.$$ C.productC
  print n'

pipes = do
  n <- P.sum ps
  print n 
  threadDelay 1000000
  n' <- P.product ps 
  print n'    

stream = do
  n <- S.sum ss
  print n 
  threadDelay 1000000
  n' <- S.fold (+) 0 id ss
  print n'
tomjaguarpaw commented 3 years ago

EDIT: migrated to https://github.com/haskell-streaming/streaming/issues/110#issue-1035177418

treeowl commented 3 years ago

@tomjaguarpaw, michaelt is no longer active on GitHub or the Haskell community, unfortunately. Development of this package has moved to https://github.com/haskell-streaming/streaming. If you'd like to reopen discussion, I suggest you do so in a new issue there, and link to this one.

tomjaguarpaw commented 3 years ago

Thanks @treeowl. Done!