turion / rhine

Haskell Functional Reactive Programming framework with type-level clocks
http://hackage.haskell.org/package/rhine

Space leak in rhine-bayes example program? #227

Open turion opened 1 year ago

turion commented 1 year ago

Summary

The example application in rhine-bayes quickly becomes very slow when the number of particles in the filter is increased. It seems that this is caused by a space leak. This leak might be in the custom runPopulationS function, or (even worse) in rhine or dunai.

Reproduce

  1. Apply the following diff:

    diff --git a/rhine-bayes/app/Main.hs b/rhine-bayes/app/Main.hs
    index 480b031..170ad98 100644
    --- a/rhine-bayes/app/Main.hs
    +++ b/rhine-bayes/app/Main.hs
    @@ -151,7 +151,7 @@ data Result = Result
    
    -- | The number of particles used in the filter. Change according to available computing power.
    nParticles :: Int
    -nParticles = 100
    +nParticles = 400
    
    -- * Visualization
  2. cabal run rhine-bayes-gloss --enable-profiling -- +RTS -hm (feel free to change the RTS flags for other methods of profiling)
  3. When the prompt in the program appears, press 1 or 2 and Enter. A window should slowly appear, with some animations. These are fairly slow.
  4. After a few frames of animation, close the window.
  5. hp2ps -d -c rhine-bayes-gloss.hp
  6. Inspect the resulting memory profile: (heap profile image attached)

For comparison, this is for 800 particles, suggesting linear complexity in the number of particles:

(heap profile image for 800 particles attached)

Preliminary analysis

It seems that for every inference step, a huge amount of thunks builds up, only to be broken down immediately again. In this example, we had 400 particles and a peak memory usage of > 12 MB, which means about 30 kB per particle. Even in the later parts, when the memory usage amplitude is still > 4 MB, that is 10 kB per particle. The actual information represented by a particle is a couple of Doubles and should definitely be well below 1 kB, I believe, even with the overhead of having a separate MSF for every particle.

Also, the fact that the memory usage rises and falls so steeply is typical of space leaks.
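To illustrate the general pattern (this is a textbook example, not rhine or dunai code): a lazy accumulator builds a chain of thunks that is forced all at once when the result is demanded, producing exactly this steep rise-and-fall heap shape. A strict fold keeps memory flat:

```haskell
import Data.List (foldl')

-- Lazy foldl builds a chain of (+) thunks proportional to the list length;
-- the whole chain is only forced when the result is demanded.
leakySum :: Int
leakySum = foldl (+) 0 [1 .. 1000000]

-- foldl' forces the accumulator at every step, so memory stays constant.
strictSum :: Int
strictSum = foldl' (+) 0 [1 .. 1000000]
```

The suspicion is that something analogous happens per particle per step, just hidden inside the stream function machinery.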

I've split up the memory usage by module (option -hm), which shows that several modules from dunai (such as Data.MonadicStreamFunction.Core) are involved. This led me to suspect that a space-leaking construction from dunai is involved.

But it's also possible that I'm reading this incorrectly and the runPopulationS construction is wrong. Running with -hc instead of -hm gives no clear hint, at least to me:

(heap profile image attached)

Space leak in arrM?

I applied the following diff to dunai:

diff --git a/dunai/src/Data/MonadicStreamFunction/Core.hs b/dunai/src/Data/MonadicStreamFunction/Core.hs
index 52117c3..f3f6e93 100644
--- a/dunai/src/Data/MonadicStreamFunction/Core.hs
+++ b/dunai/src/Data/MonadicStreamFunction/Core.hs
@@ -79,7 +79,7 @@ import Control.Applicative (Applicative(..))
 #endif

 -- Internal imports
-import Data.MonadicStreamFunction.InternalCore (MSF, embed, feedback, morphGS,
+import Data.MonadicStreamFunction.InternalCore (MSF (..), embed, feedback, morphGS,
                                                 reactimate)

 -- * Definitions
@@ -122,12 +122,12 @@ constM = arrM . const
 arrM :: Monad m => (a -> m b) -> MSF m a b
 arrM f =
   -- This implementation is equivalent to:
-  -- arrM f = go
-  --   where
-  --     go = MSF $ \a -> do
-  --            b <- f a
-  --            return (b, go)
-  morphGS (\i a -> i a >>= \(_, c) -> f a >>= \b -> return (b, c)) C.id
+  go
+    where
+      go = MSF $ \a -> do
+             b <- f a
+             return (b, go)
+  -- morphGS (\i a -> i a >>= \(_, c) -> f a >>= \b -> return (b, c)) C.id

In words, I replaced the implementation of arrM with a simpler one that uses the MSF constructor directly, but is very probably correct and not leaky.
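For context, a self-contained sketch of this direct-constructor arrM (using a stripped-down MSF type for illustration, not the actual dunai InternalCore module) could look like:

```haskell
import Data.Functor.Identity (Identity, runIdentity)

-- Stripped-down MSF, mirroring the shape of dunai's internal type
-- (an assumption for illustration, not the real module).
newtype MSF m a b = MSF {unMSF :: a -> m (b, MSF m a b)}

-- Direct implementation: the continuation is always the same shared
-- closure 'go', so no structure accumulates from step to step.
arrM :: Monad m => (a -> m b) -> MSF m a b
arrM f = go
  where
    go = MSF $ \a -> do
      b <- f a
      return (b, go)

-- Run an MSF over a finite input list.
embed :: Monad m => MSF m a b -> [a] -> m [b]
embed _ [] = return []
embed sf (a : as) = do
  (b, sf') <- unMSF sf a
  bs <- embed sf' as
  return (b : bs)
```

For example, `runIdentity (embed (arrM (return . (+ 1))) [1, 2, 3])` yields `[2, 3, 4]`. The morphGS-based version, by contrast, rebuilds a wrapped continuation on every step, which is one plausible place for thunks to pile up.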

I pointed my rhine-bayes tree to that patched dunai and ran it again with 400 particles:

(heap profile images attached)

It seems like one big contributor to the leak(s?) is gone. I tried replacing other usages of morphGS with explicit constructor usage, but that didn't change much. And anyway, the profile now points at different modules as the main space leaks.

Further analysis

@ivanperez-keera it seems that dunai is involved in this space leak. I didn't open a dunai ticket because I don't have a clear reproducer there. But I believe that at least arrM is a culprit. Do you have other ideas for how one might investigate this further?

turion commented 1 year ago

One way I'm investigating this now is by inlining all functions that allocate a lot of memory. There is some considerable speedup, but I still have 6 MB peak usage.

turion commented 1 year ago

Indeed, after inlining all functions in dunai that use the MSF constructor and making MSF a newtype, I can reduce the peak memory usage to 3 MB, with a considerable speedup. This comes at the cost of compilation times, which are now higher.
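As a sketch of that change (hypothetical, not the actual dunai source): with `newtype`, wrapping and unwrapping MSF has no runtime representation, and INLINE pragmas expose the step function to GHC's optimiser at every call site:

```haskell
import Data.Functor.Identity (Identity, runIdentity)

-- 'newtype' instead of 'data': the MSF wrapper is erased at runtime,
-- so unMSF costs nothing.
newtype MSF m a b = MSF {unMSF :: a -> m (b, MSF m a b)}

-- INLINE pragma: the definition is unfolded at call sites, giving GHC
-- a chance to fuse away the intermediate monadic structure.
constM :: Monad m => m b -> MSF m a b
constM mb = go
  where
    go = MSF $ \_ -> mb >>= \b -> return (b, go)
{-# INLINE constM #-}
```

This would explain both the memory improvement (fewer closures per step) and the longer compilation times (more unfolding work for GHC).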

ivanperez-keera commented 1 year ago

I'm really busy at the moment and won't have time to look into this issue with the level of attention and care it deserves, but I'll be very interested to hear what you find. Benchmarks are in the near future of Dunai, so this will definitely be useful info.

turion commented 1 year ago

It seems that if I strictify and inline some rhine library functions, I can improve the runtime, but the space leak still remains. After also inlining several functions in Main.hs, performance improves further. I can then run 800 particles within 3 MB at reasonable speed. Still, nearly half of the time is spent in garbage collection (according to cabal run rhine-bayes-gloss --enable-profiling -- +RTS -sstderr), so the story is not over. Also, it's not viable to tell library users to go and inline all their own functions. Ideally, the library would have the right fusion properties so that they don't need to. Probably so much inlining is necessary for some kind of optimization to kick in, and it's still unclear what that optimization is.
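To make "strictify" concrete (a generic illustration, not actual rhine code): strict fields and a strict fold together guarantee that state threaded through many steps is fully evaluated at each step, rather than accumulating as thunks:

```haskell
import Data.List (foldl')

-- Strict fields (!) force the components whenever the constructor is
-- evaluated, so no thunk chain can hide inside the state.
data State = State !Double !Int

-- One step: fold a sample into the running state.
step :: State -> Double -> State
step (State total n) x = State (total + x) (n + 1)

-- foldl' forces each intermediate State to weak head normal form;
-- the strict fields then force the numbers inside.
runSteps :: [Double] -> State
runSteps = foldl' step (State 0 0)
```

In an MSF setting the state is threaded through feedback loops rather than a fold, but the same principle applies: without strictness annotations, each tick can deposit one more layer of unevaluated arithmetic.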

turion commented 4 months ago

I'll have to revisit this after https://github.com/turion/rhine/pull/299.