turion opened this issue 1 year ago
One way I'm investigating this now is by inlining all functions that allocate a lot of memory. There is some considerable speedup, but I still have 6 MB peak usage.
Indeed, after inlining all functions in `dunai` that use the `MSF` constructor, and making `MSF` a `newtype`, I can reduce the peak memory usage to 3 MB, with a considerable speedup. This comes at the cost of compilation times, which are now higher.
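For reference, here is a rough sketch of the kind of change I mean. The constructor actually lives in `dunai`'s `Data.MonadicStreamFunction.InternalCore`; the `feedback` definition below is only illustrative, not the exact patched code:

```haskell
-- Sketch only: make the MSF wrapper a newtype (no extra boxing) and mark
-- constructor-using combinators INLINE so GHC can optimize across module
-- boundaries.
newtype MSF m a b = MSF { unMSF :: a -> m (b, MSF m a b) }

feedback :: Monad m => c -> MSF m (a, c) (b, c) -> MSF m a b
feedback c sf = MSF $ \a -> do
  ((b, c'), sf') <- unMSF sf (a, c)
  return (b, feedback c' sf')
{-# INLINE feedback #-}
```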
I'm really busy at the moment and won't have time to look into this issue with the level of attention and care it deserves, but I'll be very interested to hear what you find. Benchmarks are in the near future of Dunai, so this will definitely be useful info.
It seems that if I strictify and inline some `rhine` library functions, I can improve the runtime, but the space leak still remains. After also inlining several functions in `Main.hs`, performance improves: I can then run 800 particles in 3 MB at reasonable speed. Still, nearly half of the time is spent in garbage collection (according to `cabal run rhine-bayes-gloss --enable-profiling -- +RTS -sstderr`), so the story is not over. Also, it's not viable to tell library users to go and inline all their own functions; ideally, the library would have the right fusion properties so that they don't need to. Probably all this inlining is necessary for some optimization to kick in, and it's still unclear which optimization that is.
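Purely as an illustration of the kind of strictification and inlining I mean (hypothetical user code, not the actual `rhine` or `Main.hs` functions):

```haskell
{-# LANGUAGE BangPatterns #-}

import Control.Arrow (arr)
import Data.MonadicStreamFunction (MSF, feedback)

-- Hypothetical running sum: force the accumulator on every step so thunks
-- cannot pile up across simulation ticks, and ask GHC to inline it into the
-- caller so fusion can happen there.
sumStrict :: Monad m => MSF m Double Double
sumStrict = feedback 0 $ arr $ \(x, !acc) ->
  let !acc' = acc + x in (acc', acc')
{-# INLINE sumStrict #-}
```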
I'll have to revisit this after https://github.com/turion/rhine/pull/299.
## Summary
The example application in `rhine-bayes` gets very slow quickly when increasing the number of particles in the filter. It seems that this is caused by a space leak. The leak might be in the custom `runPopulationS` function, or (even worse) in `rhine` or `dunai`.

## Reproduce
1. Apply the following diff:
2. Run `cabal run rhine-bayes-gloss --enable-profiling -- +RTS -hm` (feel free to change the RTS flags for other methods of profiling).
3. Enter `1` or `2` and press Enter. A window should slowly appear, with some animations. These are fairly slow.
4. Run `hp2ps -d -c rhine-bayes-gloss.hp`.
For comparison, this is for 800 particles, suggesting linear complexity in the number of particles:
## Preliminary analysis
It seems that for every inference step, a huge amount of thunks builds up only to be broken down again immediately. In this example, we had 400 particles and a peak memory usage of > 12 MB, which means about 30 kB for a single particle. Even in the later parts, when the memory usage amplitude is still > 4 MB, that is 10 kB per particle. The actual information represented by a particle is a couple of `Double`s and should definitely be way below 1 kB, even with the overhead of having a separate `MSF` for every particle, I believe. Also, the fact that the memory usage rises and falls so steeply is typical of space leaks.
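To make the back-of-the-envelope estimate concrete (a hypothetical record, not the actual `rhine-bayes` types):

```haskell
-- Hypothetical particle payload: a state estimate plus a weight. Two strict,
-- unpacked Doubles occupy a few dozen bytes on a 64-bit heap, so 10-30 kB per
-- particle must be mostly retained thunks and closures, not live data.
data Particle = Particle
  { particleState  :: {-# UNPACK #-} !Double
  , particleWeight :: {-# UNPACK #-} !Double
  }
```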
I've split up the memory usage by module (option `-hm`), which shows that several modules from `dunai` (such as `Data.MonadicStreamFunction.Core`) are involved. This led me to believe that maybe a space-leaking construction from `dunai` is involved. But it's also possible that I'm reading this incorrectly and the `runPopulationS` construction is wrong. Running with `-hc` instead of `-hm` gives no clear hint, at least to me:

## Space leak in `arrM`?

I applied the following diff to `dunai`:

In words, I replaced the implementation of `arrM` with a simpler one that uses the `MSF` constructor directly, but is very probably correct and not leaky.
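The diff itself isn't reproduced above; as a sketch of what I mean by "uses the `MSF` constructor directly" (assuming the constructor from `Data.MonadicStreamFunction.InternalCore`), it looks roughly like this:

```haskell
-- Sketch of a direct-constructor arrM (not the exact diff): unfold the
-- stream function explicitly instead of defining it via morphGS.
arrM :: Monad m => (a -> m b) -> MSF m a b
arrM f = go
  where
    go = MSF $ \a -> do
      b <- f a
      return (b, go)
```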
I pointed my `rhine-bayes` tree to that patched `dunai` and ran it again with 400 particles:

It seems like one big contributor of the leak(s?) is gone. I tried replacing other usages of `morphGS` with explicit constructor usage, but that didn't change much, and anyway the profile now points at different modules as the main space leaks.
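For the record, by "explicit constructor usage" I mean rewrites of roughly this shape, illustrated here on a `morphS`-like monad-morphism lift (a sketch, again assuming access to the `MSF` constructor, not the exact change I tried):

```haskell
{-# LANGUAGE RankNTypes #-}

-- Sketch: lift an MSF along a monad morphism by unfolding the constructor
-- directly, instead of routing through morphGS.
morphS' :: (Monad m1, Monad m2)
        => (forall c. m1 c -> m2 c) -> MSF m1 a b -> MSF m2 a b
morphS' morph sf = MSF $ \a -> do
  (b, sf') <- morph (unMSF sf a)
  return (b, morphS' morph sf')
```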
## Further analysis

@ivanperez-keera it seems that `dunai` is involved in this space leak. I didn't open a `dunai` ticket because I don't have a clear reproducer there, but I believe that at least `arrM` is a culprit. Do you have other ideas for how one might investigate this further? One possible direction is sketched below.
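One idea I could imagine (only a sketch, not a confirmed reproducer): drive a long chain containing `arrM` on its own and watch the residency reported by `+RTS -s`; if `arrM` retains thunks, the maximum residency should grow with the number of steps.

```haskell
module Main where

import Control.Category ((>>>))
import Control.Monad (when)
import Data.MonadicStreamFunction (arrM, reactimate)
import Data.MonadicStreamFunction.Util (count)

-- Runs until interrupted; observe heap behaviour with +RTS -s.
main :: IO ()
main = reactimate $ count >>> arrM step
  where
    step :: Int -> IO ()
    step n = when (n `mod` 1000000 == 0) (print n)
```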