OutOfMemory with large number of episodes

wasowski commented 2 years ago

The stack overflow problems are solved (issue #53), but we still have OOM problems, at about 400K episodes with simplemaze. This most likely means that we stick to some memory held by the Randomized schedule, instead of consuming this memory on the fly when learning.

Interestingly there is no memory leaks with UnitAgent - this could be because the Q matrix is never updated, or it is updated very few times. This could also be because it is not learn but learningEpisode that leaks via iterateUntilM (and this does not show because all episodes terminate immediately for UnitAgent). It could be good to start by devising two tests that check these hypotheses.

Another hypothesis to be tested:

val C = 555555
val l = from (1). take (C) // a very long lazy list. not forced
def f (l: LazyList[Int], acc: Int): Int = 
   if l.isEmpty then return acc
   else f (l.tail, acc + l.head)
f (l)

In principle, f consumes the list, and does not need the elements in memory, but head of l in the caller is referenced from a named value in the caller, which should prevent the garbage collector from freeing memory. For a sufficiently large 'C' we should run out of heap space. I doubt that calling directly f(from(1).take(C)) would make any difference - but it is worth checking.

wasowski commented 2 years ago

Page 134 in fpinscala book has an indirect inspiration: try to avoid passing around the list as a pointer to the head but instead embed the calculation of the function you pass it to in a map, or other suitable operator (fold?) which is lazy. It seems that the LazyList programming is really not very good if you have to do more than one expression with the list. Ideally all list elements, including the head should be collectable temporaries.

wasowski commented 8 months ago

PR #232 advances the performance quite a bit, but we are far from not using memory at all. But we might be getting close to what an imperative, say Python, program would use if implemented following the same high level pattern of learning and storing policies first, then evaluating later. Some directions to push it further:

Obtain a fusion between learning and evaluation, so that evaluation is effectively interleaved with learning. This would release more memory earlier.
Implement imperative observers as a trait that you can mix into the learning class, which can be used to save monitoring results off main memory (some kind of => Unit observer).
Of course it is interesting whether any observer from state to any time A could also be supported (so that it could be pure) but fused with learning (the idea 1 and 2 combined).
Understand how Map is implemented in Scala, and what we can optimize to use this representation better. Or whether another representation would serve us better.
Optimize the state space representation in each RL problem (this can give a constant space improvement)
Try whether it is possible to remove probula and implement Randomized3 using spire directly. This should reduce space overhead by a constant factor too.

itu-square / symsim

OutOfMemory with large number of episodes #59