EngineAlgorithm should cleanup itself after it stopped

HeuristicLab-Trac-Bot commented 13 years ago

Issue migrated from trac ticket # 1355

milestone: HeuristicLab 3.3.3 | component: Optimization | priority: high | resolution: done

2010-12-30 01:32:54: @discostu105 created the issue

When an EngineAlgorithm stops, it currently keeps all its state information (ExecutionContext and ScopeTree). This is unnecessary because in ExecutionState "Stopped" it cannot be resumed (without calling Prepare, which deletes this state information).

Problem: The state information consumes a large amount of memory when serialized, especially in evolutionary algorithms with a large population size. This is a significant problem in Hive. When a finished algorithm contains tens or hundreds of megabytes of unnecessary information, serialization, deserialization, and transmission of jobs are unnecessarily slow.

Also, users who don't know the "trick" of calling Prepare before storing an algorithm spend a lot of unnecessary time saving and loading their algorithms.

However, just calling Prepare() when an algorithm has stopped would be too much, since this would also clear the ExecutionTime and the ResultCollection. Using Prepare for memory cleanup would not fit its semantics.
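To illustrate the distinction between Prepare and a cleanup on stop, here is a minimal Python sketch. It is not HeuristicLab code (HeuristicLab is written in C#); the class `SketchAlgorithm` and all of its fields and methods are hypothetical stand-ins, chosen only to show which state would be dropped on stop and which would be kept.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class ExecutionState(Enum):
    PREPARED = auto()
    STARTED = auto()
    PAUSED = auto()
    STOPPED = auto()


@dataclass
class SketchAlgorithm:
    """Hypothetical stand-in for an EngineAlgorithm (not the real API)."""
    state: ExecutionState = ExecutionState.PREPARED
    execution_time: float = 0.0
    results: dict = field(default_factory=dict)
    # Transient state that is only needed to resume a paused run.
    execution_stack: list = field(default_factory=list)
    scope_tree: list = field(default_factory=list)

    def run(self, population_size, seconds):
        # Simulate a run that builds up a large scope tree.
        self.state = ExecutionState.STARTED
        self.scope_tree = [[0.0] * 10 for _ in range(population_size)]
        self.execution_stack = ["next operation"]
        self.results["BestQuality"] = 0.0
        self.execution_time += seconds

    def pause(self):
        # A paused run can be resumed, so the transient state must be kept.
        self.state = ExecutionState.PAUSED

    def stop(self):
        # A stopped run cannot be resumed: drop only the transient state,
        # keep the results and the execution time.
        self._clear_transient_state()
        self.state = ExecutionState.STOPPED

    def prepare(self):
        # Prepare resets everything, including results and execution time.
        self._clear_transient_state()
        self.results.clear()
        self.execution_time = 0.0
        self.state = ExecutionState.PREPARED

    def _clear_transient_state(self):
        self.execution_stack.clear()
        self.scope_tree.clear()
```

In this sketch, `stop()` frees the memory-heavy state while the results stay available for inspection; only `prepare()` discards them.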

HeuristicLab-Trac-Bot commented 13 years ago

2010-12-30 01:33:27: @discostu105 changed status from new to assigned

HeuristicLab-Trac-Bot commented 13 years ago

2011-01-03 01:14:59: @s-wagner changed status from assigned to accepted

HeuristicLab-Trac-Bot commented 13 years ago

2011-01-03 01:14:59: @s-wagner changed milestone from HeuristicLab x.x.x to HeuristicLab 3.3.3

HeuristicLab-Trac-Bot commented 13 years ago

2011-01-03 01:27:45: @s-wagner commented

The ExecutionStack (and consequently all ExecutionContexts) is now cleared in Engine when the engine is stopped (see ticket #1333 and r5193).

HeuristicLab-Trac-Bot commented 13 years ago

2011-01-03 01:31:09: @s-wagner changed component from ### Undefined ### to Optimization

HeuristicLab-Trac-Bot commented 13 years ago

2011-01-03 01:31:09: @s-wagner edited the issue description

HeuristicLab-Trac-Bot commented 13 years ago

2011-01-03 01:31:09: @s-wagner changed title from Algorithm cleanup itself after it stopped to EngineAlgorithm should cleanup itself after it stopped

HeuristicLab-Trac-Bot commented 13 years ago

2011-01-03 05:43:58: @s-wagner commented

In r5195, the global scope is now cleared after an EngineAlgorithm is stopped.

HeuristicLab-Trac-Bot commented 13 years ago

2011-01-03 05:48:21: @s-wagner commented

Additionally, I think we should get rid of storing a clone of the algorithm in each run. This also requires a lot of memory, which is probably not worth it. I think the feature to show the corresponding algorithm for a run is not used very often.

HeuristicLab-Trac-Bot commented 13 years ago

2011-01-03 08:50:18: @discostu105 commented

I agree that storing a clone of the algorithm in each run can become a huge memory overhead. But maybe it would be enough to make it an opt-in feature rather than opt-out.

HeuristicLab-Trac-Bot commented 13 years ago

2011-01-03 09:52:06: @discostu105 commented

The following test shows that there may still be a problem with file sizes and memory consumption after the recent changes.

Algorithm: Genetic Algorithm (MaxGenerations: 15, PopulationSize: 2000)

Problem: Single Objective Testfunction (ProblemSize: 2000)

StoreAlgorithmInEachRun: false

The following numbers show the file sizes when the algorithm is stored:

Before algorithm started: 37KB

After algorithm paused: 97MB

After algorithm stopped: 73MB

After algorithm prepared: 36MB

After clearing runs: 38KB!

The recent changes help to release ~24MB of memory when the algorithm stops, yet ~37MB can still be released by calling "prepare".

Also interesting, and maybe a different problem, is that clearing the ResultCollection (which contained only 1 run) released almost 36MB of memory. I investigated this further and found that one problem might be that a SingleObjectiveTestFunctionSolution object contains a reference to the whole population. Therefore the BestSolution in a run also contains a reference to the whole population, which in this case leads to a large memory consumption.
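The effect described above, where one retained reference drags the whole population into the serialized file, can be reproduced in a small Python sketch. The class `BestSolutionSketch` and the helper `serialized_size` are hypothetical, and `pickle` stands in for HeuristicLab's own persistence, so only the relative sizes are meaningful, not the absolute numbers.

```python
import pickle
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BestSolutionSketch:
    """Hypothetical best-solution result (not the real HeuristicLab class)."""
    best_vector: List[float]
    # Optional reference to the whole population, e.g. for visualization.
    population: Optional[List[List[float]]] = None


def serialized_size(obj) -> int:
    # Number of bytes needed to serialize the object graph reachable from obj.
    return len(pickle.dumps(obj))


# 200 individuals with 50 real-valued variables each (arbitrary sketch sizes).
population = [[float(i + j) for j in range(50)] for i in range(200)]
best = population[0]

lean = BestSolutionSketch(best_vector=best)
heavy = BestSolutionSketch(best_vector=best, population=population)

# The second number is far larger: the single reference makes the
# serializer write out every individual, not just the best one.
print(serialized_size(lean), serialized_size(heavy))
```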

HeuristicLab-Trac-Bot commented 13 years ago

2011-01-03 22:20:20: @s-wagner commented

Changed storing the algorithm in each run into an opt-in feature in r5203.
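As an illustration of what opt-in means here, a minimal Python sketch follows. It does not show the actual change in r5203; the classes `AlgorithmSketch` and `RunSketch` are hypothetical, and only the flag name is modeled on the StoreAlgorithmInEachRun setting mentioned earlier in this ticket.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunSketch:
    """Hypothetical run record: a results snapshot, optionally with a clone."""
    results: dict
    algorithm: Optional["AlgorithmSketch"] = None


@dataclass
class AlgorithmSketch:
    # Opt-in: the default is False, so no clone is stored unless requested.
    store_algorithm_in_each_run: bool = False
    results: dict = field(default_factory=dict)
    runs: list = field(default_factory=list)

    def create_run(self) -> RunSketch:
        clone = None
        if self.store_algorithm_in_each_run:
            clone = copy.deepcopy(self)
            # Drop the copied run list so clones do not nest earlier runs.
            clone.runs = []
        run = RunSketch(results=dict(self.results), algorithm=clone)
        self.runs.append(run)
        return run
```

With the default setting, each run carries only its results; a user who wants to inspect the algorithm behind a run sets the flag explicitly and accepts the extra memory.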

HeuristicLab-Trac-Bot commented 13 years ago

2011-01-03 23:17:36: @s-wagner commented

Replying to [comment:8 cneumuel]:

> The following test shows that there may still be a problem with file sizes and memory consumption after the recent changes.
>
> Algorithm: Genetic Algorithm (MaxGenerations: 15, PopulationSize: 2000)
>
> Problem: Single Objective Testfunction (ProblemSize: 2000)
>
> StoreAlgorithmInEachRun: false
>
> The following numbers show the file sizes when the algorithm is stored:
>
> Before algorithm started: 37KB
>
> After algorithm paused: 97MB
>
> After algorithm stopped: 73MB
>
> After algorithm prepared: 36MB
>
> After clearing runs: 38KB!
>
> The recent changes help to release ~24MB of memory when the algorithm stops, yet ~37MB can still be released by calling "prepare".
>
> Also interesting, and maybe a different problem, is that clearing the ResultCollection (which contained only 1 run) released almost 36MB of memory. I investigated this further and found that one problem might be that a SingleObjectiveTestFunctionSolution object contains a reference to the whole population. Therefore the BestSolution in a run also contains a reference to the whole population, which in this case leads to a large memory consumption.

Yes, the large file sizes in cases 3 and 4 result from storing the whole population in the SingleObjectiveTestFunctionSolution and consequently also in the BestSolution result (which is contained in the results as well as in each run). We decided to store the whole population in order to be able to visualize it graphically. However, this is only possible in the 2D case. If the problem dimension is larger than 2, the graphical solution visualization is not available. Therefore, we should only store the whole population in the 2D case. I created a separate ticket #1360 for this issue.
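The proposal to keep the population only in the 2D case could look roughly like the following Python sketch. How ticket #1360 was actually resolved is not stated here; the class `SolutionSketch` and the function `make_solution` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class SolutionSketch:
    """Hypothetical test-function solution (not the real HeuristicLab class)."""
    best: List[float]
    # Only populated when the solution can be visualized, i.e. in 2D.
    population: Optional[List[List[float]]] = None


def make_solution(best: Sequence[float],
                  population: Sequence[Sequence[float]]) -> SolutionSketch:
    # Keep a copy of the population only for 2D problems; for any other
    # dimension the visualization is unavailable, so nothing is retained.
    if len(best) == 2:
        kept = [list(individual) for individual in population]
    else:
        kept = None
    return SolutionSketch(best=list(best), population=kept)
```

For the test setup in this ticket (ProblemSize: 2000), such a check would leave `population` empty, so the BestSolution result would hold only the best vector.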

HeuristicLab-Trac-Bot commented 13 years ago

2011-01-03 23:37:23: @s-wagner commented

Updated samples in r5205.

HeuristicLab-Trac-Bot commented 13 years ago

2011-01-04 00:54:16: @s-wagner changed status from accepted to reviewing

HeuristicLab-Trac-Bot commented 13 years ago

2011-01-04 00:54:16: @s-wagner changed owner from swagner to cneumuel

HeuristicLab-Trac-Bot commented 13 years ago

2011-01-04 02:27:25: @discostu105 commented

Tested the same algorithm with the recent changes:

Before algorithm started: 37KB

After algorithm paused: 105MB

After algorithm stopped: 84KB

After algorithm prepared: 64KB

After clearing runs: 38KB

HeuristicLab-Trac-Bot commented 13 years ago

2011-01-04 02:27:46: @discostu105 changed status from reviewing to readytorelease

HeuristicLab-Trac-Bot commented 13 years ago

2011-01-04 02:27:46: @discostu105 changed owner from cneumuel to swagner

HeuristicLab-Trac-Bot commented 13 years ago

2011-02-05 19:05:17: @mkommend changed status from readytorelease to closed
