pdeubel opened this issue 3 years ago (status: Open)
The current implementation is based on the implementation from "Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning".
A simple fix would be to remove this reshaping of the fitness and just use the rewards from the environment.
Some alternative ideas:
TL;DR: Fitness of individuals is decided by their rank within their generation, not by their rank across generations.
In each generation, after the training episodes are done, the following happens:

- The `EpisodeRunner` returns a three-tuple `(fitness, behavior_compressed, steps)` per individual. For this issue `fitness`, the reward from the environment, is of interest.
- `fitness` gets saved as an attribute on the individual: `individual.fitness_orig`.
- `toolbox.shape_fitness(candidates)` is called (`candidates` is a list of all individuals).
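The per-generation flow above can be sketched as follows. Note that `Individual` and `run_generation` are simplified stand-ins of my own; only the attribute name `fitness_orig` and the tuple layout come from the issue:

```python
# Minimal sketch (assumed names) of the per-generation flow described above.
# `Individual` and `run_generation` are simplified stand-ins; only the
# attribute name `fitness_orig` and the tuple layout come from the issue.

class Individual:
    def __init__(self):
        self.fitness_orig = None

def run_generation(individuals, episode_results):
    # Each episode result is a three-tuple: (fitness, behavior_compressed, steps).
    for ind, (fitness, behavior_compressed, steps) in zip(individuals, episode_results):
        # Save the raw environment reward on the individual before any shaping.
        ind.fitness_orig = fitness

population = [Individual() for _ in range(3)]
results = [(120.0, b"", 500), (80.0, b"", 420), (200.0, b"", 610)]
run_generation(population, results)
print([ind.fitness_orig for ind in population])  # [120.0, 80.0, 200.0]
```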
In `toolbox.shape_fitness(candidates)` the individuals are ranked according to their `individual.fitness_orig`. This is done by sorting them by this value, then iterating through the sorted list and, starting at 1, assigning each individual its rank; the rank is increased each step, so the second individual gets rank 2, and so on. For example, if a generation size of 150 is chosen, the individual with the highest `individual.fitness_orig` gets rank 150 (there is an edge case to this, but it is not crucial to this issue). The rank is saved as `individual.fitness.values`, which DEAP later uses when updating the Hall of Fame to decide whether an individual is better than an existing one (higher rank -> better).

Let's consider an example with the first two generations. In the first generation, the individuals reach rewards of around 10000, so the Hall of Fame will consist of individuals that reach a reward of 10000. But if, for some reason, the individuals of the next generation only reach rewards of around 10, the Hall of Fame will still be updated with these new individuals, even though they generated much lower rewards: they are ranked only within their own population, so the best of them gets the same rank as the best individual of the first generation despite reaching a much lower reward, and this rank is then used to update the Hall of Fame.

A simple fix would be to remove this reshaping of the fitness and just use the rewards from the environment as `individual.fitness.values`.
`toolbox.shape_fitness(candidates)` is `self.shape_fitness_weighted_ranks` in `IOptimizer`; the rest of the methods are in `algorithms.py`.
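The bug and the proposed fix can be demonstrated with a minimal sketch. `shape_fitness` below mimics the rank shaping described above (sort by `fitness_orig`, assign ranks 1..N), and `update_hof` is a toy hall of fame that, like DEAP's, compares only the shaped fitness values; all names here are illustrative, not the repository's actual code:

```python
# Sketch of the issue and the proposed fix. `shape_fitness` mimics the
# rank shaping described in the issue; `update_hof` is a toy hall of
# fame that, like DEAP's, compares only the shaped fitness values.
# All names are illustrative, not the repository's actual code.

class Individual:
    def __init__(self, reward):
        self.fitness_orig = reward   # raw environment reward
        self.fitness_values = None   # what the hall of fame compares

def shape_fitness(candidates):
    # Sort ascending by raw reward, then assign ranks 1..N: the best
    # individual of a generation always gets rank N, regardless of how
    # its reward compares to earlier generations.
    ranked = sorted(candidates, key=lambda ind: ind.fitness_orig)
    for rank, ind in enumerate(ranked, start=1):
        ind.fitness_values = rank

def update_hof(hof, candidates, maxsize=2):
    # Toy hall of fame: keep the maxsize individuals with the highest
    # fitness_values (Python's sort is stable, so earlier entries win ties).
    return sorted(hof + candidates, key=lambda ind: ind.fitness_values,
                  reverse=True)[:maxsize]

gen1 = [Individual(r) for r in (9800.0, 10000.0, 9900.0)]  # strong generation
gen2 = [Individual(r) for r in (8.0, 10.0, 9.0)]           # weak generation

shape_fitness(gen1)
hof = update_hof([], gen1)
shape_fitness(gen2)
hof = update_hof(hof, gen2)
# The reward-10 individual (rank 3 in gen2) displaces the reward-9900
# individual (rank 2 in gen1): ranks are only comparable within a generation.
print([ind.fitness_orig for ind in hof])  # [10000.0, 10.0]

# The simple fix: use the raw environment reward as the fitness values.
for ind in gen1 + gen2:
    ind.fitness_values = ind.fitness_orig
hof_fixed = update_hof([], gen1)
hof_fixed = update_hof(hof_fixed, gen2)
print([ind.fitness_orig for ind in hof_fixed])  # [10000.0, 9900.0]
```

With raw rewards the weak generation can no longer push out objectively better individuals, which is exactly the behavior the issue asks for.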