pdeubel opened this issue 3 years ago (status: Open)
The current implementation is based on the implementation from "Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning".
A simple fix would be to remove this reshaping of the fitness and just use the rewards from the environment.
Some alternative ideas:
TL;DR: Fitness of individuals is decided by their rank within their generation, not by their rank across generations.
In each generation, after the training episodes are done, the following happens:

- The `EpisodeRunner` returns a three-tuple `(fitness, behavior_compressed, steps)` per individual. For this issue `fitness`, the reward from the environment, is of interest.
- `fitness` gets saved as an attribute on the individual: `individual.fitness_orig`.
- `toolbox.shape_fitness(candidates)` is called (`candidates` is a list of all individuals).
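The per-generation flow above can be sketched as follows. Note that `Individual` and `run_generation` are simplified stand-ins of my own; only the attribute name `fitness_orig` and the tuple layout come from the issue:

```python
# Minimal sketch (assumed names) of the per-generation flow described above.
# `Individual` and `run_generation` are simplified stand-ins; only the
# attribute name `fitness_orig` and the tuple layout come from the issue.

class Individual:
    def __init__(self):
        self.fitness_orig = None

def run_generation(individuals, episode_results):
    # Each episode result is a three-tuple: (fitness, behavior_compressed, steps).
    for ind, (fitness, behavior_compressed, steps) in zip(individuals, episode_results):
        # Save the raw environment reward on the individual before any shaping.
        ind.fitness_orig = fitness

population = [Individual() for _ in range(3)]
results = [(120.0, b"", 500), (80.0, b"", 420), (200.0, b"", 610)]
run_generation(population, results)
print([ind.fitness_orig for ind in population])  # [120.0, 80.0, 200.0]
```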
In `toolbox.shape_fitness(candidates)` the individuals are ranked according to their `individual.fitness_orig`. This is done by sorting them by this value, then iterating through the sorted list and, starting at 1, assigning each individual its rank; the rank is increased each step, so the second individual gets rank 2, and so on. For example, if a generation size of 150 is chosen, the individual with the highest `individual.fitness_orig` gets rank 150 (there is an edge case to this, but it is not crucial to this issue). The rank is saved as `individual.fitness.values`, which DEAP later uses when updating the Hall of Fame to decide whether an individual is better than an existing one (higher rank -> better).

Let's consider an example with the first two generations. In the first generation, the individuals reach rewards of around 10000, so the Hall of Fame will consist of individuals that reach a reward of 10000. But if, for some reason, the individuals of the next generation only reach rewards of around 10, the Hall of Fame will still be updated with these new individuals, even though they generated much lower rewards: they are ranked only within their own population, so the best of them gets the same rank as the best individual of the first generation despite reaching a much lower reward, and this rank is then used to update the Hall of Fame.

A simple fix would be to remove this reshaping of the fitness and just use the rewards from the environment as `individual.fitness.values`.
`toolbox.shape_fitness(candidates)` is `self.shape_fitness_weighted_ranks` in `IOptimizer`; the rest of the methods are in `algorithms.py`.
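The bug and the proposed fix can be demonstrated with a minimal sketch. `shape_fitness` below mimics the rank shaping described above (sort by `fitness_orig`, assign ranks 1..N), and `update_hof` is a toy hall of fame that, like DEAP's, compares only the shaped fitness values; all names here are illustrative, not the repository's actual code:

```python
# Sketch of the issue and the proposed fix. `shape_fitness` mimics the
# rank shaping described in the issue; `update_hof` is a toy hall of
# fame that, like DEAP's, compares only the shaped fitness values.
# All names are illustrative, not the repository's actual code.

class Individual:
    def __init__(self, reward):
        self.fitness_orig = reward   # raw environment reward
        self.fitness_values = None   # what the hall of fame compares

def shape_fitness(candidates):
    # Sort ascending by raw reward, then assign ranks 1..N: the best
    # individual of a generation always gets rank N, regardless of how
    # its reward compares to earlier generations.
    ranked = sorted(candidates, key=lambda ind: ind.fitness_orig)
    for rank, ind in enumerate(ranked, start=1):
        ind.fitness_values = rank

def update_hof(hof, candidates, maxsize=2):
    # Toy hall of fame: keep the maxsize individuals with the highest
    # fitness_values (Python's sort is stable, so earlier entries win ties).
    return sorted(hof + candidates, key=lambda ind: ind.fitness_values,
                  reverse=True)[:maxsize]

gen1 = [Individual(r) for r in (9800.0, 10000.0, 9900.0)]  # strong generation
gen2 = [Individual(r) for r in (8.0, 10.0, 9.0)]           # weak generation

shape_fitness(gen1)
hof = update_hof([], gen1)
shape_fitness(gen2)
hof = update_hof(hof, gen2)
# The reward-10 individual (rank 3 in gen2) displaces the reward-9900
# individual (rank 2 in gen1): ranks are only comparable within a generation.
print([ind.fitness_orig for ind in hof])  # [10000.0, 10.0]

# The simple fix: use the raw environment reward as the fitness values.
for ind in gen1 + gen2:
    ind.fitness_values = ind.fitness_orig
hof_fixed = update_hof([], gen1)
hof_fixed = update_hof(hof_fixed, gen2)
print([ind.fitness_orig for ind in hof_fixed])  # [10000.0, 9900.0]
```

With raw rewards the weak generation can no longer push out objectively better individuals, which is exactly the behavior the issue asks for.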