nazaruka / gym-http-api

NSGA2-based Sonic agent + experimental code
MIT License

Behavioral Diversity via NSGA-II #24

Closed: schrum2 closed this issue 5 years ago

schrum2 commented 5 years ago

Instead of using Novelty Search, we could use Behavioral Diversity. This approach relies on a multiobjective evolutionary algorithm, usually NSGA-II, which has a Python implementation: https://github.com/ChengHust/NSGA-II

Specifically, there is one actual fitness objective and a second novelty objective similar to the novelty score in Novelty Search. The extra benefit of this approach is that no archive maintenance is needed.
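As a rough sketch of the idea (not code from this repo; diversity_objectives, the behavior representation, and k are all illustrative assumptions), the two objectives handed to NSGA-II could be computed like this:

import numpy as np

def diversity_objectives(fitnesses, behaviors, k=15):
    """Pair each individual's task fitness with a novelty score: the mean
    Euclidean distance to its k nearest neighbors in behavior space.
    Neighbors come only from the current population, so no archive is kept."""
    behaviors = np.asarray(behaviors, dtype=float)
    objectives = []
    for i, b in enumerate(behaviors):
        dists = np.linalg.norm(behaviors - b, axis=1)  # distance to everyone
        dists = np.sort(np.delete(dists, i))           # drop the self-distance
        novelty = float(dists[:k].mean())              # mean of k nearest
        objectives.append((fitnesses[i], novelty))     # both to be maximized
    return objectives

NSGA-II then sorts the population by Pareto dominance over these two scores, so diverse-but-weak and strong-but-common individuals can both survive.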

Behavioral Diversity was introduced by Jean-Baptiste Mouret: https://www.researchgate.net/publication/221006307_Behavioral_diversity_measures_for_Evolutionary_Robotics

I've also made use of it in my own previous work: https://suscholar.southwestern.edu/handle/11214/150

schrum2 commented 5 years ago

The first problem we need to address with the code currently in dev_schrum is that, after one evaluation, the code can't start evaluating the next agent because of an issue with resetting the environment. Here is the error:

Traceback (most recent call last):
  File ".\NSGAII.py", line 304, in <module>
    fitness, behavior_char = evaluate(envs,net,actor_critic)
  File ".\NSGAII.py", line 192, in evaluate
    obs = envs.reset()
  File "E:\Users\he_de\workspace\gym-http-api\NSGA2\helpers\envs.py", line 257, in reset
    obs = self.venv.reset()
  File "E:\Users\he_de\workspace\gym-http-api\NSGA2\helpers\envs.py", line 183, in reset
    obs = self.venv.reset()
  File "e:\users\he_de\pythondeeplearning\baselines\baselines\common\vec_env\dummy_vec_env.py", line 60, in reset
    obs = self.envs[e].reset()
  File "C:\ProgramData\Anaconda3\lib\site-packages\gym\core.py", line 277, in reset
    observation = self.env.reset(**kwargs)
  File "e:\users\he_de\pythondeeplearning\baselines\baselines\bench\monitor.py", line 36, in reset
    self.reset_state()
  File "e:\users\he_de\pythondeeplearning\baselines\baselines\bench\monitor.py", line 46, in reset_state
    raise RuntimeError("Tried to reset an environment before done. If you want to allow early resets, wrap your env with Monitor(env, path, allow_early_resets=True)")
RuntimeError: Tried to reset an environment before done. If you want to allow early resets, wrap your env with Monitor(env, path, allow_early_resets=True)

I really don't think we should have to wrap the environment to get the desired effect, since this wasn't required with the NEAT version of the code. However, if wrapping the environment lets us reset without problems, we should just do it. So, try wrapping the environment as indicated and move forward.
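For reference, a minimal sketch of the suggested wrapping (the log path is a placeholder), using the Monitor wrapper from baselines.bench that the error message refers to:

from baselines.bench import Monitor

# Permit reset() before an episode reaches done, as the error suggests.
env = Monitor(env, '/tmp/gym/monitor.log', allow_early_resets=True)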

Once this part of the issue is resolved, we can focus on actually evolving genomes, but this requires #25 to be solved first.

nazaruka commented 5 years ago

Half of the issue, namely the part dealing with proper episode resetting, is taken care of by adjusting the following call:

envs = make_vec_envs("SonicTheHedgehog-Genesis", seed=1, num_processes=1,
                     gamma=0.99, log_dir='/tmp/gym/', device=device,
                     allow_early_resets=False)

We merely set allow_early_resets to True, and we're off to the races. Now our main objective lies in getting genomes to evolve and PPO to learn; scaling rewards is on the table as an idea.
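Concretely, the adjusted call is just:

envs = make_vec_envs("SonicTheHedgehog-Genesis", seed=1, num_processes=1,
                     gamma=0.99, log_dir='/tmp/gym/', device=device,
                     allow_early_resets=True)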

schrum2 commented 5 years ago

The original NSGA-II code is still calling function1 and function2. We need to replace these and have a general method for evaluating the population that can be called in two places. Basically, we need to replace this code:

        solution2 = solution[:]
        # Generate offspring until the combined population is twice pop_size
        while len(solution2) != 2*pop_size:
            a1 = random.randint(0, pop_size-1)
            b1 = random.randint(0, pop_size-1)
            solution2.append(crossover(solution[a1], solution[b1]))
        # Evaluate both objectives over parents and offspring
        function1_values2 = [function1(solution2[i]) for i in range(0, 2*pop_size)]
        function2_values2 = [function2(solution2[i]) for i in range(0, 2*pop_size)]
        non_dominated_sorted_solution2 = fast_non_dominated_sort(function1_values2[:], function2_values2[:])

And maybe some of the code around it.
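As a sketch of the direction (evaluate_population, build_net, and novelty_scores are hypothetical names; only evaluate(envs, net, actor_critic) appears in the current code), the shared evaluation method might look like:

def evaluate_population(solution):
    """Evaluate every genome once, returning the two parallel score lists
    NSGA-II expects: task fitness and behavioral novelty per individual."""
    fitness_values = []
    behavior_chars = []
    for genome in solution:
        net = build_net(genome)                      # hypothetical: genome -> network
        fitness, behavior = evaluate(envs, net, actor_critic)
        fitness_values.append(fitness)
        behavior_chars.append(behavior)
    novelty_values = novelty_scores(behavior_chars)  # hypothetical k-NN novelty
    return fitness_values, novelty_values

The block above would then shrink to a single call on solution2, and the same function would replace the function1/function2 calls on the parent population.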

schrum2 commented 5 years ago

I made a lot of changes to the code so that it runs through several generations and logs and reports scores. Granted, the agents that are created are still not built from genomes as required in issue #25. However, everything else is in place, so I'm going to close this issue.