nazaruka / gym-http-api

NSGA2-based Sonic agent + experimental code
MIT License

Run one Sonic episode and return behavior characterization and return #23

Closed schrum2 closed 5 years ago

schrum2 commented 5 years ago

To help us complete #16 you need to complete this sub-issue. Rather than change the existing PPO code, it might make sense to copy it and modify the copy instead.

We need code that will use PPO and learn with it until Sonic dies OR some time limit is reached (we may add other restrictions as well). At that stopping point the method should return two things:

1) The overall return (sum of rewards), i.e. some form of objective fitness score, AND
2) The behavior characterization for Novelty Search (this is the more important one): an ordered list of all the (x,y) coordinates Sonic was located at over the course of the evaluation.
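A minimal sketch of that contract might look like the following. The names `env`, `policy`, `max_steps`, and the `info` keys `'x'`/`'y'` are assumptions for illustration, not the repo's actual API:

```python
def evaluate(env, policy, max_steps=4500):
    """Run one episode; return (fitness, behavior_char).

    fitness        -- sum of rewards (objective score)
    behavior_char  -- ordered list of (x, y) positions for Novelty Search
    """
    obs = env.reset()
    fitness = 0.0
    behavior_char = []
    for _ in range(max_steps):          # time-limit cutoff
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        fitness += reward
        behavior_char.append((info.get('x'), info.get('y')))
        if done:                        # Sonic died or the level ended
            break
    return fitness, behavior_char
```

The key point is that the position is appended every step, so the behavior characterization preserves the order in which Sonic visited each location.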

Important thing to check:

nazaruka commented 5 years ago

We now have a proper PyTorch implementation running alongside NSGA-II, but the values and some other characteristics of its execution are still a bit wonky.

schrum2 commented 5 years ago

Prioritize finding the cause of the pausing, but also start to gradually remove any unnecessary code. In particular, any leftover code associated with the NEAT networks and population should be gradually removed. Do this one small step at a time, with frequent commits along the way.

nazaruka commented 5 years ago

Found the culprit. Line 225:

rollouts.compute_returns(next_value, use_gae=False, gamma=0.99, gae_lambda=None, use_proper_time_limits=True)

The code pauses about every eight seconds because that is how long it takes to complete 128 steps, the value we set `num_steps` to in the second loop. I set these values practically on a whim; I'm not yet confident what would be optimal for PPO. compute_returns is a method in storage.py that branches into one of two loops in the following manner:

if use_proper_time_limits:
   if use_gae: loop 1 (`gae` modified for `bad_masks`)
   else: loop 2 (`gae` modified for `bad_masks`)
else:
   if use_gae: loop 1
   else: loop 2
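Loop 2 (the `use_gae=False` branch we're hitting) can be sketched in plain Python as below. The real storage.py operates on torch tensors and additionally consults `bad_masks` in the `use_proper_time_limits` case, so this is a simplified assumption about the recursion, not the verbatim baselines code:

```python
def compute_returns_no_gae(rewards, masks, next_value, gamma=0.99):
    """Backward recursion: R[t] = r[t] + gamma * mask[t] * R[t+1].

    masks[t] is 0 where the episode ended at step t, cutting the
    discounted return off at episode boundaries.
    """
    returns = [0.0] * (len(rewards) + 1)
    returns[-1] = next_value            # bootstrap from the critic's value
    for step in reversed(range(len(rewards))):
        returns[step] = returns[step + 1] * gamma * masks[step] + rewards[step]
    return returns[:-1]
```

Since the whole buffer of `num_steps` rewards is needed before this backward pass can run, the computation can only happen once per 128-step rollout.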

Running and rendering the original code also has it pause every 128 steps, which makes me inclined to think that it's not meant for rendering. Still, how can we optimize this?
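My understanding of why the pause shows up when rendering, sketched as a toy loop (names and structure are illustrative assumptions, not the actual training code): PPO alternates between collecting `num_steps` transitions and running a comparatively slow gradient update, and the environment sits idle during the update.

```python
NUM_STEPS = 128  # matches the num_steps we set in the second loop

def training_loop(env, policy, update, num_updates=2):
    """Alternate rollout collection with updates; env is idle during update."""
    obs = env.reset()
    for _ in range(num_updates):
        rollout = []
        for _ in range(NUM_STEPS):      # rendering would happen step by step here
            obs, reward, done, info = env.step(policy(obs))
            rollout.append(reward)
        update(rollout)                 # no env.step() here -> visible pause
```

If that is right, the pause is inherent to the collect-then-update cadence rather than a bug, and shrinking it would mean a smaller `num_steps` or a cheaper update, both of which trade against PPO sample efficiency.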

nazaruka commented 5 years ago

So I've cleaned up a lot of the NEAT-related code but kept the loop intact - I'll address that soon after lunch. Running without rendering, I get this error:

Traceback (most recent call last):
  File "NSGAII.py", line 298, in <module>
    fitness, behavior_char = evaluate(envs,net,actor_critic)
  File "NSGAII.py", line 145, in evaluate
    ob = envs.reset()
  File "C:\Users\Admin\Desktop\Southwestern\SCOPE\Files\Repo\gym-http-api\NSGA2\helpers\envs.py", line 257, in reset
    obs = self.venv.reset()
  File "C:\Users\Admin\Desktop\Southwestern\SCOPE\Files\Repo\gym-http-api\NSGA2\helpers\envs.py", line 183, in reset
    obs = self.venv.reset()
  File "c:\users\admin\desktop\southwestern\scope\files\repo\baselines\baselines\common\vec_env\dummy_vec_env.py", line 60, in reset
    obs = self.envs[e].reset()
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\gym\core.py", line 308, in reset
    observation = self.env.reset(**kwargs)
  File "c:\users\admin\desktop\southwestern\scope\files\repo\baselines\baselines\bench\monitor.py", line 36, in reset
    self.reset_state()
  File "c:\users\admin\desktop\southwestern\scope\files\repo\baselines\baselines\bench\monitor.py", line 46, in reset_state
    raise RuntimeError("Tried to reset an environment before done. If you want to allow early resets, wrap your env with Monitor(env, path, allow_early_resets=True)")
RuntimeError: Tried to reset an environment before done. If you want to allow early resets, wrap your env with Monitor(env, path, allow_early_resets=True)

I'm going to print within the main loop to see if we're calling another evaluate by any chance; if not, I'll probably have to wrap the environment.
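For reference, here is a toy reproduction of the guard that raises this error; the real check lives in baselines/bench/monitor.py, and `allow_early_resets=True` simply disables it, which is why the error message suggests that wrapper argument. This is a simplified sketch, not the actual Monitor class:

```python
class ResetGuard:
    """Mimics Monitor's early-reset check: reset() is only legal after done."""

    def __init__(self, env, allow_early_resets=False):
        self.env = env
        self.allow_early_resets = allow_early_resets
        self.needs_reset = True         # a fresh env must be reset once

    def reset(self, **kwargs):
        if not (self.allow_early_resets or self.needs_reset):
            raise RuntimeError(
                "Tried to reset an environment before done. If you want to "
                "allow early resets, wrap your env with "
                "Monitor(env, path, allow_early_resets=True)")
        self.needs_reset = False
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if done:
            self.needs_reset = True     # the next reset() is legal again
        return obs, reward, done, info
```

So if a second evaluate call (or any stray `envs.reset()`) fires mid-episode, this guard is exactly what trips.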

schrum2 commented 5 years ago

Wrapping the environment seems wrong. I think that `envs.reset()` should only be called at the very start of `evaluate` ... anywhere else would be wrong, and this seemed to be working before.

nazaruka commented 5 years ago

Got behavior characterization and cumulative reward to work, but the episode still seems to crash after that. Next step is to ensure that the code can run several episodes in succession.

schrum2 commented 5 years ago

I'm going to declare this issue solved, but shift some of the unresolved issues to #24