Closed schrum2 closed 5 years ago
We now have a proper PyTorch implementation running alongside NSGA-II, but the values and some other characteristics of its execution are still a bit wonky.
Prioritize finding the cause of the pausing, but also start to gradually remove any unnecessary code. In particular, any leftover code associated with the NEAT networks and population should be gradually removed. Do this one small step at a time, with frequent commits along the way.
Found the culprit. Line 225:
rollouts.compute_returns(next_value, use_gae=False, gamma=0.99, gae_lambda=None, use_proper_time_limits=True)
The code pauses about every eight seconds because that is the amount of time it takes to complete 128 steps, which is what we set num_steps to in the second loop. Now, I set these values practically for the hell of it; I'm not confident just yet about what would be optimal for PPO. compute_returns is a method in storage.py that essentially runs one of the following loops:
if use_proper_time_limits:
    if use_gae: loop 1 (`gae` modified for `bad_masks`)
    else: loop 2 (`gae` modified for `bad_masks`)
else:
    if use_gae: loop 1
    else: loop 2
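To make the two loops concrete, here is a minimal pure-Python sketch of what "loop 1" (GAE) and "loop 2" (plain discounted returns) typically compute, without the `bad_masks` adjustment. It is not the code from storage.py: the function names, the list-based storage, and the convention that `masks[t]` is 0.0 when the episode ended at step `t` are all assumptions made for illustration. Both loops are a single backward pass over the stored steps, so they only run once a full batch of num_steps transitions has been collected.

```python
def discounted_returns(rewards, masks, next_value, gamma=0.99):
    """Sketch of "loop 2": bootstrap from next_value and walk backwards.

    masks[t] is assumed to be 0.0 if the episode ended at step t, else 1.0.
    """
    returns = [0.0] * (len(rewards) + 1)
    returns[-1] = next_value
    for step in reversed(range(len(rewards))):
        # A zero mask cuts the return off at an episode boundary.
        returns[step] = rewards[step] + gamma * masks[step] * returns[step + 1]
    return returns[:-1]


def gae_returns(rewards, values, masks, next_value, gamma=0.99, gae_lambda=0.95):
    """Sketch of "loop 1": generalized advantage estimation, then add the value back."""
    values = list(values) + [next_value]
    returns = [0.0] * len(rewards)
    gae = 0.0
    for step in reversed(range(len(rewards))):
        # One-step TD error for this transition.
        delta = rewards[step] + gamma * values[step + 1] * masks[step] - values[step]
        # Exponentially weighted sum of TD errors, reset at episode boundaries.
        gae = delta + gamma * gae_lambda * masks[step] * gae
        returns[step] = gae + values[step]
    return returns


if __name__ == "__main__":
    print(discounted_returns([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], 0.0, gamma=0.5))
```

With `gae_lambda=1.0` the GAE version reduces to the plain discounted return, which is a handy sanity check when experimenting with these settings.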
Running and rendering the original code also has it pause every 128 steps, which makes me inclined to think that it's not meant for rendering. Still, how can we optimize this?
So I've cleaned up a lot of code that has to do with NEAT but still kept the loop intact - will be addressing that soon after lunch. Running without rendering, I get this error:
Traceback (most recent call last):
File "NSGAII.py", line 298, in <module>
fitness, behavior_char = evaluate(envs,net,actor_critic)
File "NSGAII.py", line 145, in evaluate
ob = envs.reset()
File "C:\Users\Admin\Desktop\Southwestern\SCOPE\Files\Repo\gym-http-api\NSGA2\helpers\envs.py", line 257, in reset
obs = self.venv.reset()
File "C:\Users\Admin\Desktop\Southwestern\SCOPE\Files\Repo\gym-http-api\NSGA2\helpers\envs.py", line 183, in reset
obs = self.venv.reset()
File "c:\users\admin\desktop\southwestern\scope\files\repo\baselines\baselines\common\vec_env\dummy_vec_env.py", line 60, in reset
obs = self.envs[e].reset()
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\gym\core.py", line 308, in reset
observation = self.env.reset(**kwargs)
File "c:\users\admin\desktop\southwestern\scope\files\repo\baselines\baselines\bench\monitor.py", line 36, in reset
self.reset_state()
File "c:\users\admin\desktop\southwestern\scope\files\repo\baselines\baselines\bench\monitor.py", line 46, in reset_state
raise RuntimeError("Tried to reset an environment before done. If you want to allow early resets, wrap your env with Monitor(env, path, allow_early_resets=True)")
RuntimeError: Tried to reset an environment before done. If you want to allow early resets, wrap your env with Monitor(env, path, allow_early_resets=True)
I'm going to print within the main loop to see if we're calling another evaluate by any chance; if not, I'll probably have to wrap the environment.
Wrapping the environment seems wrong. I think that envs.reset() should only be called at the very start of evaluate ... anywhere else would be wrong, and this seemed to be working before.
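As an illustration of one way this error can arise even when reset is only called at the start of evaluate, here is a small sketch. `GuardedEnv` is a hypothetical stand-in that mimics the "no reset before done" check from the traceback; it is not the baselines Monitor class, and the `evaluate` signature here is invented for the example. If an evaluation stops on a step limit before the episode reports done, the next call to evaluate resets a still-running episode and triggers the error.

```python
class GuardedEnv:
    """Hypothetical stand-in for a Monitor-style wrapper that refuses early resets."""

    def __init__(self, episode_length=5, allow_early_resets=False):
        self.episode_length = episode_length
        self.allow_early_resets = allow_early_resets
        self.needs_reset = True  # a fresh environment may always be reset
        self.t = 0

    def reset(self):
        if not self.allow_early_resets and not self.needs_reset:
            raise RuntimeError("Tried to reset an environment before done.")
        self.needs_reset = False
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        if done:
            self.needs_reset = True  # the episode is over, so reset is allowed again
        return self.t, 1.0, done, {}


def evaluate(env, max_steps=100):
    """Reset exactly once, at the start, then run until done or the step limit."""
    env.reset()
    total = 0.0
    for _ in range(max_steps):
        _, reward, done, _ = env.step(0)
        total += reward
        if done:
            break
    return total


if __name__ == "__main__":
    env = GuardedEnv()
    print(evaluate(env))               # runs to the end of the episode
    print(evaluate(env, max_steps=2))  # stops early, episode still in progress
    try:
        evaluate(env)                  # the reset now happens mid-episode
    except RuntimeError as err:
        print("error:", err)
```

Under these assumptions, the choice is between making sure every evaluation runs until done and allowing early resets for evaluations that can stop on a time limit.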
Got behavior characterization and cumulative reward to work, but the episode seems to crash with that once more. Next step is to ensure that the code will run in succession with several episodes.
I'm going to declare this issue solved, but shift some of the unresolved issues to #24.
To help us complete #16 you need to complete this sub-issue. Rather than change the existing PPO code, it might make sense to copy it and modify the copy instead.
We need code that will use PPO and learn with it up until Sonic dies OR some time limit is reached (we may add other restrictions as well), but at that stopping point the method will return two things: 1) The overall return (sum of rewards) or basically some form of objective fitness score, AND 2) The behavior characterization for Novelty Search (this is the more important one). This should be an ordered list of all the (x,y) coordinates Sonic was located at through the course of evaluation.
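The shape of such an evaluation method could look roughly like the sketch below. It leaves out the PPO learning step entirely and uses a hypothetical `StubSonicEnv` in place of the real environment; the assumption that the position is available as `info["x"]` and `info["y"]` is made for illustration only and would need checking against the actual environment.

```python
class StubSonicEnv:
    """Hypothetical environment: moves right one unit per step, 'dies' after `lifetime` steps."""

    def __init__(self, lifetime=10):
        self.lifetime = lifetime
        self.t = 0

    def reset(self):
        self.t = 0
        return (0, 0)

    def step(self, action):
        self.t += 1
        info = {"x": self.t, "y": 0}  # assumed location of the coordinates
        done = self.t >= self.lifetime
        return (self.t, 0), 1.0, done, info


def evaluate(env, policy, max_steps=1000):
    """Run one evaluation until death or the time limit.

    Returns (fitness, behavior): the sum of rewards and the ordered
    list of (x, y) positions visited during the evaluation.
    """
    ob = env.reset()
    fitness = 0.0
    behavior = []
    for _ in range(max_steps):
        ob, reward, done, info = env.step(policy(ob))
        fitness += reward
        behavior.append((info["x"], info["y"]))
        if done:
            break
    return fitness, behavior


if __name__ == "__main__":
    fitness, behavior = evaluate(StubSonicEnv(lifetime=3), lambda ob: 0)
    print(fitness, behavior)
```

Keeping the behavior characterization as an ordered list preserves the trajectory itself rather than only the final position, which matches the description above.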
Important thing to check: