Can you figure out how, for the same genome, the monitored version behaves differently? Does it return done=True earlier, different observations, or different rewards?
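A rough sketch of one way to run that check, using the Gym API from the time of this thread; the `policy` function, the `rollout` helper, and the '/tmp/cartpole-compare' directory are hypothetical stand-ins, not code from this issue:

```python
import gym
from gym import wrappers


def policy(obs):
    # Hypothetical stand-in for the genome's policy: push toward the pole's lean.
    return int(obs[2] > 0)


def rollout(env, max_steps=1000, seed=0):
    """Run one seeded episode and record (reward, done) at every step."""
    env.seed(seed)
    obs = env.reset()
    trace = []
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(policy(obs))
        trace.append((reward, done))
        if done:
            break
    return trace


plain = rollout(gym.make('CartPole-v0'))
monitored_env = wrappers.Monitor(gym.make('CartPole-v0'),
                                 '/tmp/cartpole-compare', force=True)
monitored = rollout(monitored_env)
monitored_env.close()
print(len(plain), len(monitored))  # does done=True arrive at a different step?
```

Seeding both environments the same way and stepping them with the same policy makes it easy to see whether done=True shows up at a different step under the Monitor.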
I think I figured it out: I was setting the timestep limit to 1000 earlier, which is why it behaved differently. Once I changed the timestep limit to 200, it took a lot longer to reach the desired average fitness of 195.
Any ideas on how to make the GA faster with the smaller timestep limit? It's taking a very long time.
GAs parallelize very well. You can run each member of the population on a separate core for a 14x speedup. Python's multiprocessing module can do this on a single machine.
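A minimal sketch of that suggestion with the standard-library multiprocessing module, again using the Gym API from the time of this thread; the genome encoding, `evaluate_genome`, and the random population are hypothetical placeholders, not the code from this issue:

```python
import multiprocessing as mp
import random

import gym


def evaluate_genome(genome, max_steps=200):
    """Fitness = total reward of one CartPole episode, capped at max_steps."""
    env = gym.make('CartPole-v0')  # each worker process builds its own env
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        # Hypothetical linear policy: push right if the weighted sum is positive.
        action = int(sum(w * x for w, x in zip(genome, obs)) > 0)
        obs, reward, done, _ = env.step(action)
        total += reward
        if done:
            break
    env.close()
    return total


if __name__ == '__main__':
    # Hypothetical population: 50 random 4-weight genomes, one weight per observation.
    population = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(50)]
    with mp.Pool() as pool:  # one worker per CPU core by default
        fitnesses = pool.map(evaluate_genome, population)
    print(max(fitnesses))
```

Each worker creates its own environment, since Gym environments can't be shared across processes; Pool defaults to one worker per core, so the speedup scales with the machine's core count.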
Thanks for the tip. I actually figured out why it was taking so long: I was implementing the selection criterion incorrectly. Once I fixed it, my code went from running for 3 hours and reaching an average fitness of only 70 to hitting the desired average fitness of 195.0 within 30-60 seconds. You can view it here: https://gym.openai.com/evaluations/eval_vNeUzkELQWyqq1zBaL9KAg. I'll try to upload the code as a gist in the next couple of hours.
This is my current code with monitoring; however, the while loop never terminates, and avg_reward stays stuck between 60 and 70. However, if I comment out the lines
env = wrappers.Monitor(env, '/tmp/cartpole-experiment-1', force=True)
gym.upload('/tmp/cartpole-experiment-1', api_key='sk_8lZfIkSRLSgKfmIAUomg')
which basically remove the monitor, the while loop terminates in about 4500-5000 episodes. With the monitor, it will not terminate in that many episodes. What exactly is the issue?
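Consistent with the timestep-limit explanation above, the monitored run appears to cap each CartPole-v0 episode at the environment's 200-step limit, while the unmonitored loop was being allowed to run longer, so the same genome can score much higher without the Monitor. One way to keep the two setups comparable is to cap each evaluation episode at 200 steps in the loop itself. A generic sketch, not the code from this issue; `policy` and `episode_reward` are hypothetical placeholders:

```python
import gym
from gym import wrappers

EPISODE_CAP = 200  # CartPole-v0's per-episode step limit


def policy(obs):
    # Hypothetical placeholder for the GA's current best genome.
    return int(obs[2] > 0)


def episode_reward(env):
    """Total reward of one episode, counting at most EPISODE_CAP steps."""
    obs = env.reset()
    total = 0.0
    for _ in range(EPISODE_CAP):
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        if done:
            break
    return total


env = wrappers.Monitor(gym.make('CartPole-v0'),
                       '/tmp/cartpole-experiment-1', force=True)
print(episode_reward(env))
env.close()
```

With the cap applied in the evaluation loop, the fitness a genome can earn is the same whether or not the Monitor is attached, so the while loop's stopping condition means the same thing in both runs.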