Closed eugenevinitsky closed 6 years ago
Cartpole does work under similar configs, so maybe it's just a tuning issue?
I can confirm Pendulum doesn't seem to train. This is actually true for A3C, ES, and plain PG as well, which may indicate a common issue, perhaps in the action distribution or the default network-architecture parameters.
Other envs seem to work fine though (e.g. Humanoid, Cartpole, Pong). I don't think we've ever tested on Pendulum-v0 so this may have always been an issue.
@richardliaw found some hyperparams that worked: ./train.py --env=Pendulum-v0 --run=PPO --config='{"timesteps_per_batch": 2048, "lambda": 0.1, "gamma": 0.95, "sgd_stepsize": 0.0003, "sgd_batchsize": 64, "num_sgd_iter": 10, "model": {"fcnet_hiddens": [64, 64]}, "min_steps_per_task": 100}'
I suspect the discount in particular might make a big difference here. Learning curve:
Thanks @richardliaw! We were concerned because some of our more complicated single-agent experiments, which learned fine under other reinforcement learning libraries, did not seem to be learning here (albeit we were using TRPO for those), so when Pendulum wasn't working we worried there might be a more fundamental problem.
System information
Describe the problem
Pendulum doesn't appear to be learning: the following script shows no improvement after 100 iterations.
Source code / logs