The Lookahead + RAdam optimizer significantly improves the performance of some RL algorithms (A2C (n-step), PPO) on continuous-domain problems, but does not improve others (A2C (GAE), SAC).
Methods
- Implement the RAdam and Lookahead optimizers
- Update the optim_spec to replace Adam with the Lookahead(RAdam) optimizer
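As a rough illustration of the Lookahead idea named above, here is a minimal pure-Python sketch, not the implementation in this PR. It assumes the standard Lookahead rule: take k fast steps with an inner optimizer, then move the slow weights a fraction alpha toward the fast weights and reset the fast weights to them. The inner optimizer here is a plain SGD stand-in rather than RAdam, and the names `Lookahead`, `sgd_step`, and `minimize_quadratic` are hypothetical.

```python
from typing import Callable, List

Params = List[float]


def sgd_step(params: Params, grads: Params, lr: float = 0.1) -> None:
    """Stand-in inner optimizer (plain SGD); the PR wraps RAdam instead."""
    for i, grad in enumerate(grads):
        params[i] -= lr * grad


class Lookahead:
    """Wrap an inner update rule with slow/fast weight interpolation."""

    def __init__(
        self,
        params: Params,
        inner_step: Callable[[Params, Params], None],
        k: int = 5,
        alpha: float = 0.5,
    ) -> None:
        self.params = params        # fast weights, updated in place
        self.slow = list(params)    # slow weights, a separate copy
        self.inner_step = inner_step
        self.k = k                  # number of fast steps per sync
        self.alpha = alpha          # interpolation factor for slow weights
        self.step_count = 0

    def step(self, grads: Params) -> None:
        # One fast step with the inner optimizer.
        self.inner_step(self.params, grads)
        self.step_count += 1
        # Every k steps: slow <- slow + alpha * (fast - slow), then fast <- slow.
        if self.step_count % self.k == 0:
            for i, fast in enumerate(self.params):
                self.slow[i] += self.alpha * (fast - self.slow[i])
                self.params[i] = self.slow[i]


def minimize_quadratic(steps: int = 50) -> float:
    """Minimize f(x) = x^2 starting from x = 1.0 and return the final x."""
    params = [1.0]
    optimizer = Lookahead(params, sgd_step, k=5, alpha=0.5)
    for _ in range(steps):
        grads = [2.0 * params[0]]  # gradient of x^2
        optimizer.step(grads)
    return params[0]
```

Because the wrapper only needs an update function, swapping the SGD stand-in for an RAdam-style update would leave the synchronization logic unchanged.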
We ran benchmarks to directly compare the performance of Adam and Lookahead(RAdam) using the same code and spec files, changing only the optimizer (see the git diff of this PR). Due to limited computational resources, we focus the study on continuous environments from Roboschool.
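To make "changing only the optimizer" concrete, the sketch below swaps the optimizer entry of a spec while leaving every other setting untouched. The spec layout is an assumption for illustration: the keys `name`, `optimizer`, and `lr`, the learning-rate value, and the helper `use_lookahead_radam` are hypothetical and may not match the actual spec files in this PR.

```python
import copy


def use_lookahead_radam(net_spec: dict) -> dict:
    """Return a copy of net_spec whose optim_spec selects Lookahead(RAdam)."""
    new_spec = copy.deepcopy(net_spec)
    optim_spec = new_spec["optim_spec"]
    # Assumed layout: "name" selects the outer optimizer and
    # "optimizer" names the inner optimizer that Lookahead wraps.
    optim_spec["name"] = "Lookahead"
    optim_spec["optimizer"] = "RAdam"
    return new_spec


# Hypothetical baseline spec using Adam; all other keys are left as they are.
adam_spec = {"optim_spec": {"name": "Adam", "lr": 3e-4}}
lookahead_spec = use_lookahead_radam(adam_spec)
```

Copying the spec before editing keeps the Adam baseline available, so both runs can be launched from otherwise identical settings.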
We find that:
- A2C (n-step) and PPO improve significantly overall on both the standard Roboschool and the harder Humanoid environments.
- A2C (n-step), which previously failed completely on the harder Humanoid environments, is now able to learn.
- A2C (GAE) results are mixed, with some improvements and some degradations.
- SAC does not improve in any of the environments, so we exclude its Lookahead + RAdam results below. Instead, we provide rerun/old benchmark results using Adam for comparison and as a benchmark update.
The results are tabulated below. In sum, the new results are run with:
- A2C (GAE), A2C (n-step), PPO using the Lookahead + RAdam optimizer
- SAC using the Adam optimizer
Experiment Result
To Reproduce
Use commit e5988f04b2ca5935d253c0b16f01187e097e7a87 to run the spec files.
Results
All the results contributed will be added to the benchmark, and made publicly available on Dropbox.
New Roboschool benchmark
Legend:
[graphs]
Old Roboschool benchmark
[graphs]
New Humanoid benchmark
Humanoid environments are significantly harder. Note that due to the number of frames required, we could only run Async-SAC.
[graphs]
Old Humanoid benchmark
[graphs]