
Add Lookahead+RAdam optimizer #416

Closed: kengz closed this pull request 4 years ago

kengz commented 4 years ago

Experiment Results

Abstract

The Lookahead + RAdam optimizer significantly improves the performance of some RL algorithms (A2C (n-step), PPO) on continuous-domain problems, but does not improve others (A2C (GAE), SAC).

Methods

  1. Implement the RAdam and Lookahead optimizers (see the sketches below)
  2. Update the optim_spec to replace Adam with the Lookahead(RAdam) optimizer
  3. Run the benchmark
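
As a rough sketch of step 1, the Lookahead wrapper maintains a slow copy of the weights and, every k fast steps, pulls it toward the fast weights before syncing the fast weights back. The code below is a minimal illustration of that rule, not the exact implementation merged in this PR; the defaults alpha=0.5 and k=5 are the common choices from the Lookahead paper, not necessarily SLM-Lab's settings.

```python
import torch

class Lookahead:
    """Sketch of the Lookahead wrapper (Zhang et al. 2019): the inner
    ("fast") optimizer runs normally, and every k steps the slow weights
    are pulled toward the fast weights, which are then reset to them."""

    def __init__(self, base_optimizer, alpha=0.5, k=5):
        self.base = base_optimizer
        self.alpha = alpha  # interpolation factor for the slow weights
        self.k = k          # sync period, in fast steps
        self.counter = 0
        # one slow copy per parameter, initialized to the current weights
        self.slow_weights = [
            [p.detach().clone() for p in group['params']]
            for group in base_optimizer.param_groups
        ]

    def zero_grad(self):
        self.base.zero_grad()

    def step(self):
        loss = self.base.step()  # fast update by the inner optimizer
        self.counter += 1
        if self.counter % self.k == 0:
            for group, slows in zip(self.base.param_groups, self.slow_weights):
                for p, slow in zip(group['params'], slows):
                    # slow <- slow + alpha * (fast - slow); fast <- slow
                    slow.add_(p.data - slow, alpha=self.alpha)
                    p.data.copy_(slow)
        return loss
```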

Implementations inspired by/adapted from LiyuanLucasLiu/RAdam, lonePatient/lookahead_pytorch, and Less Wright's Medium article.
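
For reference, the core RAdam update can be sketched as follows: it applies the paper's variance-rectification term once the approximated simple-moving-average length rho_t exceeds 4, and falls back to a momentum-style update before that. This is a simplified sketch in the spirit of the referenced implementations, not the code in this PR; the hyperparameter defaults are illustrative.

```python
import math
import torch

class RAdam(torch.optim.Optimizer):
    """Sketch of Rectified Adam (Liu et al. 2019): rectifies the adaptive
    learning rate while its variance estimate is unreliable early on."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps))

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            beta1, beta2 = group['betas']
            for p in group['params']:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:  # lazy state initialization
                    state['step'] = 0
                    state['exp_avg'] = torch.zeros_like(p)
                    state['exp_avg_sq'] = torch.zeros_like(p)
                state['step'] += 1
                t = state['step']
                m, v = state['exp_avg'], state['exp_avg_sq']
                m.mul_(beta1).add_(p.grad, alpha=1 - beta1)
                v.mul_(beta2).addcmul_(p.grad, p.grad, value=1 - beta2)
                m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
                # length of the approximated simple moving average (SMA)
                rho_inf = 2 / (1 - beta2) - 1
                rho_t = rho_inf - 2 * t * beta2 ** t / (1 - beta2 ** t)
                if rho_t > 4:
                    # variance is tractable: rectified adaptive update
                    v_hat = (v / (1 - beta2 ** t)).sqrt_().add_(group['eps'])
                    r_t = math.sqrt((rho_t - 4) * (rho_t - 2) * rho_inf
                                    / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
                    p.add_(m_hat / v_hat, alpha=-group['lr'] * r_t)
                else:
                    # early steps: un-adapted momentum update
                    p.add_(m_hat, alpha=-group['lr'])
        return loss
```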

To Reproduce

Use commit e5988f04b2ca5935d253c0b16f01187e097e7a87 to run the spec files.

Results

All the contributed results will be added to the benchmark and made publicly available on Dropbox.

We run benchmarks to directly compare the performance of Adam and Lookahead(RAdam) using the same code and spec files, changing only the optimizers (see the git diff of this PR). Due to limited computational resources, we focus the study on continuous-control environments from Roboschool.
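
In code terms, the controlled change amounts to swapping the optimizer constructor. Using the sketch classes above (hyperparameter values illustrative only):

```python
import torch

net = torch.nn.Linear(8, 2)  # stand-in for an actor/critic network

# baseline runs: plain Adam
optimizer = torch.optim.Adam(net.parameters(), lr=3e-4)

# this PR's runs: RAdam wrapped in Lookahead (sketch classes above)
optimizer = Lookahead(RAdam(net.parameters(), lr=3e-4), alpha=0.5, k=5)
```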

We find that Lookahead(RAdam) significantly improves A2C (n-step) and PPO, but does not improve A2C (GAE) and SAC. The results are tabulated below. In sum, the new results are run with Lookahead(RAdam) for the algorithms it improves (A2C (n-step), PPO) and Adam otherwise.

New Roboschool benchmark

Legend: (legend image; the per-environment training graphs are omitted from the tables below)
| Env. \ Alg. | A2C (GAE) | A2C (n-step) | PPO | SAC |
|---|---:|---:|---:|---:|
| RoboschoolAnt | 787 | 1396 | 1843 | 2915 |
| RoboschoolAtlasForwardWalk | 59.87 | 88.04 | 172 | 800 |
| RoboschoolHalfCheetah | 712 | 439 | 1960 | 2497 |
| RoboschoolHopper | 710 | 285 | 2042 | 2045 |
| RoboschoolInvertedDoublePendulum | 996 | 4410 | 8076 | 8085 |
| RoboschoolInvertedPendulum | 995 | 978 | 986 | 941 |
| RoboschoolReacher | 12.9 | 10.16 | 19.51 | 19.99 |
| RoboschoolWalker2d | 280 | 220 | 1660 | 1894 |

Old Roboschool benchmark

| Env. \ Alg. | A2C (GAE) | A2C (n-step) | PPO | SAC |
|---|---:|---:|---:|---:|
| RoboschoolAnt | 1029.51 | 1148.76 | 1931.35 | 2914.75 |
| RoboschoolAtlasForwardWalk | 68.15 | 73.46 | 148.81 | 942.39 |
| RoboschoolHalfCheetah | 895.24 | 409.59 | 1838.69 | 2496.54 |
| RoboschoolHopper | 286.67 | -187.91 | 2079.22 | 2251.36 |
| RoboschoolInvertedDoublePendulum | 1769.74 | 486.76 | 7967.03 | 8085.04 |
| RoboschoolInvertedPendulum | 1000.0 | 997.54 | 930.29 | 941.45 |
| RoboschoolReacher | 14.57 | -6.18 | 19.18 | 19.99 |
| RoboschoolWalker2d | 413.26 | 141.83 | 1368.25 | 1894.05 |

New Humanoid benchmark

Humanoid environments are significantly harder. Note that due to the number of frames required, we could only run the asynchronous version of SAC (Async-SAC).

| Env. \ Alg. | A2C (GAE) | A2C (n-step) | PPO | Async-SAC |
|---|---:|---:|---:|---:|
| RoboschoolHumanoid | 99.31 | 54.58 | 2388 | 2621 |
| RoboschoolHumanoidFlagrun | 73.57 | 178 | 2014 | 2056 |
| RoboschoolHumanoidFlagrunHarder | -429 | 253 | 680 | 280 |

Old Humanoid benchmark

| Env. \ Alg. | A2C (GAE) | A2C (n-step) | PPO | Async-SAC |
|---|---:|---:|---:|---:|
| RoboschoolHumanoid | 122.23 | -6029.02 | 1554.03 | 2621.46 |
| RoboschoolHumanoidFlagrun | 93.48 | -2079.02 | 1635.64 | 1937.77 |
| RoboschoolHumanoidFlagrunHarder | -472.34 | -24620.71 | 610.09 | 280.18 |