kengz / SLM-Lab

Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".
https://slm-lab.gitbook.io/slm-lab/
MIT License

Asynchronous SAC #404

Closed kengz closed 5 years ago

kengz commented 5 years ago

Feature / Fix

SAC has a very low FPS (frames per second), which is expected since it trains 3 networks and syncs 2 networks at every step; the tradeoff is that it is sample-efficient. However, running SAC for millions of frames quickly becomes a problem, since it would take weeks if not months to finish. For example, running 100M frames at 10 FPS would take about 115 days.
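The runtime estimate above is a straightforward back-of-envelope calculation:

```python
# Back-of-envelope runtime estimate for serial SAC, using the numbers above.
frames = 100_000_000       # target environment frames
fps = 10                   # observed SAC frames-per-second
seconds_per_day = 86_400

days = frames / fps / seconds_per_day
print(f"{days:.1f} days")  # ~115.7 days
```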

Fortunately, in SLM Lab any algorithm can directly use Hogwild to parallelize asynchronously. There is of course a tradeoff between the number of workers (time savings) and performance. A more comprehensive study will be done later, but the benchmark below shows that asynchronous SAC works.
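The Hogwild idea — workers updating shared parameters lock-free, tolerating the occasional race — can be sketched with the standard library alone. This is a minimal illustration, not SLM Lab's implementation (which shares PyTorch model parameters across processes); the names `worker` and `run` and all numbers are hypothetical:

```python
# Minimal Hogwild-style sketch: several processes update shared, lock-free
# parameters concurrently. Illustrative only; real Hogwild training applies
# gradient updates to shared model weights instead of constant increments.
import multiprocessing as mp

def worker(shared_weights, steps):
    # Each worker reads and writes the shared parameters without locking;
    # Hogwild accepts the resulting benign races.
    for _ in range(steps):
        for i in range(len(shared_weights)):
            shared_weights[i] += 0.001  # stand-in for a gradient step

def run(num_workers=4, steps=100, dim=8):
    weights = mp.Array('d', [0.0] * dim, lock=False)  # lock-free shared memory
    procs = [mp.Process(target=worker, args=(weights, steps))
             for _ in range(num_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return list(weights)

if __name__ == "__main__":
    print(run())
```

Because the updates are unsynchronized, a few increments may be lost to races; Hogwild's premise is that this barely affects convergence while removing all locking overhead.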

The non-humanoid environments are included only as a baseline comparison with the non-async version of SAC. The humanoid environments are the ones that would have taken weeks to run with serial SAC; parallelized, they each took only a day.

The frames in the graphs are per worker, and the graphs are averaged across workers. To get the total frames, simply multiply the x-axis by the number of sessions (workers).
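Concretely, reading total frames off a graph is just a multiplication (the values below are made-up examples, not benchmark numbers):

```python
# Converting per-worker frames (the graph x-axis) to total environment frames.
per_worker_frames = 1_000_000  # example x-axis reading
num_sessions = 16              # example number of workers (sessions)

total_frames = per_worker_frames * num_sessions
print(total_frames)  # 16000000
```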

| Env. \ Alg. | A3C (GAE) | A3C (n-step) | Async PPO | Async SAC |
| --- | --- | --- | --- | --- |
| RoboschoolAnt | | | | 2525.08 |
| RoboschoolAtlasForwardWalk | | | | 1849.50 |
| RoboschoolHalfCheetah | | | | 2278.03 |
| RoboschoolHopper | | | | 2376.96 |
| RoboschoolInvertedDoublePendulum | | | | 8030.81 |
| RoboschoolInvertedPendulum | | | | 966.41 |
| RoboschoolInvertedPendulumSwingup | | | | 847.06 |
| RoboschoolReacher | | | | 19.73 |
| RoboschoolWalker2d | | | | 1386.15 |
| RoboschoolHumanoid | | | | 2458.23 |
| RoboschoolHumanoidFlagrun | | | | 2056.06 |
| RoboschoolHumanoidFlagrunHarder | | | | 267.36 |

(Per-environment result graphs omitted.)