Closed vlad17 closed 7 years ago
Find 2-3 candidate algorithms (with existing, good single-GPU implementations), weigh pros/cons, choose the best, and make a running Pong example.
Consider training speed too: ACKTR/CPO/DDPG might train faster in wall-clock time than DQN.
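For the wall-clock comparison, a minimal sketch of a throughput harness (hypothetical, not from this repo) that times one candidate's update step against another's could look like this; `fast_step`/`slow_step` are stand-in workloads, not real algorithm implementations:

```python
import time

def measure_throughput(step_fn, seconds=0.5):
    """Run step_fn in a loop and return steps completed per second."""
    n = 0
    start = time.perf_counter()
    while time.perf_counter() - start < seconds:
        step_fn()
        n += 1
    return n / (time.perf_counter() - start)

# Hypothetical stand-ins for one training update of each candidate;
# in practice these would wrap the real per-step update of each algorithm.
def fast_step():
    sum(range(100))      # cheap update (candidate A)

def slow_step():
    sum(range(100_000))  # expensive update (candidate B)

fast = measure_throughput(fast_step)
slow = measure_throughput(slow_step)
print(f"candidate A: {fast:.0f} steps/s, candidate B: {slow:.0f} steps/s")
```

Steps/second alone doesn't settle the question, since sample efficiency differs across these algorithms, but it gives a concrete number for the "faster real-time training" claim.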
PPO baseline here: https://github.com/mwhittaker/deeprl_project/pull/11