Commit all the specs for A2C, A3C, PPO, and DQN. All are working.
Benchmark results will be uploaded separately when done.
Fix PPO
Fix PPO with proper epoch and split_minibatch. The previous underperformance was due to insufficient training.
PPO FPS is now ~330, about 3 times slower than A2C. This is expected: PPO trains for 4x more epochs, and training consumes about 70% of total process run time.
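
A quick sanity check of the ~3x figure, as a rough sketch rather than a measurement. It assumes the 70% training share refers to the A2C baseline run, and that only the training portion scales with the number of epochs. The function name `expected_slowdown` is hypothetical and used only for illustration.

```python
def expected_slowdown(train_fraction: float, epoch_multiplier: float) -> float:
    """Relative run time when only the training share of the run is scaled.

    Assumption: the non-training share (env stepping, logging) stays constant.
    """
    non_training = 1.0 - train_fraction
    return non_training + train_fraction * epoch_multiplier


# 70% of run time spent training, 4x more epochs (figures from the notes above)
slowdown = expected_slowdown(train_fraction=0.7, epoch_multiplier=4.0)
print(f"expected slowdown: {slowdown:.1f}x")
```

Under these assumptions the estimate is 0.3 + 0.7 * 4 = 3.1, which is consistent with the observed ~3x drop in FPS.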
Commit benchmark specs
Fix PPO