nottombrown / rl-teacher

Code for Deep RL from Human Preferences [Christiano et al.], plus a webapp for collecting human feedback
MIT License

Add PPO #4

Closed by nottombrown 7 years ago

nottombrown commented 7 years ago

I expect PPO to train significantly faster than TRPO. Reference implementation: https://github.com/openai/baselines/tree/master/baselines/pposgd

nottombrown commented 7 years ago

Working on this now.

nottombrown commented 7 years ago

Added in #11