nottombrown / rl-teacher

Code for Deep RL from Human Preferences [Christiano et al.], plus a webapp for collecting human feedback.
MIT License
559 stars, 95 forks

Collect snapshots of agent parameters to allow sharing of trained agents #23

Closed by nottombrown 7 years ago

nottombrown commented 7 years ago

It seems like this should be easy to get working with PPO.
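To make the request concrete, here is a rough sketch of what periodic parameter snapshots could look like. It is not taken from rl-teacher or from any PPO implementation: `save_snapshot`, `load_latest_snapshot`, the file naming scheme, and the assumption that the agent's parameters can be pulled out as a picklable object (for example a dict of weight lists or arrays) are all hypothetical.

```python
import os
import pickle
import tempfile


def save_snapshot(params, snapshot_dir, step):
    """Write params to snapshot_dir/snapshot_<step>.pkl and return the path.

    `params` is assumed to be any picklable object holding the agent's weights.
    """
    os.makedirs(snapshot_dir, exist_ok=True)
    # Zero-pad the step so file names sort in training order (steps < 10**8).
    path = os.path.join(snapshot_dir, "snapshot_%08d.pkl" % step)
    # Write to a temporary file first, then rename it into place, so an
    # interrupted run does not leave a half-written snapshot behind.
    fd, tmp_path = tempfile.mkstemp(dir=snapshot_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        pickle.dump({"step": step, "params": params}, f)
    os.replace(tmp_path, path)
    return path


def load_latest_snapshot(snapshot_dir):
    """Return the most recent snapshot dict, or None if there are none."""
    names = sorted(
        name
        for name in os.listdir(snapshot_dir)
        if name.startswith("snapshot_") and name.endswith(".pkl")
    )
    if not names:
        return None
    with open(os.path.join(snapshot_dir, names[-1]), "rb") as f:
        return pickle.load(f)
```

A training loop could call `save_snapshot(params, "snapshots", step)` every N steps, and someone receiving the directory could restore the agent with `load_latest_snapshot("snapshots")`. One caveat for the sharing use case: unpickling can execute arbitrary code, so snapshots from untrusted sources would need a safer serialization format.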