nottombrown/rl-teacher
Code for Deep RL from Human Preferences [Christiano et al.], plus a webapp for collecting human feedback.
MIT License
559 stars, 93 forks
Add PPO #11
Closed
nottombrown closed 7 years ago
nottombrown commented 7 years ago
This does a few things:
- Adds PPOSGD
- Renames `path['action']` to `path['actions']`
- Fixes a bug with short environment prefixes not being respected
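For anyone with code or saved rollouts that still use the old key, the rename could be handled with something like the sketch below. This is not code from the PR: `rename_action_key` is a hypothetical helper, and the `obs` and `rewards` keys in the sample dict are assumptions made only for illustration.

```python
def rename_action_key(path):
    """Return a copy of `path` with the old 'action' key renamed to 'actions'.

    Hypothetical helper, not part of rl-teacher. The input dict is left untouched.
    """
    migrated = dict(path)  # shallow copy so the caller's dict is not mutated
    if "action" in migrated and "actions" not in migrated:
        migrated["actions"] = migrated.pop("action")
    return migrated


# Sample path dict; every key other than 'action' is an assumed placeholder.
old_path = {"obs": [[0.0], [0.1]], "action": [1, 0], "rewards": [0.5, 1.0]}
new_path = rename_action_key(old_path)
```

Returning a copy rather than mutating in place keeps the old dict usable by any caller that has not been updated yet.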