nottombrown / rl-teacher

Code for Deep RL from Human Preferences [Christiano et al.], plus a webapp for collecting human feedback
MIT License

Return a deepcopy of our split episodes #22

Closed by nottombrown 6 years ago

nottombrown commented 6 years ago

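Previously we were training on slices of our episode data, and those slices were changing beneath us as the underlying data was modified. This PR returns a deepcopy of each split episode instead.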

See the change in loss here: [image]
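
As a rough illustration of the aliasing problem described above, here is a minimal sketch. It is not the repository's actual code: the helper names `split_by_slicing` and `split_with_deepcopy`, the `episode` dict layout, and the use of nested lists are all assumptions made for the example. The same effect occurs with NumPy arrays, where a basic slice is a view onto the original buffer.

```python
import copy


def split_by_slicing(episode, start, end):
    """Hypothetical helper: slice every field of an episode dict.

    Each slice is a new list, but it still holds references to the same
    per-step objects as the original episode.
    """
    return {key: values[start:end] for key, values in episode.items()}


def split_with_deepcopy(episode, start, end):
    """Hypothetical helper: same split, but fully detached from the episode."""
    return copy.deepcopy(split_by_slicing(episode, start, end))


# Toy episode with mutable per-step entries (an assumed layout).
episode = {
    "obs": [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
    "rewards": [[0.0], [0.5], [1.0], [1.5]],
}

aliased_segment = split_by_slicing(episode, 0, 2)
copied_segment = split_with_deepcopy(episode, 0, 2)

# Simulate the episode being modified in place after the split.
for step_obs in episode["obs"]:
    step_obs[0] = -1.0

print(aliased_segment["obs"])  # [[-1.0, 0.0], [-1.0, 1.0]]
print(copied_segment["obs"])   # [[0.0, 0.0], [1.0, 1.0]]
```

The aliased segment silently picks up the later mutation, so anything trained on it sees data that differs from what was originally recorded. The deep-copied segment keeps the values it had at split time, at the cost of extra memory per segment.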