oguzserbetci / rl-teacher-atari

Code for Deep RL from Human Preferences [Christiano et al.], plus a webapp for efficiently collecting human feedback.
MIT License

Sample multiple trajectories and select two to elicit human feedback #4

Open oguzserbetci opened 5 years ago