Code for Deep RL from Human Preferences [Christiano et al.], plus a webapp for efficiently collecting human feedback.
Sample multiple trajectories and select two to elicit human feedback #4
Open
oguzserbetci opened 5 years ago
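
The issue title leaves the selection rule unspecified. As one possible reading, the sketch below samples several trajectories and then picks the pair on which a small ensemble of reward models disagrees most, so that the human comparison is spent where the models are least certain. Everything here is an assumption made for illustration and not this repository's code: the names `sample_trajectories`, `select_pair`, and `ensemble_disagreement`, the toy rollout, and the toy reward models are all hypothetical.

```python
import random
from itertools import combinations
from statistics import pvariance


def sample_trajectories(rollout_fn, n, rng):
    """Collect n trajectories by calling a (hypothetical) rollout function."""
    return [rollout_fn(rng) for _ in range(n)]


def ensemble_disagreement(reward_models):
    """Build a pair-scoring function from an ensemble of reward models.

    Each model votes on whether trajectory a has a higher summed reward
    than trajectory b; the score is the variance of those votes.
    """
    def score(a, b):
        votes = [
            1.0 if sum(m(s) for s in a) > sum(m(s) for s in b) else 0.0
            for m in reward_models
        ]
        return pvariance(votes)
    return score


def select_pair(trajectories, score_fn):
    """Return the pair of trajectories with the highest score.

    Ties are broken in favour of the first pair encountered.
    """
    if len(trajectories) < 2:
        raise ValueError("need at least two trajectories to form a pair")
    return max(combinations(trajectories, 2), key=lambda pair: score_fn(*pair))


# Toy usage: a trajectory is a list of scalar "states" (an assumption).
def toy_rollout(rng, length=5):
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]


toy_models = [lambda s: s, lambda s: s * s, lambda s: -s]
rng = random.Random(0)
candidates = sample_trajectories(toy_rollout, n=6, rng=rng)
left, right = select_pair(candidates, ensemble_disagreement(toy_models))
# left and right would then be shown to the human for a preference label.
```

Passing the scoring function as an argument keeps the sampling step independent of the selection rule, so a random-pair baseline or another criterion can be swapped in without touching `select_pair`.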