oguzserbetci/rl-teacher-atari
Code for Deep RL from Human Preferences [Christiano et al.], plus a web app for efficiently collecting human feedback.
MIT License
0 stars, 0 forks
issues
#5 Implement a Bayesian Network (oguzserbetci opened 6 years ago, comments: 1)
#4 Sample multiple trajectories and select two to elicit human feedback (oguzserbetci opened 6 years ago, comments: 0)
#3 Figure out how the reward network is trained. (oguzserbetci opened 6 years ago, comments: 0)
#2 Reproduce results from the paper (oguzserbetci opened 6 years ago, comments: 2)
#1 Switch out the reward function with a Bayesian network (oguzserbetci opened 6 years ago, comments: 1)