oguzserbetci / rl-teacher-atari

Code for Deep RL from Human Preferences [Christiano et al]. Plus a webapp for efficiently collecting human feedback.
MIT License

Switch out the reward function with a bayesian network #1

Open oguzserbetci opened 5 years ago

oguzserbetci commented 5 years ago

I found the place where the reward function is defined and marked it with a TODO in the code: rl_teacher/reward_models.py:129

The current reward function is a simple regression MLP that maps (state, action) -> reward.
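
To make the shape of that mapping concrete, here is a minimal sketch of a one-hidden-layer regression MLP that takes a (state, action) pair and returns a scalar reward. It is not the code at rl_teacher/reward_models.py:129. The names `make_reward_mlp` and `predict_reward`, the hidden size, and the random initialisation are all assumptions made for illustration, and the network is untrained.

```python
import random


def make_reward_mlp(state_dim, action_dim, hidden=8, seed=0):
    """Build a hypothetical, untrained MLP mapping (state, action) -> reward."""
    rng = random.Random(seed)
    in_dim = state_dim + action_dim

    # Randomly initialised weights; a real model would learn these from
    # human preference comparisons.
    w1 = [[rng.gauss(0.0, in_dim ** -0.5) for _ in range(hidden)]
          for _ in range(in_dim)]
    b1 = [0.0] * hidden
    w2 = [rng.gauss(0.0, hidden ** -0.5) for _ in range(hidden)]
    b2 = 0.0

    def predict_reward(state, action):
        # Concatenate state and action into a single input vector.
        x = list(state) + list(action)
        # Hidden layer with ReLU activation.
        hidden_act = [
            max(0.0, sum(x[i] * w1[i][j] for i in range(in_dim)) + b1[j])
            for j in range(hidden)
        ]
        # Linear output layer: one scalar reward (plain regression).
        return sum(h * w for h, w in zip(hidden_act, w2)) + b2

    return predict_reward


predict_reward = make_reward_mlp(state_dim=4, action_dim=2)
reward = predict_reward([0.1, -0.2, 0.3, 0.0], [1.0, 0.0])
```

A Bayesian replacement would keep the same (state, action) input but return a distribution over rewards, or a mean plus an uncertainty estimate, instead of a single point estimate.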