oguzserbetci / rl-teacher-atari

Code for Deep RL from Human Preferences [Christiano et al.], plus a webapp for efficiently collecting human feedback.
MIT License
Reproduce results from the paper #2

Open · oguzserbetci opened this issue 5 years ago

oguzserbetci commented 5 years ago

We should be able to attain:

oguzserbetci commented 5 years ago

The code doesn't reproduce the paper.

oguzserbetci commented 5 years ago

Initial results:

[3 images]