Reproduce results from the paper

oguzserbetci / rl-teacher-atari

Code for Deep RL from Human Preferences [Christiano et al.], plus a webapp for efficiently collecting human feedback.

MIT License

Open oguzserbetci opened 6 years ago

oguzserbetci commented 6 years ago

We should be able to attain:

oguzserbetci commented 6 years ago

The code doesn't reproduce the paper's results.

oguzserbetci commented 6 years ago

Initial results: