`rl-teacher-atari` is an extension of `rl-teacher`, which is in turn an implementation of Deep Reinforcement Learning from Human Preferences [Christiano et al., 2017]. As-is, `rl-teacher` only handles MuJoCo environments. This repository is meant to extend that functionality to Atari environments and other complex Gym environments. Additionally, this repository extends and augments the code in the following ways:
* A `GA3C` agent to optimize Atari and other complex environments
* `parallel_trpo` extended to (theoretically) be able to handle environments with discrete action spaces
* Saving and loading of experiment data (`GA3C` + database)
* `human-feedback-api` made much more efficient by having humans sort clips into a red-black tree instead of doing blind comparisons
* Improvements to `parallel-trpo`, and the ability to define custom start-points in an Atari environment

The setup instructions are identical to `rl-teacher`,
except that you no longer need to set up MuJoCo unless you are trying to run MuJoCo environments, and you no longer need to install agents that are unused.
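To see why sorting clips beats blind pairwise comparisons, here is a rough, hypothetical sketch (not this repository's actual code; `insert_clip` and `prefer` are made-up names). A plain sorted list stands in for the red-black tree, since the comparison count is the same: each new clip is placed with roughly log2(n) human judgments instead of being compared against everything already stored.

```python
# Hypothetical sketch: ranking clips by human preference via binary
# insertion. A real red-black tree keeps the structure balanced as
# clips accumulate; a sorted list shows the same comparison savings.

def insert_clip(sorted_clips, new_clip, prefer):
    """Insert new_clip using ~log2(n) preference queries.

    `prefer(a, b)` returns True if the human prefers clip a over b.
    """
    lo, hi = 0, len(sorted_clips)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefer(new_clip, sorted_clips[mid]):
            lo = mid + 1  # new_clip ranks above the midpoint clip
        else:
            hi = mid
    sorted_clips.insert(lo, new_clip)

# Toy demo: "clips" are numbers, and the "human" prefers larger ones.
clips = []
for c in [3, 1, 4, 1, 5]:
    insert_clip(clips, c, prefer=lambda a, b: a > b)
print(clips)  # → [1, 1, 3, 4, 5], ascending by preference
```

The practical upshot: labeling cost per clip grows logarithmically rather than linearly in the number of stored clips, which is one reason far fewer labels are needed.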
To run Atari specifically, use:

```
cd ~/rl-teacher-atari
pip install -e .
pip install -e human-feedback-api
pip install -e agents/ga3c
```
To run `rl-teacher-atari`, use the same sorts of commands that you'd use for `rl-teacher`. Examples:

```
python rl_teacher/teach.py -e Pong-v0 -n rl-test -p rl
python rl_teacher/teach.py -e Breakout-v0 -n synth-test -p synth -l 300
python rl_teacher/teach.py -e MontezumaRevenge-v0 -n human-test -p human -L 50
```
Note that with `rl-teacher-atari` you'll need far fewer labels.

You'll also want to switch the agent back to `parallel_trpo` for solving MuJoCo environments:

```
python rl_teacher/teach.py -p rl -e ShortHopper-v1 -n base-rl -a parallel_trpo
```
There are a few new command-line arguments that are worth knowing about. Primarily, there is a set of four flags:

* `--force_new_environment_clips`
* `--force_new_training_labels`
* `--force_new_reward_model`
* `--force_new_agent_model`

Activating these flags will erase the corresponding data from the disk/database. For the most part this won't be necessary, and you can simply pick a new experiment name. Note, however, that experiments within the same environment now share clips, so you may want to pass `--force_new_environment_clips` when starting a new experiment in an old environment.

Also worth noting, there's a parameter called `--stacked_frames`
(`-f`) that defaults to 4. This helps the model capture movement that a human naturally sees in the video, but it can alter how the system performs compared to `rl-teacher`. To remove frame stacking, simply add `-f 0` to the command-line arguments.
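To illustrate what frame stacking does, here is a hypothetical numpy sketch (not the repository's actual preprocessing; `make_stacker` is a made-up name). The observation passed downstream is the last `num_stacked` frames stacked along a trailing axis, so motion direction and speed are recoverable from a single observation:

```python
import numpy as np
from collections import deque

def make_stacker(num_stacked=4):
    """Return a function mapping single frames to stacked observations.

    With num_stacked=4 (the default -f value), each observation holds
    the last 4 frames along a new trailing axis.
    """
    frames = deque(maxlen=num_stacked)

    def observe(frame):
        frames.append(frame)
        while len(frames) < num_stacked:  # pad at episode start
            frames.appendleft(frames[0])
        return np.stack(frames, axis=-1)

    return observe

observe = make_stacker(4)
obs = observe(np.zeros((84, 84)))  # first frame, padded to 4 copies
print(obs.shape)  # (84, 84, 4)
```

A single 84x84 frame contains no velocity information; with `-f 0` the model would see only such single frames, which is closer to original `rl-teacher` behavior but can change performance on games where movement matters.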
`rl-teacher-atari` is meant to be entirely backwards compatible, and to do at least as well as `rl-teacher` on all tasks. If `rl-teacher-atari` lacks a feature that its parent has, please submit an issue.