A collection of Deep Reinforcement Learning algorithms implemented with PyTorch to solve Atari games and classic control tasks like CartPole, LunarLander, and MountainCar.
The overall project structure is based on DeepMind's DQN Zoo. We adapted the code to PyTorch and additionally implemented several state-of-the-art algorithms such as PPO, RND, R2D2, and Agent57.
Directory | Reference Paper | Note |
---|---|---|
reinforce | Policy Gradient Methods for RL | * |
reinforce_baseline | Policy Gradient Methods for RL | * |
actor_critic | Actor-Critic Algorithms | * |
a2c | Asynchronous Methods for Deep Reinforcement Learning | synchronous, deterministic variant of A3C, P |
sac | Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning; Soft Actor-Critic for Discrete Action Settings | P * |
ppo | Proximal Policy Optimization Algorithms | P |
ppo_icm | Curiosity-driven Exploration by Self-supervised Prediction | P |
ppo_rnd | Exploration by Random Network Distillation | P |
impala | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures | P |
Directory | Reference Paper | Note |
---|---|---|
dqn | Human Level Control Through Deep Reinforcement Learning | |
double_dqn | Deep Reinforcement Learning with Double Q-learning | |
prioritized_dqn | Prioritized Experience Replay | |
drqn | Deep Recurrent Q-Learning for Partially Observable MDPs | * |
r2d2 | Recurrent Experience Replay in Distributed Reinforcement Learning | P |
ngu | Never Give Up: Learning Directed Exploration Strategies | P * |
agent57 | Agent57: Outperforming the Atari Human Benchmark | P * |
Notes:
* `P` means the agent supports distributed training, with multiple actors and a single learner running in parallel (only running on a single machine is supported).
* `*` means the agent has only been tested on Atari Pong or Breakout.

The `deep_rl_zoo` directory contains all the source code for the different algorithms:
* `agent.py` module contains the agent class, which includes the `reset()` and `step()` methods; for agents that support distributed training, there are dedicated `Actor` and `Learner` classes.
* `run_classic.py` module uses a simple MLP network to solve classic control problems like CartPole, MountainCar, and LunarLander.
* `run_atari.py` module uses a Conv2d neural network to solve Atari games.
* `eval_agent.py` module evaluates trained agents by loading the model state from a checkpoint file and running a greedy actor; it supports both the classic control problems (CartPole, MountainCar, LunarLander) and Atari games.
* `main_loop.py` module contains the functions that run the single-threaded and distributed training loops; it also contains the `run_env_loop` function, where the agent interacts with the environment.
* `networks` directory contains both the policy networks and Q networks used by the agents:
  * `value.py` module contains neural networks for value-based RL agents like DQN and its variants.
  * `policy.py` module contains neural networks for policy-based RL agents like Actor-Critic, PPO, and their variants.
  * `curiosity.py` module contains neural networks for curiosity-driven exploration, such as the RND modules used by PPO, NGU, and Agent57.
* `trackers.py` module accumulates statistics during training and testing/evaluation; it also writes logs to Tensorboard if desired.
* `replay.py` module contains functions and classes related to experience replay (a generic sketch of such a buffer is shown below).
* `value_learning.py` module contains functions to calculate losses for value-based RL agents like DQN and its variants.
* `policy_gradient.py` module contains functions to calculate losses for policy-based RL agents like Actor-Critic, PPO, and their variants.
* `gym_env.py` module contains components for standard Atari environment preprocessing.
* `greedy_actors.py` module contains all the greedy actors used for testing/evaluation, for example `EpsilonGreedyActor` for DQN agents and `PolicyGreedyActor` for general policy-gradient agents.

The `unit_tests` directory contains the scripts for unit and end-to-end testing.
The `runs` directory contains Tensorboard logs for some of the runs.
The `screenshots` directory contains images of Tensorboard statistics for some of the runs.

Please check the instructions in the QUICK_START.md file on how to set up the project.
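The exact replay implementation lives in `replay.py`; purely as an illustration of the kind of component it provides (not the repo's actual class, and the `Transition` fields and default capacity here are assumptions), a minimal uniform experience replay buffer might look like this:

```python
import random
from collections import deque, namedtuple

# Hypothetical transition container; field names are assumptions for illustration.
Transition = namedtuple('Transition', ['s_t', 'a_t', 'r_t', 's_tp1', 'done'])


class UniformReplay:
    """Minimal uniform experience replay: stores transitions, samples random mini-batches."""

    def __init__(self, capacity=100000):
        self._storage = deque(maxlen=capacity)  # oldest transitions are evicted first

    def add(self, transition):
        self._storage.append(transition)

    def sample(self, batch_size):
        # Sample a mini-batch uniformly at random, without replacement.
        return random.sample(self._storage, batch_size)

    def __len__(self):
        return len(self._storage)
```

A prioritized variant would additionally track a priority per transition and sample proportionally to it, which is the idea behind the prioritized_dqn agent.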
The supported classic control tasks are defined in the `gym_env.py` module; by default it contains `['CartPole-v1', 'LunarLander-v2', 'MountainCar-v0', 'Acrobot-v1']`.

To run an agent on a classic control problem, use the following command, replacing `<agent_name>` with the agent's sub-directory name:
```
python3 -m deep_rl_zoo.<agent_name>.run_classic

# example of running DQN agents
python3 -m deep_rl_zoo.dqn.run_classic --environment_name=MountainCar-v0
python3 -m deep_rl_zoo.dqn.run_classic --environment_name=LunarLander-v2
```
By default we use the `NoFrameskip-v4` version for Atari games, and there is no need to include 'NoFrameskip' or the version in the `environment_name` argument, as this is handled by `create_atari_environment` in the `gym_env.py` module.

To run an agent on an Atari game, use the following command, replacing `<agent_name>` with the agent's sub-directory name:
```
python3 -m deep_rl_zoo.<agent_name>.run_atari

# example of running DQN on Atari Pong and Breakout
python3 -m deep_rl_zoo.dqn.run_atari --environment_name=Pong
python3 -m deep_rl_zoo.dqn.run_atari --environment_name=Breakout
```
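The details of `create_atari_environment` live in `gym_env.py`; purely as an illustration of the convention described above (building the full `NoFrameskip-v4` id and applying standard Atari preprocessing), a rough sketch using gym's built-in wrappers might look like the following. The wrapper arguments shown are common defaults and are assumptions, not necessarily what the repo uses.

```python
import gym


def make_atari_env_sketch(game_name: str, stack_size: int = 4):
    """Illustrative only: build e.g. 'PongNoFrameskip-v4' and apply standard preprocessing."""
    env = gym.make(f'{game_name}NoFrameskip-v4')
    # Grayscale, resize to 84x84, and apply a frame skip of 4.
    env = gym.wrappers.AtariPreprocessing(env, frame_skip=4, screen_size=84, grayscale_obs=True)
    # Stack the last few frames so the agent can infer velocities.
    env = gym.wrappers.FrameStack(env, stack_size)
    return env


# env = make_atari_env_sketch('Pong')  # requires the Atari ROMs / gym[atari] extras
```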
For agents that support distributed training, we can adjust the parameter `num_actors` to specify how many actors to run.

```
python3 -m deep_rl_zoo.ppo.run_classic --num_actors=8
```
The following is a high-level overview of the distributed training architecture. Each actor has its own copy of the neural network, and we use `multiprocessing.Queue` to transfer transitions between the actors and the learner. We also use a shared dictionary to store the latest copy of the neural network's parameters, so the actors can periodically update their local copies of the network.
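Below is a stripped-down sketch of this actor/learner pattern, not the actual `main_loop.py` code: actors push transitions onto a shared `multiprocessing.Queue`, while the learner consumes them and publishes its latest weights through a managed dict. The placeholder networks and names like `data_queue` and `shared_params` are illustrative only.

```python
import multiprocessing as mp

import torch
from torch import nn


def actor_process(actor_id, data_queue, shared_params):
    # Each actor keeps its own local copy of the network.
    local_net = nn.Linear(4, 2)  # placeholder network for illustration
    for step in range(1000):
        # Periodically pull the latest learner weights from the shared dict.
        if 'state_dict' in shared_params:
            local_net.load_state_dict(shared_params['state_dict'])
        # ... act in the environment with local_net and collect a transition ...
        transition = {'actor_id': actor_id, 'step': step}  # placeholder payload
        data_queue.put(transition)


def learner_process(data_queue, shared_params, num_steps=4000):
    net = nn.Linear(4, 2)  # placeholder network for illustration
    for _ in range(num_steps):
        transition = data_queue.get()  # blocks until an actor produces data
        # ... compute the loss from the transition(s) and update net ...
        # Publish the latest parameters so actors can sync their local copies.
        shared_params['state_dict'] = {k: v.cpu() for k, v in net.state_dict().items()}


if __name__ == '__main__':
    manager = mp.Manager()
    shared_params = manager.dict()
    data_queue = mp.Queue(maxsize=1000)

    actors = [mp.Process(target=actor_process, args=(i, data_queue, shared_params)) for i in range(4)]
    learner = mp.Process(target=learner_process, args=(data_queue, shared_params))
    for p in actors + [learner]:
        p.start()
    for p in actors + [learner]:
        p.join()
```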
By default, if you have multiple GPUs and you set the option `actors_on_gpu` to true, the script will evenly distribute the actors across all available GPUs. When running multiple actors on GPUs, watch out for possible CUDA out-of-memory errors.
```
# This will evenly distribute the actors on all GPUs
python3 -m deep_rl_zoo.ppo.run_atari --num_actors=16 --actors_on_gpu

# This will run all actors on CPU even if you have multiple GPUs
python3 -m deep_rl_zoo.ppo.run_atari --num_actors=16 --noactors_on_gpu
```
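Conceptually, spreading actors over GPUs is a simple round-robin assignment of devices; a rough sketch of the idea (not the repo's actual placement logic) is:

```python
import torch


def assign_actor_devices(num_actors: int, actors_on_gpu: bool):
    """Illustrative round-robin placement of actors across available CUDA devices."""
    num_gpus = torch.cuda.device_count()
    if actors_on_gpu and num_gpus > 0:
        return [torch.device(f'cuda:{i % num_gpus}') for i in range(num_actors)]
    return [torch.device('cpu')] * num_actors


# Example: 16 actors on a 2-GPU machine alternate between cuda:0 and cuda:1.
print(assign_actor_devices(16, actors_on_gpu=True))
```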
Before you run the `eval_agent` module, make sure you have a valid checkpoint file for the specific agent and environment.
By default, it will record a video of the agent's gameplay in the `recordings` directory.
To evaluate an agent on an Atari game, use the following command, replacing `<agent_name>` with the agent's sub-directory name:
```
python3 -m deep_rl_zoo.<agent_name>.eval_agent

# Example of loading a pre-trained PPO model on Breakout
python3 -m deep_rl_zoo.ppo.eval_agent --environment_name=Breakout --load_checkpoint_file=./checkpoints/PPO_Breakout_0.ckpt
```
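Under the hood, evaluation amounts to restoring the network weights from a checkpoint and always picking the highest-scoring action. The sketch below illustrates that idea only; the `QNetwork` class and checkpoint layout are assumptions, not the repo's actual format.

```python
import torch
from torch import nn


class QNetwork(nn.Module):
    """Placeholder network for illustration only."""

    def __init__(self, state_dim=4, num_actions=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions))

    def forward(self, x):
        return self.net(x)


def greedy_action(network: nn.Module, observation: torch.Tensor) -> int:
    # A greedy actor simply picks the action with the largest predicted value.
    with torch.no_grad():
        q_values = network(observation.unsqueeze(0))
    return int(q_values.argmax(dim=-1).item())


network = QNetwork()
# Assumed checkpoint layout: a plain state_dict saved with torch.save().
# state_dict = torch.load('./checkpoints/my_agent.ckpt', map_location='cpu')  # hypothetical path
# network.load_state_dict(state_dict)
action = greedy_action(network, torch.zeros(4))
```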
By default, both training and evaluation will log to Tensorboard under the `runs` directory.
To disable this, use the option `--nouse_tensorboard`.

```
tensorboard --logdir=./runs
```

The classes for writing logs to Tensorboard are implemented in the `trackers.py` module.
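At its core, this kind of logging is just a PyTorch `SummaryWriter` writing scalars keyed by step; a minimal, purely illustrative example (the tag name is an assumption, not necessarily what `trackers.py` uses):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir='./runs/example')

# Log a scalar under a tag; the step value becomes the x-axis in Tensorboard.
for env_steps in range(0, 1000, 100):
    episode_return = float(env_steps) * 0.01  # placeholder value for illustration
    writer.add_scalar('performance/episode_return', episode_return, global_step=env_steps)

writer.close()
```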
The main statistics logged by `run_parallel_training_iterations` in the `main_loop.py` module are:
* `performance(env_steps)`:
  * `episode_return`: the non-discounted sum of raw rewards of the current episode
  * `episode_steps`: the current episode length (in steps)
  * `num_episodes`: how many episodes have been conducted
  * `step_rate(second)`: steps per second, per actor
* `agent_statistics(env_steps)`:
  * the agent's `statistics` property, such as training loss, learning rate, discount, updates, etc.
* `learner_statistics(learner_steps)`:
  * the learner's `statistics` property, such as training loss, learning rate, discount, updates, etc.

Adding a tag to a run can be handy if we want to compare the performance of different hyperparameters, or different runs with various seeds:
```
python3 -m deep_rl_zoo.impala.run_classic --use_lstm --learning_rate=0.00045 --tag=LSTM-LR0.00045
```
Capturing debug screenshots can be handy if we want to see what is happening during training: set `debug_screenshots_interval` (measured in number of episodes) to some value, and screenshots of the terminal state will be added to Tensorboard.
```
# Example of creating terminal state screenshot every 100 episodes
python3 -m deep_rl_zoo.ppo_rnd.run_atari --environment_name=MontezumaRevenge --debug_screenshots_interval=100
```
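Conceptually, this feature just writes the episode's last observation to Tensorboard as an image every N episodes; a hedged sketch of the idea (not the repo's implementation, and the tag name is an assumption):

```python
import numpy as np
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir='./runs/debug_screenshots')
debug_screenshots_interval = 100  # episodes between screenshots


def maybe_log_terminal_state(episode: int, terminal_frame: np.ndarray) -> None:
    """Log the terminal observation as an image every `debug_screenshots_interval` episodes."""
    if debug_screenshots_interval > 0 and episode % debug_screenshots_interval == 0:
        # `terminal_frame` is assumed to be an HxWxC uint8 array (e.g. a rendered RGB frame).
        writer.add_image('debug/terminal_state', terminal_frame, global_step=episode, dataformats='HWC')


# Example call with a dummy black frame.
maybe_log_terminal_state(100, np.zeros((84, 84, 3), dtype=np.uint8))
```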
This project is based on the work of DeepMind, specifically the following projects:
In addition, other reference projects from the community have been very helpful to us, including:
This project is licensed under the Apache License, Version 2.0 (the "License"); see the LICENSE file for details.
If you reference or use our project in your research, please cite:
```
@software{deep_rl_zoo2022github,
  title = {{Deep RL Zoo}: A collections of Deep RL algorithms implemented with PyTorch},
  author = {Michael Hu},
  url = {https://github.com/michaelnny/deep_rl_zoo},
  version = {1.0.0},
  year = {2022},
}
```