michaelnny / deep_rl_zoo

A collection of Deep Reinforcement Learning algorithms implemented with PyTorch to solve Atari games and classic control tasks like CartPole, LunarLander, and MountainCar.
Apache License 2.0

Deep RL Zoo

A collection of Deep Reinforcement Learning algorithms implemented with PyTorch to solve Atari games and classic control tasks like CartPole, LunarLander, and MountainCar.

The overall project structure is based on DeepMind's DQN Zoo. We adapted the code to PyTorch and also implemented some SOTA algorithms such as PPO, RND, R2D2, and Agent57.

Content

Environment and Requirements

Implemented Algorithms

Policy-based RL Algorithms

| Directory | Reference Paper | Note |
| --- | --- | --- |
| reinforce | Policy Gradient Methods for RL | * |
| reinforce_baseline | Policy Gradient Methods for RL | * |
| actor_critic | Actor-Critic Algorithms | * |
| a2c | Asynchronous Methods for Deep Reinforcement Learning / synchronous, deterministic variant of A3C | P |
| sac | Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning / Soft Actor-Critic for Discrete Action Settings | P * |
| ppo | Proximal Policy Optimization Algorithms | P |
| ppo_icm | Curiosity-driven Exploration by Self-supervised Prediction | P |
| ppo_rnd | Exploration by Random Network Distillation | P |
| impala | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures | P |

Value-based RL Algorithms

| Directory | Reference Paper | Note |
| --- | --- | --- |
| dqn | Human Level Control Through Deep Reinforcement Learning | |
| double_dqn | Deep Reinforcement Learning with Double Q-learning | |
| prioritized_dqn | Prioritized Experience Replay | |
| drqn | Deep Recurrent Q-Learning for Partially Observable MDPs | * |
| r2d2 | Recurrent Experience Replay in Distributed Reinforcement Learning | P |
| ngu | Never Give Up: Learning Directed Exploration Strategies | P * |
| agent57 | Agent57: Outperforming the Atari Human Benchmark | P * |

Distributional Q Learning Algorithms

| Directory | Reference Paper | Note |
| --- | --- | --- |
| c51_dqn | A Distributional Perspective on Reinforcement Learning | |
| rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning | |
| qr_dqn | Distributional Reinforcement Learning with Quantile Regression | |
| iqn | Implicit Quantile Networks for Distributional Reinforcement Learning | |

Notes:

Code Structure

Author's Notes

Quick Start

Please check the instructions in the QUICK_START.md file on how to set up the project.

Train Agents

Classic Control Tasks

To run an agent on a classic control problem, use the following command, replacing <agent_name> with the agent's sub-directory name.

python3 -m deep_rl_zoo.<agent_name>.run_classic

# example of running DQN agents
python3 -m deep_rl_zoo.dqn.run_classic --environment_name=MountainCar-v0

python3 -m deep_rl_zoo.dqn.run_classic --environment_name=LunarLander-v2

Atari games

To run an agent on an Atari game, use the following command, replacing <agent_name> with the agent's sub-directory name.

python3 -m deep_rl_zoo.<agent_name>.run_atari

# example of running DQN on Atari Pong and Breakout
python3 -m deep_rl_zoo.dqn.run_atari --environment_name=Pong

python3 -m deep_rl_zoo.dqn.run_atari --environment_name=Breakout

Distributed training with multiple actors and a single learner (on the same machine)

For agents that support distributed training, we can adjust the parameter num_actors to specify how many actors to run.

python3 -m deep_rl_zoo.ppo.run_classic --num_actors=8

The following is a high-level overview of the distributed training architecture. Each actor has its own copy of the neural network. We use multiprocessing.Queue to transfer transitions from the actors to the learner, and a shared dictionary to store the latest copy of the neural network's parameters, so that each actor can update its local copy of the network later on.

[Figure: parallel training architecture]
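The data flow described above can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: the functions `actor`, `learner`, and `run_demo`, the fake transitions, the batch size of 4, and the refresh interval of 5 steps are all assumptions made for the example, and a version counter stands in for real network parameters. It also uses the `fork` start method for brevity, which is only available on POSIX systems.

```python
import multiprocessing as mp
import random


def actor(actor_id, num_steps, transition_queue, shared_params):
    """Hypothetical actor: acts with a local parameter copy, sends transitions."""
    rng = random.Random(actor_id)
    local_params = shared_params.copy()  # the actor's own copy of the "network"
    for step in range(num_steps):
        if step % 5 == 0:
            # Periodically refresh the local copy from the shared dictionary.
            local_params = shared_params.copy()
        transition_queue.put({
            "actor_id": actor_id,
            "step": step,
            "reward": rng.random(),
            "params_version": local_params["version"],
        })
    transition_queue.put(None)  # sentinel: this actor has finished


def learner(num_actors, transition_queue, shared_params, batch_size=4):
    """Hypothetical learner: consumes transitions, publishes new parameters."""
    finished, consumed, batch = 0, 0, []
    while finished < num_actors:
        item = transition_queue.get()
        if item is None:
            finished += 1
            continue
        batch.append(item)
        consumed += 1
        if len(batch) == batch_size:
            # Stand-in for a gradient step: bump the version and publish it.
            shared_params["version"] = shared_params["version"] + 1
            batch.clear()
    return consumed


def run_demo(num_actors=2, num_steps=10):
    ctx = mp.get_context("fork")  # assumption: POSIX-only start method
    with ctx.Manager() as manager:
        shared_params = manager.dict({"version": 0})
        transition_queue = ctx.Queue()
        actors = [
            ctx.Process(target=actor,
                        args=(i, num_steps, transition_queue, shared_params))
            for i in range(num_actors)
        ]
        for process in actors:
            process.start()
        # The learner drains the queue before the actors are joined.
        consumed = learner(num_actors, transition_queue, shared_params)
        for process in actors:
            process.join()
        final_version = shared_params["version"]
    return consumed, final_version


if __name__ == "__main__":
    print(run_demo())
```

In a real agent the shared dictionary would presumably hold something like a network's `state_dict()` rather than a counter, and the learner would run an optimizer step where the sketch increments the version.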

By default, if you have multiple GPUs and you set the option actors_on_gpu to true, the script will evenly distribute the actors across all available GPUs. When running multiple actors on GPUs, watch out for possible CUDA out-of-memory errors.

# This will evenly distribute the actors on all GPUs
python3 -m deep_rl_zoo.ppo.run_atari --num_actors=16 --actors_on_gpu

# This will run all actors on CPU even if you have multiple GPUs
python3 -m deep_rl_zoo.ppo.run_atari --num_actors=16 --noactors_on_gpu
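One simple way to spread actors evenly over GPUs is round-robin assignment. The sketch below is an assumption about how such a scheme could look, not the code the scripts actually use; `assign_actor_devices` is a hypothetical helper, and in practice the GPU count would likely come from `torch.cuda.device_count()`.

```python
def assign_actor_devices(num_actors, num_gpus, actors_on_gpu=True):
    """Return one device string per actor (hypothetical helper).

    Actors are placed round-robin on the available GPUs. If GPU placement
    is disabled or no GPU is present, every actor runs on the CPU.
    """
    if not actors_on_gpu or num_gpus == 0:
        return ["cpu"] * num_actors
    return [f"cuda:{i % num_gpus}" for i in range(num_actors)]


if __name__ == "__main__":
    # 16 actors on 2 GPUs: 8 actors per GPU.
    print(assign_actor_devices(16, 2))
    # GPU placement disabled: all actors on CPU.
    print(assign_actor_devices(16, 2, actors_on_gpu=False))
```

With this scheme each GPU holds roughly num_actors / num_gpus network copies, which is a useful number to keep in mind when estimating memory use.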

Evaluate Agents

Before you run the eval_agent module, make sure you have a valid checkpoint file for the specific agent and environment. By default, it records a video of the agent's self-play in the recordings directory.

To evaluate an agent on an Atari game, use the following command, replacing <agent_name> with the agent's sub-directory name.

python3 -m deep_rl_zoo.<agent_name>.eval_agent

# Example of loading a pre-trained PPO model on Breakout
python3 -m deep_rl_zoo.ppo.eval_agent --environment_name=Breakout --load_checkpoint_file=./checkpoints/PPO_Breakout_0.ckpt
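Since evaluation depends on an existing checkpoint, it can help to check the file before launching. The following is a small sketch of that idea; `build_eval_command` is a hypothetical wrapper that is not part of the project, and it uses only the flags shown in the command above.

```python
import pathlib


def build_eval_command(agent_name, environment_name, checkpoint_file):
    """Build the eval_agent command line after checking the checkpoint exists.

    Hypothetical helper: raises FileNotFoundError instead of starting an
    evaluation run that would fail on a missing checkpoint.
    """
    if not pathlib.Path(checkpoint_file).is_file():
        raise FileNotFoundError(f"Checkpoint not found: {checkpoint_file}")
    return [
        "python3",
        "-m",
        f"deep_rl_zoo.{agent_name}.eval_agent",
        f"--environment_name={environment_name}",
        f"--load_checkpoint_file={checkpoint_file}",
    ]


if __name__ == "__main__":
    try:
        print(build_eval_command("ppo", "Breakout", "./checkpoints/PPO_Breakout_0.ckpt"))
    except FileNotFoundError as error:
        print(error)
```

The returned list can be passed to `subprocess.run` to start the evaluation.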

Monitoring with Tensorboard

By default, both training and evaluation log to Tensorboard in the runs directory. To disable this, use the option --nouse_tensorboard.

tensorboard --logdir=./runs

The classes that write logs to Tensorboard are implemented in the trackers.py module.

Measurements available on Tensorboard

performance(env_steps):

agent_statistics(env_steps):

learner_statistics(learner_steps):

[Figure: DQN on Pong]

Add tags to Tensorboard

This is handy for comparing the performance of different hyperparameters, or of different runs with various seeds.

python3 -m deep_rl_zoo.impala.run_classic --use_lstm --learning_rate=0.00045 --tag=LSTM-LR0.00045
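When comparing several hyperparameter values, the tagged commands can be generated rather than typed by hand. The sketch below builds one command per learning rate with a matching tag. `build_sweep_commands` is a hypothetical helper, the tag format simply mirrors the example above, and only the flags shown in that example are used.

```python
def build_sweep_commands(agent_name, learning_rates, use_lstm=True):
    """Return one run_classic command per learning rate (hypothetical helper).

    Learning rates are passed as strings so that each tag matches the flag
    value exactly, for example "0.00045" gives the tag "LSTM-LR0.00045".
    """
    commands = []
    for lr in learning_rates:
        tag = f"LSTM-LR{lr}" if use_lstm else f"LR{lr}"
        command = ["python3", "-m", f"deep_rl_zoo.{agent_name}.run_classic"]
        if use_lstm:
            command.append("--use_lstm")
        command.append(f"--learning_rate={lr}")
        command.append(f"--tag={tag}")
        commands.append(command)
    return commands


if __name__ == "__main__":
    for command in build_sweep_commands("impala", ["0.00045", "0.001"]):
        print(" ".join(command))
```

Each command can then be run in turn, for example with `subprocess.run(command, check=True)`, so that the runs appear under separate tags.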

Debug with environment screenshots

To see what is happening during training, set debug_screenshots_interval (measured in number of episodes) to some value, and screenshots of the terminal state will be added to Tensorboard.

# Example of creating terminal state screenshot every 100 episodes
python3 -m deep_rl_zoo.ppo_rnd.run_atari --environment_name=MontezumaRevenge --debug_screenshots_interval=100
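The interval logic can be pictured as a check made at the end of each episode. This is only a sketch of the idea under stated assumptions: `should_take_screenshot` is a hypothetical function, and treating an interval of 0 as "disabled" is a guess rather than documented behavior.

```python
def should_take_screenshot(episode_count, interval):
    """Return True if the episode that just ended should be captured.

    episode_count is the number of completed episodes (1-based).
    An interval of 0 or less is treated as "screenshots disabled".
    """
    if interval <= 0:
        return False
    return episode_count > 0 and episode_count % interval == 0


if __name__ == "__main__":
    captured = [e for e in range(1, 301) if should_take_screenshot(e, 100)]
    print(captured)
```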

[Figure: PPO-RND on MontezumaRevenge]

Acknowledgments

This project is based on the work of DeepMind, specifically the following projects:

In addition, other reference projects from the community have been very helpful to us, including:

License

This project is licensed under the Apache License, Version 2.0 (the "License"). See the LICENSE file for details.

Citing our work

If you reference or use our project in your research, please cite:

@software{deep_rl_zoo2022github,
  title = {{Deep RL Zoo}: A collections of Deep RL algorithms implemented with PyTorch},
  author = {Michael Hu},
  url = {https://github.com/michaelnny/deep_rl_zoo},
  version = {1.0.0},
  year = {2022},
}