unity-ml-reacher
This repository contains an implementation of deep reinforcement learning based on:
- Multi Agent Deep Deterministic Policy Gradients (MADDPG)
- Multi Agent Proximal Policy Optimization (MAPPO)
The environment to be solved has two agents playing tennis. Each agent controls a racket to bounce a ball over a net.
If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets the ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01. Thus, the goal of each agent is to keep the ball in play.
This environment is similar to the Tennis environment of Unity ML-Agents.
The action space is continuous in [-1.0, +1.0] and consists of 2 values: one for the horizontal move and one for jumping.
The environment is considered solved when the average score of one agent is >= 0.5 over 100 consecutive episodes.
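For illustration, the loop below computes an episode score, assuming the unityagents package from the python/ folder; the file name is a placeholder and the random policy is only a stand-in for a trained agent:

```python
import numpy as np
from unityagents import UnityEnvironment

# Placeholder path: point this at the Tennis build you downloaded.
env = UnityEnvironment(file_name="path/to/Tennis")
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=False)[brain_name]
num_agents = len(env_info.agents)
scores = np.zeros(num_agents)  # one running score per agent

while True:
    # Each agent outputs 2 values (horizontal move, jump) clipped to [-1, +1].
    actions = np.clip(np.random.randn(num_agents, 2), -1.0, 1.0)
    env_info = env.step(actions)[brain_name]
    scores += env_info.rewards
    if np.any(env_info.local_done):
        break

# The episode score is that of the best agent.
print("Episode score:", np.max(scores))
env.close()
```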
Videos of the trained agents can be found below:
- MADDPG
- MAPPO
Content of this repository
- analysis.xlsx: results of several experiments
- report.pdf: a document that describes the implementation of the MADDPG and MAPPO, along with ideas for future work
- __run_tensorboard.bat__: to run TensorBoard and visualize the loss during training
- folder agents: contains the implementation of
  - a Multi Agent DDPG
  - a Proximal Policy Optimization
  - an Actor-Critic network model using tanh as activation
  - a Gaussian-based Actor-Critic network model using tanh as activation
  - a ReplayBuffer
  - noise generators:
    - an ActionNoise that disturbs the output of the actor network to promote exploration
    - a ParameterNoise that disturbs the weights of the actor network to promote exploration
    - an Ornstein-Uhlenbeck noise generator (a sketch follows this list)
    - a simple noise generator based on the numpy random generator
- folder started_to_converge: weights of a network that started to converge, but slowly
- folder __final_weights__:
  - __final_maddpg_local_2.pth__: weights of a local network trained with MADDPG that solved this environment
  - __final_maddpg_target_2.pth__: weights of a target network trained with MADDPG that solved this environment
  - final_maddpg_local.pth: weights of a local network trained with MADDPG that reached 0.5 during training but is not stable during visual validation
  - final_maddpg_target.pth: weights of a target network trained with MADDPG that reached 0.5 during training but is not stable during visual validation
  - __final_ppo.pth__: weights of the Gaussian Actor-Critic network that solved this environment with Multi Agent PPO
  - __final_maddpg.png__: chart of the 1st phase of training using MADDPG
  - final_maddpg_2.png: chart of the 2nd phase of training using MADDPG
  - __final_ppo.png__: chart of the result of training using MAPPO
- Jupyter Notebooks
  - Multi Agent Deep Deterministic Policy Gradient.ipynb: run this notebook to train the agents using MADDPG and to view their performance
  - Multi Agent Proximal Policy Optimization.ipynb: run this notebook to train the agents using MAPPO and to view their performance
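For reference, here is a minimal sketch of an Ornstein-Uhlenbeck noise generator in the spirit of the one in the agents folder; parameter names and default values are illustrative, not the repository's exact ones:

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated noise for exploration."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, seed=0):
        self.mu = mu * np.ones(size)
        self.theta = theta  # pull strength back toward the mean
        self.sigma = sigma  # scale of the random perturbation
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        """Restart the process at the mean (call at the start of each episode)."""
        self.state = self.mu.copy()

    def sample(self):
        """Advance the process one step and return the new noise value."""
        dx = self.theta * (self.mu - self.state) + self.sigma * self.rng.standard_normal(self.mu.shape)
        self.state = self.state + dx
        return self.state
```

The noise is typically added to the actor's output and clipped together with the action into [-1.0, +1.0] before being sent to the environment.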
Requirements
To run the code, follow these steps:
- Create a new environment:
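For example, with conda (the Python version below is an assumption; the environment is named ddpg because that is the kernel name used later):

conda create --name ddpg python=3.6
conda activate ddpg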
- Perform a minimal install of OpenAI gym
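The minimal install is typically just the base package:

pip install gym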
- Install TensorFlow and TensorBoard (TensorBoard is installed together with TensorFlow)
pip install tensorflow-gpu
or, for a CPU-only setup,
pip install tensorflow
- Install PyTorch
pip install torch
- Install the dependencies under the folder python/
cd python
pip install .
- Install jupyter notebook
pip install jupyter notebook
- Fix an issue in PyTorch 0.4.1 to allow backpropagation through torch.distributions.Normal up to its standard deviation parameter
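As a sanity check that the fix works, the snippet below (illustrative, not the repository's code) verifies that gradients reach the standard deviation of a Normal distribution:

```python
import torch

# A Gaussian policy head with a learnable log standard deviation.
mu = torch.zeros(2, requires_grad=True)
log_std = torch.zeros(2, requires_grad=True)

dist = torch.distributions.Normal(mu, log_std.exp())
action = dist.sample()               # sampling itself is not differentiated
loss = -dist.log_prob(action).sum()  # policy-gradient style surrogate loss
loss.backward()

# After the fix, log_std.grad must be a real tensor, not None.
print(log_std.grad)
```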
- Create an IPython kernel for the ddpg environment
pip install ipykernel
python -m ipykernel install --user --name ddpg --display-name "ddpg"
- If you cannot start any notebook, run the following command to reinstall nbconvert
pip3 install --upgrade --user nbconvert
- Download the Unity Environment (thanks to Udacity) which matches your operating system
- Start Jupyter Notebook from the root of this repository
jupyter notebook
- Once started, change the kernel through the menu Kernel > Change kernel > ddpg
- If necessary, change the path to the Unity environment inside the ipynb files
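For example (the file name is a placeholder for the build you downloaded):

```python
from unityagents import UnityEnvironment
env = UnityEnvironment(file_name="path/to/Tennis")
```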